« Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

Re: [vwnc] [vw76] Point>>hash still not good enough

by Andres Valloud-3 :: Rate this Message:

Reply to Author | View in Thread

Failing that, there is always subclassing:

Point subclass: #PointForThisParticularUsage

PointForThisParticularUsage>>hash

  ^self someValueThatMakesSenseInThisParticularContext

Andres.

Hernan Wilkinson wrote:

> Hi all,
>  I agree with Andres when he says that it is impossible to have the best
> solution. For me, it means that having a message #hash as responsibility of
> the objects is not the right solution. Another object should have that
> responsibility therefore different hash algorithms could be use depending on
> the context... for sure somebody else thought about it, but here are some
> Smalltalks lines of what I mean:
>
>   Set hashingWith: [ :aPoint | aPoint quickHash ].
>   Set hashingWith: [ :aPoint | aPoint goodDistributionHash ].
>   Set hashingWith: [ :aPoint | aPoint x bitXor: aPoint y ].
>   etc.
>
> Bye,
> Hernan.
>
>
>
> On Mon, May 12, 2008 at 6:13 PM, Andres Valloud <AVALLOUD@...>
> wrote:
>
>  
>> Reinout,
>>
>> In the past, your comments made me conclude that you prefer hash
>> functions that take little time to evaluate over hash functions that
>> distribute hash values very well but take their time to do so.
>>
>> However, now that Point>>hash substantially improves what was shipped
>> with VW 7.5 and before in terms of distribution while keeping
>> computational complexity at some sort of bare minimum, it is also a
>> problem because it does not distribute well enough in cases in which it
>> seems that a 2d array (or hash bucket strategy as opposed to open
>> addressing linear probing) would do better for you.
>>
>> What I am basically hearing in response to this situation is "make the
>> hash function more expensive so my problem goes away".  However, if
>> Point>>hash is modified to do more work in order to distribute better,
>> then it starts becoming a target of your previous admonitions regarding
>> execution speed.
>>
>> While I do not necessarily disagree with improving the distribution of
>> Point>>hash further at some expense of hash value computation, my
>> concern is that it is not possible to win.
>>
>> Regarding the hash analysis tool, the stated measurements are direct
>> predictors of hashed collection performance.  See chapter 2 of the hash
>> book.
>>
>> Andres.
>>
>>
>> Reinout Heeck wrote:
>>    
>>> On May 9, 2008, at 9:33 PM, Andres Valloud wrote:
>>>
>>>
>>>      
>>>> Reinout Heeck wrote:
>>>>
>>>>        
>>>>> Valloud, Andres wrote:
>>>>>
>>>>>
>>>>>          
>>>>>> Hmmm... I think what is going on is that clustering is not letting
>>>>>> the
>>>>>> dictionary do its job quickly.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Yes, it surprised me that this case was not fixed in the hashing
>>>>> overhaul.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          
>>>> For all hash functions there will be datasets that cause them to
>>>> behave
>>>> badly.  This cannot be avoided.
>>>>
>>>>
>>>>        
>>>>> I had hoped these [value judgment] days would be over...
>>>>>
>>>>>          
>>> Heh :-)
>>>
>>> What I meant there was the days of worrying about clustered values.
>>>
>>>
>>>
>>> More generally, before the hashing overhaul I had run into three
>>> problems repeatedly:
>>> 1) Poor performance of hashed collections holding strings.
>>> 2) Poor performance of hashed collections holding clustered integer
>>> values.
>>> 3) Poor abstraction of hash combination 'for dummies' leading to poor
>>> #hash implementations on domain objects.
>>>
>>> An earlier conversation we had extinguished my hopes for any solution
>>> to 3) in the overhaul but I had fully expected both 1) and 2) to be
>>> resolved by the redesign. Now I get the impression only 1) has been
>>> addressed adequately.
>>>
>>> You seem to be dismissing 2) as being a special case that needs not
>>> perform with the base collection implementations. Since this is a
>>> value judgment we can keep debating this, instead I can provide you a
>>> simple datapoint: my shop would be relieved when 2) is addressed.
>>>
>>>
>>> I had a quick look (for the first time) at the hash analysis tool in
>>> order to measure performance of some hashed collection access
>>> patterns. I only found quality assessments of #hash implementations
>>> for integer based values but no benchmark tool to measure run time of
>>> collection accesses - it seems this is not supported. Is there some
>>> other tool I need to load in order to measure the performance of
>>> hashed collections?
>>>
>>>
>>>
>>> Thanks for your suggestions regarding Point>>hash, I'll experiment
>>> with them when I'm back at work (bank holiday here today).
>>>
>>> Cheers,
>>>
>>> Reinout
>>> -------
>>>
>>> _______________________________________________
>>> vwnc mailing list
>>> vwnc@...
>>> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>>>
>>>
>>>      
>> _______________________________________________
>> vwnc mailing list
>> vwnc@...
>> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>>
>>    
>
>  
> ------------------------------------------------------------------------
>
> _______________________________________________
> vwnc mailing list
> vwnc@...
> http://lists.cs.uiuc.edu/mailman/listinfo/vwnc
>  

_______________________________________________
vwnc mailing list
vwnc@...
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

 « Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

LightInTheBox - Buy quality products at wholesale price