« Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

Re: [vwnc] [vw76] Point>>hash still not good enough

by Andres Valloud-3 :: Rate this Message:

Reply to Author | View in Thread

Reinout Heeck wrote:
> Valloud, Andres wrote:
>  
>> Hmmm... I think what is going on is that clustering is not letting the
>> dictionary do its job quickly.
>>    
> Yes, it surprised me that this case was not fixed in the hashing overhaul.
>
>
>  

For all hash functions there will be datasets that cause them to behave
badly.  This cannot be avoided.

> I had hoped these [value judgment] days would be over...
>  

It is always a value judgment.  We could just resort to using MD5 for
everything, but then that would be too costly.  Plus, it would be a bad
hash if anybody decided to build a dictionary of MD5 collisions --- and
I know some people who have actually dealt with this particular matter.  
In other words, it is not possible to thoroughly "win" this game.

> In our practice this is not so rare, rougly in any system that works
> with integer IDs (in our case order numbers) you will tend to see such
> clusters of numbers being used as dictioanry keys or set members. This
> time around I ran into it with UI code though, I'm busy overlaying the
> display of dataset cells with a short animation whenever the value in a
> cell changes, here I need a map from cell coordinates to some helper
> objects and indeed the coordinates of these Points come mostly in clusters.
>  

Ok, then I guess something more complex is warranted.  Instead of what
you suggested, you may want to try this instead:

Point>>hash

  | xHash yHash |
  xHash := self x hash hashMultiply.
  xHash := (xHash bitShift: -14) bitXor:
    ((xHash bitAnd: 16r3FFF) bitShift: 15).
  ^xHash bitXor: self y hash hashMultiply


There are other, much more complicated implementations which are also
possible, but I'd rather see the need before using heavy handed stuff.

> My point is that the value judgment above does not hold water in my
> daily experience (whit the caveat that I am putting strong demands on
> the base library [...])

Right... and sooner or later it will be the case that somebody runs into
a situation in which the hash function above is also problematic...

> Another
> solution to removing the impedance mismatch would be to fix the
> collections, for example by throwing in an extra #hashMultiply in
> #findKeyOrNil:, I'll play some more with that idea coming week.
>  

Applying hashMultiply to every single hash value will cause problems
(truncation to 28 bits), will disturb perfect hash functions for no good
reason, and may introduce distribution issues that were not present
before its use.  Moreover, it is equivalent to a permutation mod 2^28,
and as such it will not improve distribution when there are no
collisions to begin with.  In other words, it may be more of a
performance hit than a benefit.  I'd strongly recommend against doing that.

>> The last observation to make is that nothing pays more than knowing your
>> data set.
>>    
> But that comes at a price, you will have to be prepared to have your
> client code express these details.
>  

Yes, there is always a price for everything.  This is normal.

>> PS: did you get to the section of the hash book that talks about
>> Point>>hash yet? :)
>>  
>>    
> No, exercise 1.12 (Fermat's last theorem) has got a total grip on my
> mind, when I thought to myself 'how would a hardware hacker solve that'
> I had the (mis?)fortune of having a vague insight on how to restate the
> problem so that a weaker proof is required. Whenever I pick up the book
> now I cannot help myself - I just wander off thinking about that
> exercise. So I guess I'll have to bite the bullet and sit down and write
> out my thoughts on this - get it out of my system so I can continue with
> your book :-)
>  

Heh :)... funny how that happens... I had something like that occur to
me with Concrete Mathematics' exercise 5.112... to show that
$\binom{2n}{n}$ is always divisible by 4 or 9, except when (IIRC) 2n =
64 or 2n = 256.  It is marked as a research problem, although I think
Granville et al solved it in 1996.

Andres.
_______________________________________________
vwnc mailing list
vwnc@...
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

 « Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

LightInTheBox - Buy quality products at wholesale price