« Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

Re: [vwnc] [vw76] Point>>hash still not good enough

by Reinout Heeck-2 :: Rate this Message:

Reply to Author | View in Thread

Valloud, Andres wrote:
> Hmmm... I think what is going on is that clustering is not letting the
> dictionary do its job quickly.
Yes, it surprised me that this case was not fixed in the hashing overhaul.


> There are a number of things to observe.  First is that doing this,
>
>     d := Dictionary new: 1300000.
>  

Here the client of the dictionary has to be aware of the deficiencies in
int hashedcollection implementations and tune it's way around it. IMO
that is the wrong place (repetitious, fragile and in general cases like
library code impossible to pre-tune).
I had hoped these days would be over...

> The value judgment here
> was that in practice it would be rare to see a dictionary of all the
> integer points between 0@0 and 999@999 (or 1@1 and 1000@1000), and so
> the simpler hash function already provides far more value than the old
> one.
>  
In our practice this is not so rare, rougly in any system that works
with integer IDs (in our case order numbers) you will tend to see such
clusters of numbers being used as dictioanry keys or set members. This
time around I ran into it with UI code though, I'm busy overlaying the
display of dataset cells with a short animation whenever the value in a
cell changes, here I need a map from cell coordinates to some helper
objects and indeed the coordinates of these Points come mostly in clusters.

My point is that the value judgment above does not hold water in my
daily experience (whit the caveat that I am putting strong demands on
the base library, I want my code to be able to use a hashed collection
of points when I mean that - I would like that such code to be
performant enough without the need for cluttering it with special
collection tuning at every incarnation).

So (denying this value judgment) I conclude there is an impedance
mismatch between the new #hash implementations and the hashed
collections that use them:
the new hashing may yield clustered hash values if the values being
hashed are clustered - on the other hand the hashed collections break
down with such clustered hash values.
My first proposal tried a quick fix on the first half - the hashing
implementation, but as you point out it has quality issues. Another
solution to removing the impedance mismatch would be to fix the
collections, for example by throwing in an extra #hashMultiply in
#findKeyOrNil:, I'll play some more with that idea coming week.


> The last observation to make is that nothing pays more than knowing your
> data set.
But that comes at a price, you will have to be prepared to have your
client code express these details.
In large applications that need to be maintainable in the light of
frequent requirement changes I want to be able to write code as naively
as possible, so that only business rules are expressed - I don't want
future maintainers to be bogged down by tuning issues when they need to
concentrate on business functionality. Hence my denial of the value
judgment that was applied.


> PS: did you get to the section of the hash book that talks about
> Point>>hash yet? :)
>  
No, exercise 1.12 (Fermat's last theorem) has got a total grip on my
mind, when I thought to myself 'how would a hardware hacker solve that'
I had the (mis?)fortune of having a vague insight on how to restate the
problem so that a weaker proof is required. Whenever I pick up the book
now I cannot help myself - I just wander off thinking about that
exercise. So I guess I'll have to bite the bullet and sit down and write
out my thoughts on this - get it out of my system so I can continue with
your book :-)

R
-
_______________________________________________
vwnc mailing list
vwnc@...
http://lists.cs.uiuc.edu/mailman/listinfo/vwnc

 « Return to Thread: [vwnc] [vw76] Point>>hash still not good enough

LightInTheBox - Buy quality products at wholesale price