Cryptographic hash uniquness (was Simple network client)

View: New views
7 Messages — Rating Filter:   Alert me  

Cryptographic hash uniquness (was Simple network client)

by Peter Verswyvelen :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

>winds up having a write cache, which is mutable in practice.  The
>interesting thing is that the block's location is the cryptographic
>hash of its contents, which leads to all sorts of neat properties (as
>well as requiring immutability).

That's interesting.  When I developed a version control system for a customer, I also used a cryptographic hash as the database key of file+content in question, but I was afraid I might have clashes (two files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?

Thank you,
Peter







_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Bulat Ziganshin-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Peter,

Thursday, January 31, 2008, 8:01:36 PM, you wrote:

> files with different content generating the same hash)... My
> intuition told me that the odds of two cryptographic hashes (on
> meaningful content) colliding was much less than the earth being
> destroyed by an asteroid... But this is just intuition... What does
> computer science tell us about this?

you may be interested to know that widely used rsync algorithms relies
on 128-bit hashes and its author speculated about its reliability:

http://samba.org/~tridge/phd_thesis.pdf


--
Best regards,
 Bulat                            mailto:Bulat.Ziganshin@...

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Lanny Ripple :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Depending on which hash you use you can get upwards of "document
hask keys won't collide before the heat-death of the universe".

There is of course a lot more to it than that.  Google around
about hashing, cryptography, and cryptographic hash functions.
There are many good websites that will go into the "lot more to
it" without having to have a degree in mathematics to understand
them.

   -ljr

Peter Verswyvelen wrote:

>> winds up having a write cache, which is mutable in practice.  The
>> interesting thing is that the block's location is the cryptographic
>> hash of its contents, which leads to all sorts of neat properties (as
>> well as requiring immutability).
>
> That's interesting.  When I developed a version control system for a customer, I also used a cryptographic hash as the database key of file+content in question, but I was afraid I might have clashes (two files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?
>
> Thank you,
> Peter
>
>
>
>
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe@...
> http://www.haskell.org/mailman/listinfo/haskell-cafe

--
Lanny Ripple <lanny@...>
ScmDB / Cisco Systems, Inc.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Magnus Therning :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 1/31/08, Peter Verswyvelen <bf3@...> wrote:
>winds up having a write cache, which is mutable in practice.  The
>interesting thing is that the block's location is the cryptographic
>hash of its contents, which leads to all sorts of neat properties (as
>well as requiring immutability).

That's interesting.  When I developed a version control system for a customer, I also used a cryptographic hash as the database key of file+content in question, but I was afraid I might have clashes (two files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?

Tongue firmly in cheek...
Well, if you would turn to software engineering you would probably find that "never ever, happens sooner than you think" ;-)

/M


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Richard Kelsall :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bulat Ziganshin wrote:

> Hello Peter,
>
> Thursday, January 31, 2008, 8:01:36 PM, you wrote:
>
>> files with different content generating the same hash)... My
>> intuition told me that the odds of two cryptographic hashes (on
>> meaningful content) colliding was much less than the earth being
>> destroyed by an asteroid... But this is just intuition... What does
>> computer science tell us about this?
>
> you may be interested to know that widely used rsync algorithms relies
> on 128-bit hashes and its author speculated about its reliability:
>
> http://samba.org/~tridge/phd_thesis.pdf
>

Interesting paper. Thank you. I had a quick read of the bit relating to
hashes and my understanding is this. He uses a weak, quick and simple
hash to deal with 99.99% (I invented that percentage) of cases - if the
hash is different we know the files are definitely different - if the
hash collides he then does a strong, slow, secure hash and relies on
this as the answer. But he's relying on the strong hash rather than
doing a byte by byte comparison because there is a major cost (a network
transmission of the file) to doing the proper byte by byte comparison.
If you had both files accessible at a low cost it might be better to
byte by byte compare them when you get a collision rather than use
the strong hash.

The right approach may be to assume that collisions will occur and
cater for this properly in the program somehow. A good tip for testing
this sort of thing is to have the length of the hash (maximum size of
the array or whatever you want to test) as a parameter that you can
turn down to a very low value to induce collisions (overflows etc) to
see whether the program still works. And then turn it back up for live
use.

A cryptographic hash appears as a completely random function of the
input so the likelihood of a collision is simply 2^(bits in hash).


Richard.
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Graham Klyne-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

This is a very late response ... but I did some calculations as part of some
work I did a while ago:
   http://www.ietf.org/rfc/rfc2938.txt
(See appendix A "The birthday paradox")

#g
--

Peter Verswyvelen wrote:
>> winds up having a write cache, which is mutable in practice.  The
>> interesting thing is that the block's location is the cryptographic
>> hash of its contents, which leads to all sorts of neat properties (as
>> well as requiring immutability).
>
> That's interesting.  When I developed a version control system for a customer, I also used a cryptographic hash as the database key of file+content in question, but I was afraid I might have clashes (two files with different content generating the same hash)... My intuition told me that the odds of two cryptographic hashes (on meaningful content) colliding was much less than the earth being destroyed by an asteroid... But this is just intuition... What does computer science tell us about this?
>
> Thank you,
> Peter

--
Graham Klyne
Contact info: http://www.ninebynine.org/#Contact

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: Cryptographic hash uniquness (was Simple network client)

by Richard Kelsall :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Graham Klyne wrote:
> This is a very late response ... but I did some calculations as part of
> some work I did a while ago:
>   http://www.ietf.org/rfc/rfc2938.txt
> (See appendix A "The birthday paradox")
>
> #g

A memorable summary of the birthday paradox being :

There is a 50% chance of a collision when number of digits in your
population size reaches half the number of digits in the largest
possible population.

e.g. With a 128 bit hash you get a 50% chance of a collision when
you've hashed 2^64 files etc.


Richard.

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@...
http://www.haskell.org/mailman/listinfo/haskell-cafe