Re: GraphX with UUID vertex IDs instead of Long

Christopher Nguyen Mon, 24 Feb 2014 13:05:33 -0800

Deepak, to be sure, I was referring to sequential guarantees with the longs.

I would suggest being careful with taking half of the UUID, as the
probability of collision can be unexpectedly high. Many bits of a UUID are
typically time-based, so collisions among those bits are virtually
guaranteed when IDs are generated in parallel. Even if you can
optimistically find some 64 uniformly random bits to use, due to the
birthday paradox, the collision probability among roughly 4 billion (2^32)
values is something like 1 - exp(-1/2), or a very uncomfortable 40%. If you
have orders of magnitude fewer edges/vertices, you'd have a wider margin of
safety, but estimate it to be sure.
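
For a quick estimate, here is a minimal Python sketch of the usual birthday
approximation, P(collision) ~ 1 - exp(-n^2 / (2 * 2^bits)). It assumes the
IDs really are uniformly random over the given number of bits, which is the
optimistic case; the function name collision_probability is made up for this
illustration.

```python
import math

def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate chance that n uniformly random `bits`-bit IDs
    contain at least one duplicate (birthday approximation)."""
    space = 2.0 ** bits  # number of possible IDs
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-(n * n) / (2.0 * space))

# Print the estimate for a few graph sizes, including the 2^32 case above.
for n in (10**6, 10**8, 10**9, 2**32):
    print(f"{n:>13,d} ids -> P(collision) ~ {collision_probability(n):.3g}")
```

Plugging in your own vertex count should tell you whether 64 random bits
leave enough margin for your graph.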

--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen

On Mon, Feb 24, 2014 at 12:38 PM, Deepak Nulu <deepakn...@gmail.com> wrote:

> Thanks Christopher, I will look into the StackOverflow suggestion of
> generating 64-bit UUIDs in the same fashion as 128-bit UUIDs.
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-with-UUID-vertex-IDs-instead-of-Long-tp1953p1990.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
