With any generated id, such as a hash, there is always the possibility of a 
collision (different source ids producing the same generated id).  However, 
because you are using a long, the hash space is quite large (2^64 values).  
Assuming the hash is well distributed, a collision won't become likely until 
you have around 4 billion vertexes.  If your graph has, say, 10 million 
vertexes, you can be about 99.9997% sure there are no collisions.  Put another 
way, you would expect to generate roughly 370,000 graphs, each with 10 million 
vertexes, before you got one with a collision.
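
If you want to check the odds for your own graph size, the usual birthday-bound 
approximation is enough.  This is only a rough sketch: it assumes the generated 
ids behave like uniformly random 64-bit values, and the class and method names 
(CollisionOdds, collisionProbability) are made up for this example.

```java
final class CollisionOdds {

    // Approximate chance of at least one collision when n ids are drawn
    // uniformly at random from a space of 2^bits values (birthday bound):
    // p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        // Convert to double before multiplying so large n cannot overflow a long.
        double pairs = (double) n * (double) (n - 1) / 2.0;
        // expm1 keeps precision when the probability is tiny.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        long[] sizes = {10_000_000L, 100_000_000L, 1L << 32};
        for (long n : sizes) {
            System.out.printf("%,d vertexes: p(collision) ~ %.3g%n",
                    n, collisionProbability(n, 64));
        }
    }
}
```

A real hash is never perfectly uniform, so treat the result as an estimate of 
the order of magnitude rather than a guarantee.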

Your other options are:

* Manage your ids, using a cross-reference table, so that you guarantee a 
one-to-one relationship between the id and the long.

* Change the classes you are using in Giraph to use Text instead of Long for 
the vertex ids.
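
The first option could look something like the following.  It is a minimal, 
single-process sketch only: VertexIdRegistry and idFor are hypothetical names, 
not Giraph classes, and in a distributed job I am assuming you would build the 
table in a preprocessing step (or keep it in some shared store) rather than in 
worker memory.

```java
import java.util.HashMap;
import java.util.Map;

final class VertexIdRegistry {

    // Cross-reference table: original key -> assigned long id.
    private final Map<String, Long> idsByKey = new HashMap<>();
    private long nextId = 0;

    // Returns the long already assigned to this key, or assigns the next
    // unused one. A counter never repeats, so the mapping is one-to-one.
    long idFor(String key) {
        Long existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idsByKey.put(key, id);
        return id;
    }

    public static void main(String[] args) {
        VertexIdRegistry registry = new VertexIdRegistry();
        System.out.println(registry.idFor("alice"));
        System.out.println(registry.idFor("bob"));
        System.out.println(registry.idFor("alice")); // same id as the first call
    }
}
```

The trade-off is that you have to store and look up the table, but you never 
have to think about collisions.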


________________________________
From: Panagiotis Eustratiadis [ep.pan....@gmail.com]
Sent: Tuesday, July 29, 2014 3:14 AM
To: user@giraph.apache.org
Subject: Generating unique vertex id's for addVertexRequest

Hello everyone,

I'm looking for a way to generate unique ids (of type Long) for the 
addVertexRequest. For example, a very silly implementation that works for 
graphs with fewer than 100 vertices would look like this:

public void compute(Iterable<NullWritable> messages) {
...
    long generatedId = generateId(getId().get());
    addVertexRequest(new LongWritable(generatedId), new DoubleWritable(0));
...
}

private long generateId(long seed) {
    return seed + 100;
}

But as I said, this is just silly. How can I modify generateId so that I 
know the vertex id is unique regardless of the graph size?

Panagiotis Eustratiadis.
