With any generated ID like a hash, there will always be the possibility of a collision (different ids creating the same generated id). However, because you are using a long, the size of the hash space is quite large. a collision won't become likely until you have around 4 billion vertexes. If your graph has, say, 10 million vertexes, you can be 99.97% sure there are no collisions. Put another way, you would have to generate 3700 graphs. each with 10 million vertexes, before you got one with a single collision.
Your other options are: * Manage your ids, using a cross-reference table, so that you guarantee a one-to-one relationship between the id and the long. * Change the classes you are using in Giraph to use Text instead of Long for the vertex ids. ________________________________ From: Panagiotis Eustratiadis [ep.pan....@gmail.com] Sent: Tuesday, July 29, 2014 3:14 AM To: user@giraph.apache.org Subject: Generating unique vertex id's for addVertexRequest Hello everyone, I'm looking for a way to generate unique id's (of type Long) for the addVertexRequest. For example, a very silly implementation that works for graphs with less than 100 vertices would look like this: public void compute(Iterable<NullWritable> messages) { ... long generatedId = generateId(long getId().get()); addVertexRequest(new LongWritable(generatedId), new DoubleWritable(0)); ... } private long generateId(long seed) { return seed + 100; } But as I said, this is just silly. How can I modify the generateId so that I know the vertex id is unique regardless of the graph size? Panagiotis Eustratiadis.