The only solution I know of is usually implemented as a so-called dictionary outside of Giraph (e.g. for Semantic Web graphs, which also use URIs as IDs), backed by a datastore like HBase or Cassandra; basically, the hash map you mentioned. While it is computationally expensive at first, it lets you scale in the long run, because adding an edge with a previously unseen URL just means incrementing a counter in the store and adding the mapping.
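To make the dictionary idea concrete, here is a minimal sketch of the get-or-assign logic. It assumes plain in-memory Python dicts as a stand-in for the external store (HBase/Cassandra in a real setup), and the names `UrlDictionary` and `encode_edges` are hypothetical. Every new URL takes the current counter value and the counter is incremented, so the IDs stay dense (0..n-1) with no holes, and the reverse map lets you translate results back to URLs after the Giraph job. In a shared store the lookup-then-assign step would have to be atomic, which this single-threaded sketch does not address.

```python
from typing import Dict, Iterable, Iterator, Tuple


class UrlDictionary:
    """Hypothetical in-memory stand-in for a URL <-> long ID dictionary.

    In a real deployment the maps and the counter would live in an external
    store (e.g. HBase or Cassandra); here they are plain Python objects.
    """

    def __init__(self) -> None:
        self._next_id = 0                      # next unused dense ID
        self._url_to_id: Dict[str, int] = {}
        self._id_to_url: Dict[int, str] = {}

    def get_or_assign(self, url: str) -> int:
        """Return the existing ID for url, or assign the next sequential one."""
        existing = self._url_to_id.get(url)
        if existing is not None:
            return existing
        new_id = self._next_id
        self._next_id += 1                     # "increment a counter in the store"
        self._url_to_id[url] = new_id          # "add the mapping", both directions
        self._id_to_url[new_id] = url
        return new_id

    def url_of(self, node_id: int) -> str:
        """Reverse lookup, e.g. to map algorithm output back to URLs."""
        return self._id_to_url[node_id]

    def __len__(self) -> int:
        return self._next_id


def encode_edges(edges: Iterable[Tuple[str, str]],
                 d: UrlDictionary) -> Iterator[Tuple[int, int]]:
    """Translate URL edges into (long, long) edges with hole-free IDs."""
    for src, dst in edges:
        yield d.get_or_assign(src), d.get_or_assign(dst)


if __name__ == "__main__":
    d = UrlDictionary()
    edges = [("http://a", "http://b"), ("http://b", "http://c"), ("http://a", "http://c")]
    print(list(encode_edges(edges, d)))   # [(0, 1), (1, 2), (0, 2)]
    print(len(d), d.url_of(2))            # 3 http://c
```

The expensive part is the initial pass over all billions of edges; after that, each new edge costs at most two lookups and, for unseen URLs, one counter increment.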
On Tue, Apr 15, 2014 at 3:33 PM, Martin Neumann <mneum...@spotify.com> wrote:
> Hi,
>
> I have a huge edge list (several billion edges) where node IDs are URLs.
> The algorithm I want to run needs the IDs to be longs, and there should be
> no holes in the ID space (so I can't simply hash the URLs).
>
> Is anyone aware of a simple solution that does not require an impractically
> huge hash map?
>
> My current idea is to load the graph into another Giraph job and then
> assign a number to each node. This way the mapping from number to URL
> would be stored in the node.
> The problem is that I have to assign the numbers sequentially to ensure
> there are no holes and that the numbers are unique. No idea if this is even
> possible in Giraph.
>
> Any input is welcome.
>
> cheers Martin

--
Claudio Martella