The only solution I know of is a so-called dictionary, usually kept outside
of Giraph (e.g. for semantic web graphs, which also have URIs as IDs) in a
datastore like HBase or Cassandra; basically the hash map you mentioned.
While it is computationally expensive to build initially, it lets you scale
in the long run, because adding an edge just means incrementing a counter in
the store and adding the mapping for any URL that has not been seen before.
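
To make the idea concrete, here is a minimal sketch of such a dictionary in Python. It is only an illustration: the class and function names (IdDictionary, get_or_assign, encode_edges) are made up for this example, and the in-memory dict and list stand in for the external store. With HBase or Cassandra, the counter increment and the insert of the mapping would have to be done atomically on the store side, which this sketch does not attempt.

```python
from typing import Dict, Iterable, List, Tuple


class IdDictionary:
    """Maps string IDs (e.g. URLs) to dense integer IDs 0, 1, 2, ...

    Hypothetical in-memory stand-in for a dictionary kept in an
    external datastore; not part of Giraph.
    """

    def __init__(self) -> None:
        self._to_id: Dict[str, int] = {}   # URL -> dense ID
        self._to_url: List[str] = []       # dense ID -> URL

    def get_or_assign(self, url: str) -> int:
        """Return the ID for url, assigning the next free one if unseen."""
        existing = self._to_id.get(url)
        if existing is not None:
            return existing
        new_id = len(self._to_url)  # the "counter": next unused ID
        self._to_id[url] = new_id
        self._to_url.append(url)
        return new_id

    def lookup(self, node_id: int) -> str:
        """Translate a dense ID back to its URL."""
        return self._to_url[node_id]

    def size(self) -> int:
        return len(self._to_url)


def encode_edges(
    edges: Iterable[Tuple[str, str]], dictionary: IdDictionary
) -> List[Tuple[int, int]]:
    """Rewrite (source URL, target URL) edges as (source ID, target ID)."""
    return [
        (dictionary.get_or_assign(src), dictionary.get_or_assign(dst))
        for src, dst in edges
    ]


if __name__ == "__main__":
    d = IdDictionary()
    sample = [
        ("http://a.example/", "http://b.example/"),
        ("http://b.example/", "http://c.example/"),
    ]
    print(encode_edges(sample, d))
```

Because IDs are handed out from a counter in order of first appearance, the ID space has no holes, and the encoded edge list can be fed to the job that needs long IDs. The reverse lookup translates results back to URLs afterwards.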


On Tue, Apr 15, 2014 at 3:33 PM, Martin Neumann <mneum...@spotify.com> wrote:

> Hej,
>
> I have a huge edge list (several billion edges) where node IDs are URLs.
> The algorithm I want to run needs the IDs to be longs, and there should be
> no holes in the ID space (so I can't simply hash the URLs).
>
> Is anyone aware of a simple solution that does not require an impractically
> huge hash map?
>
> My current idea is to load the graph into another Giraph job and then
> assign a number to each node. This way the mapping of number to URL
> would be stored in the node.
> The problem is that I have to assign the numbers sequentially to ensure
> there are no holes and the numbers are unique. No idea if this is even
> possible in Giraph.
>
> Any input is welcome
>
> cheers Martin
>



-- 
   Claudio Martella
