RE: Changing index of a graph

2014-04-16 Thread Pavan Kumar A
= the count
Date: Tue, 15 Apr 2014 23:40:39 +0200
Subject: Re: Changing index of a graph
From: mneum...@spotify.com
To: user@giraph.apache.org
I have a pipeline that creates a graph then does some transformations on it (with Giraph). In the end I want to dump it into Neo4j to allow for Cypher

Changing index of a graph

2014-04-15 Thread Martin Neumann
Hej, I have a huge edge list (several billion edges) where node IDs are URLs. The algorithm I want to run needs the IDs to be long and there should be no holes in the ID space (so I can't simply hash the URLs). Is anyone aware of a simple solution that does not require an impractically huge hash

Re: Changing index of a graph

2014-04-15 Thread Claudio Martella
The only solution I know is usually done via a so-called dictionary outside of Giraph (e.g. for semantic web graphs which also have URIs as IDs), through a datastore like HBase/Cassandra, basically the hashmap you mentioned. While initially computationally expensive, it allows you to scale in the
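
The dictionary idea can be sketched in a few lines. This is only an illustration, not the setup from the message: a plain in-memory Python dict stands in for the external datastore (HBase/Cassandra), and the names `IdDictionary` and `encode_edges` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class IdDictionary:
    """Maps string IDs (e.g. URLs) to dense longs 0..n-1 in first-seen order.

    Assumption: the in-memory dict is a stand-in for an external key-value
    store; at billions of edges it would not fit on one machine.
    """

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, key: str) -> int:
        # len() is evaluated before the insert, so an unseen key gets the
        # next unused integer and the ID space never has holes.
        return self._ids.setdefault(key, len(self._ids))

    def __len__(self) -> int:
        return len(self._ids)


def encode_edges(
    edges: Iterable[Tuple[str, str]], dictionary: IdDictionary
) -> List[Tuple[int, int]]:
    """Rewrite (source URL, target URL) pairs as (long, long) pairs."""
    return [(dictionary.encode(src), dictionary.encode(dst)) for src, dst in edges]
```

Because every URL is looked up before a new ID is handed out, a URL that appears in many edges always maps to the same long.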

Re: Changing index of a graph

2014-04-15 Thread Lukas Nalezenec
Hi, I did the same thing in two M/R jobs during preprocessing - it was pretty powerful for web graphs but a little bit slow. Solution for Giraph is: 1. Implement your own partition which will iterate vertices in order. Use an appropriate partitioner. 2. During the first iteration you need to rename vertices in
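
The message is cut off, so the following is not a reconstruction of the two jobs it mentions. It is a minimal single-process sketch of one common two-pass way to get hole-free IDs from partitioned data: count the vertices per partition, then give each vertex its partition's starting offset plus its rank inside the partition. The function names and the CRC32-based partitioner are assumptions made for illustration.

```python
import zlib
from itertools import accumulate
from typing import Dict, Iterable, List, Set


def partition_of(url: str, num_partitions: int) -> int:
    # Hypothetical partitioner: CRC32 is deterministic across runs,
    # unlike Python's salted built-in hash().
    return zlib.crc32(url.encode("utf-8")) % num_partitions


def assign_dense_ids(urls: Iterable[str], num_partitions: int) -> Dict[str, int]:
    # Pass 1: collect the distinct URLs of each partition and fix an
    # iteration order inside it (here: sorted).
    buckets: List[Set[str]] = [set() for _ in range(num_partitions)]
    for url in urls:
        buckets[partition_of(url, num_partitions)].add(url)
    ordered = [sorted(bucket) for bucket in buckets]

    # The exclusive prefix sum of partition sizes is each partition's first ID.
    sizes = [len(bucket) for bucket in ordered]
    offsets = [0] + list(accumulate(sizes))[:-1]

    # Pass 2: new ID = partition offset + rank within the partition.
    ids: Dict[str, int] = {}
    for offset, bucket in zip(offsets, ordered):
        for rank, url in enumerate(bucket):
            ids[url] = offset + rank
    return ids
```

Only the per-partition counts have to be shared between partitions, which is why this style of renumbering avoids a single global hash map.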

Re: Changing index of a graph

2014-04-15 Thread Martin Neumann
I have a pipeline that creates a graph then does some transformations on it (with Giraph). In the end I want to dump it into Neo4j to allow for Cypher queries. I was told that I could make the batch import for Neo4j a lot faster if I used Long identifiers without holes, and therefore