Thanks for that! This is the right idea, however I was only using a VertexReader until now – IntNullReverseTextEdgeInputFormat calls for an EdgeReader.
I am not sure this is the way it works but I like the idea of segregating edge and vertex definitions. *That leads to the following questions: can Giraph support the use of a VertexReader and EdgeReader at the same time, that is through the -vif and -eif arguments? * If that works, the idea would be to refactor my input files with: Vertices: vertex_id, vertex_type, ... Edges source_id, target_id with the edge reader working in "reverse" mode. Thanks! On 10 March 2015 at 20:02, Matthew Saltz <sal...@gmail.com> wrote: > Have a look at IntNullReverseTextEdgeInputFormat > <https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html>. > It automatically creates reverse edges, but it expects the file format > > <source_id, target_id> > > on each line. If you need to convert it to use longs you can change the > code pretty easily. > > Best, > Matthew > > On Tue, Mar 10, 2015 at 5:37 AM, Young Han <young....@uwaterloo.ca> wrote: > >> The input is assumed to be the vertex followed by a set of *directed* >> edges. So, in your example, leaving out E2 means that the final graph will >> not have the directed edge from V2 to V1. To get an undirected edge, you >> need a pair of directed edges. >> >> Internally, Giraph stores the out-edges of each vertex as an adjacency >> list at that vertex. So, for example, your undirected graph becomes a >> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an >> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency >> list {V2} and a vertex V2 with an empty adjacency list {}. >> >> There's no easy way for Giraph to infer that V2's adjacency list should >> contain V1, because V2 does not track its in-edges. To get around this, you >> can either (1) use an undirected input file with both pairs of edges >> present; (2) have, in your algorithms, all vertices broadcast their ids to >> their out-edge neighbours and then perform mutations to add the missing >> edges in the first two superstep; or (3) modify the code in >> org.apache.giraph.io.* (in giraph-core) to cache and add missing edges >> (i.e., add a new "type" of input format). I'm fairly certain that there >> doesn't already exist an "assume undirected graph" input reader, but I'm >> not too familiar with the code paths and options there so I could be wrong. >> >> Young >> >> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <gwindel...@gmail.com> wrote: >> >>> Hi Giraph Mailing List, >>> >>> I am writing about an undirected graph I am trying to move to Giraph. I >>> have a question about the assumption Giraph makes when processing an input. >>> >>> Let V1 and V2, two vertices connected with a common edge. E1 defines an >>> edge from V1 to V2. E2 defines an edge from V2 to V1. >>> >>> Simply put, these are defined in an input file as: >>> V1, E1 >>> V2, E2 >>> >>> This is working fine, I can process the graph accordingly. >>> >>> I was trying to see what would happen if I was to simplify the input >>> file: >>> V1, E1 >>> V2 >>> >>> When would come the time that V2 is processed in a superstep, Giraph >>> would not suggest E1 as an outgoing edge. My questions is why, knowing >>> that E1 defines an edge from V1 to V2. The graph being undirected (although >>> there is no provision for that in my Giraph computation), shouldn't Giraph >>> assume that V2 is connected to V1? >>> >>> Down the road, the idea would be to streamline the input format, hence >>> my question. >>> >>> Thanks! >>> >>> >>> >> >