Hi guys, Here's a follow-up on this based on everyone's positive feedback.
I was able to load edges and vertices separately, leveraging the -eip (and -eif) and -vip (and -eip) flags. Implementation wise, since I may have disconnected vertices, I went ahead with creating one EdgeReader and one VertexReader. Those readers are initialised in their respective InputFormat classes. Once done and tested, I have implemented reversibility, using IntNullReverseTextEdgeInputFormat as an example, by extending the edge input format class. I am now able to load edges and vertices in two files: vertex_id, property_A, property_B, property_C for vertices definition and vertex_id_source, vertex_id_target for edges, with implicitly creating a reverse edge for each edge. Works like a charm! On 11 March 2015 at 19:23, Matthew Saltz <sal...@gmail.com> wrote: > Hi, > > I believe the answer to your question is yes, though I've never done it. > If you use only the edge reader, only the vertices in your graph that have > at least one edge attached to them will be present in your graph. So, if > you have vertices that are entirely disconnected that you want included, > you'd need to do both a VertexReader and an Edge Reader (though I've never > done this). If you don't have disconnected vertices, you don't need the > VertexReader because Giraph will automatically add all of the vertices in > your edge file to the graph (I think this can be disabled). Use the -eip > flag to specify the edge file. > > Best, > Matthew Saltz > > On Wed, Mar 11, 2015 at 1:54 AM, G.W. <gwindel...@gmail.com> wrote: > >> Thanks for that! >> >> This is the right idea, however I was only using a VertexReader until now >> – IntNullReverseTextEdgeInputFormat calls for an EdgeReader. >> >> I am not sure this is the way it works but I like the idea of segregating >> edge and vertex definitions. >> >> *That leads to the following questions: can Giraph support the use of a >> VertexReader and EdgeReader at the same time, that is through the -vif and >> -eif arguments? * >> >> If that works, the idea would be to refactor my input files with: >> >> Vertices: >> vertex_id, vertex_type, ... >> >> Edges >> source_id, target_id >> >> with the edge reader working in "reverse" mode. >> >> Thanks! >> >> >> >> >> On 10 March 2015 at 20:02, Matthew Saltz <sal...@gmail.com> wrote: >> >>> Have a look at IntNullReverseTextEdgeInputFormat >>> <https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html>. >>> It automatically creates reverse edges, but it expects the file format >>> >>> <source_id, target_id> >>> >>> on each line. If you need to convert it to use longs you can change the >>> code pretty easily. >>> >>> Best, >>> Matthew >>> >>> On Tue, Mar 10, 2015 at 5:37 AM, Young Han <young....@uwaterloo.ca> >>> wrote: >>> >>>> The input is assumed to be the vertex followed by a set of *directed* >>>> edges. So, in your example, leaving out E2 means that the final graph will >>>> not have the directed edge from V2 to V1. To get an undirected edge, you >>>> need a pair of directed edges. >>>> >>>> Internally, Giraph stores the out-edges of each vertex as an adjacency >>>> list at that vertex. So, for example, your undirected graph becomes a >>>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an >>>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency >>>> list {V2} and a vertex V2 with an empty adjacency list {}. >>>> >>>> There's no easy way for Giraph to infer that V2's adjacency list should >>>> contain V1, because V2 does not track its in-edges. To get around this, you >>>> can either (1) use an undirected input file with both pairs of edges >>>> present; (2) have, in your algorithms, all vertices broadcast their ids to >>>> their out-edge neighbours and then perform mutations to add the missing >>>> edges in the first two superstep; or (3) modify the code in >>>> org.apache.giraph.io.* (in giraph-core) to cache and add missing edges >>>> (i.e., add a new "type" of input format). I'm fairly certain that there >>>> doesn't already exist an "assume undirected graph" input reader, but I'm >>>> not too familiar with the code paths and options there so I could be wrong. >>>> >>>> Young >>>> >>>> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <gwindel...@gmail.com> wrote: >>>> >>>>> Hi Giraph Mailing List, >>>>> >>>>> I am writing about an undirected graph I am trying to move to Giraph. >>>>> I have a question about the assumption Giraph makes when processing an >>>>> input. >>>>> >>>>> Let V1 and V2, two vertices connected with a common edge. E1 defines >>>>> an edge from V1 to V2. E2 defines an edge from V2 to V1. >>>>> >>>>> Simply put, these are defined in an input file as: >>>>> V1, E1 >>>>> V2, E2 >>>>> >>>>> This is working fine, I can process the graph accordingly. >>>>> >>>>> I was trying to see what would happen if I was to simplify the input >>>>> file: >>>>> V1, E1 >>>>> V2 >>>>> >>>>> When would come the time that V2 is processed in a superstep, Giraph >>>>> would not suggest E1 as an outgoing edge. My questions is why, knowing >>>>> that E1 defines an edge from V1 to V2. The graph being undirected >>>>> (although >>>>> there is no provision for that in my Giraph computation), shouldn't Giraph >>>>> assume that V2 is connected to V1? >>>>> >>>>> Down the road, the idea would be to streamline the input format, hence >>>>> my question. >>>>> >>>>> Thanks! >>>>> >>>>> >>>>> >>>> >>> >> >