Thanks for that!

This is the right idea. However, I have only used a VertexReader until now,
and IntNullReverseTextEdgeInputFormat calls for an EdgeReader.

I am not sure this is the way it works, but I like the idea of separating
edge and vertex definitions.

*That leads to the following question: can Giraph support the use of a
VertexReader and an EdgeReader at the same time, that is, through the -vif and
-eif arguments?*

If that works, the idea would be to refactor my input files with:

Vertices:
vertex_id, vertex_type, ...

Edges:
source_id, target_id

with the edge reader working in "reverse" mode.
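
To make clear what I mean by "reverse" mode, here is a minimal sketch in plain
Java. It is not Giraph code and not how IntNullReverseTextEdgeInputFormat is
actually implemented; the names ReverseEdgeSketch and load are made up for
illustration, and I am assuming integer ids and one "source_id, target_id"
pair per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; this does not use or reproduce the Giraph API.
class ReverseEdgeSketch {

    // Builds out-edge adjacency lists from "source_id, target_id" lines.
    // When addReverse is true, each line also yields the edge target -> source.
    static Map<Integer, List<Integer>> load(List<String> edgeLines, boolean addReverse) {
        Map<Integer, List<Integer>> outEdges = new TreeMap<>();
        for (String line : edgeLines) {
            String[] parts = line.split(",");
            int source = Integer.parseInt(parts[0].trim());
            int target = Integer.parseInt(parts[1].trim());
            // Record the directed edge source -> target.
            outEdges.computeIfAbsent(source, k -> new ArrayList<>()).add(target);
            // Make sure the target exists as a vertex even if it has no out-edges.
            List<Integer> targetEdges = outEdges.computeIfAbsent(target, k -> new ArrayList<>());
            if (addReverse) {
                // "Reverse" mode: also record target -> source.
                targetEdges.add(source);
            }
        }
        return outEdges;
    }

    public static void main(String[] args) {
        List<String> edges = Arrays.asList("1, 2");
        System.out.println(load(edges, false)); // directed: vertex 2 has no out-edges
        System.out.println(load(edges, true));  // reverse mode: vertex 2 points back to 1
    }
}
```

With a single edge line "1, 2", the first call leaves vertex 2 with an empty
adjacency list, while the second gives it the out-edge back to vertex 1, which
is the behaviour I am hoping to get from the edge input format.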

Thanks!




On 10 March 2015 at 20:02, Matthew Saltz <sal...@gmail.com> wrote:

> Have a look at IntNullReverseTextEdgeInputFormat
> <https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/IntNullReverseTextEdgeInputFormat.html>.
> It automatically creates reverse edges, but it expects the file format
>
> <source_id, target_id>
>
> on each line. If you need to convert it to use longs you can change the
> code pretty easily.
>
> Best,
> Matthew
>
> On Tue, Mar 10, 2015 at 5:37 AM, Young Han <young....@uwaterloo.ca> wrote:
>
>> The input is assumed to be the vertex followed by a set of *directed*
>> edges. So, in your example, leaving out E2 means that the final graph will
>> not have the directed edge from V2 to V1. To get an undirected edge, you
>> need a pair of directed edges.
>>
>> Internally, Giraph stores the out-edges of each vertex as an adjacency
>> list at that vertex. So, for example, your undirected graph becomes a
>> vertex object V1 with an adjacency list {V2} and a vertex object V2 with an
>> adjacency list {V1}. The directed graph would be a vertex V1 with adjacency
>> list {V2} and a vertex V2 with an empty adjacency list {}.
>>
>> There's no easy way for Giraph to infer that V2's adjacency list should
>> contain V1, because V2 does not track its in-edges. To get around this, you
>> can either (1) use an undirected input file with both pairs of edges
>> present; (2) have, in your algorithms, all vertices broadcast their ids to
>> their out-edge neighbours and then perform mutations to add the missing
>> edges in the first two supersteps; or (3) modify the code in
>> org.apache.giraph.io.* (in giraph-core) to cache and add missing edges
>> (i.e., add a new "type" of input format). I'm fairly certain that there
>> doesn't already exist an "assume undirected graph" input reader, but I'm
>> not too familiar with the code paths and options there so I could be wrong.
>>
>> Young
>>
>> On Mon, Mar 9, 2015 at 11:54 PM, G.W. <gwindel...@gmail.com> wrote:
>>
>>> Hi Giraph Mailing List,
>>>
>>> I am writing about an undirected graph I am trying to move to Giraph. I
>>> have a question about the assumption Giraph makes when processing an input.
>>>
>>> Let V1 and V2 be two vertices connected by a common edge. E1 defines an
>>> edge from V1 to V2. E2 defines an edge from V2 to V1.
>>>
>>> Simply put, these are defined in an input file as:
>>> V1, E1
>>> V2, E2
>>>
>>> This is working fine, I can process the graph accordingly.
>>>
>>> I was trying to see what would happen if I was to simplify the input
>>> file:
>>> V1, E1
>>> V2
>>>
>>> When the time comes for V2 to be processed in a superstep, Giraph
>>> does not suggest E1 as an outgoing edge. My question is why, knowing
>>> that E1 defines an edge from V1 to V2. The graph being undirected (although
>>> there is no provision for that in my Giraph computation), shouldn't Giraph
>>> assume that V2 is connected to V1?
>>>
>>> Down the road, the idea would be to streamline the input format, hence
>>> my question.
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>
