Hi Han,

You are correct: if you are loading the graph with an EdgeInputFormat, but also 
need to load additional data for vertices, you want to use a 
VertexValueInputFormat.
You can see an example in TestEdgeInput.
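
As a rough illustration of the idea only (this is not Giraph code, and not what TestEdgeInput does), the sketch below uses plain Java to mimic the two reads: one pass over an edge file and one pass over a separate vertex-value file, merged by vertex id. The class name, the two line layouts ("source target edgeValue" and "vertexId setFlag"), and the L/R flag for the two sides of a bipartite graph are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a two-pass load; none of these names come from Giraph. */
public class TwoPassLoadSketch {

    /** A vertex with an optional value and its out-edges. */
    static final class Node {
        String setFlag;  // filled by the vertex-value pass; stays null if no value line exists
        final Map<String, Double> edges = new LinkedHashMap<>();  // target id -> edge value
    }

    /** Pass 1: each line is "source target edgeValue". Vertices are created on demand. */
    static void readEdges(List<String> lines, Map<String, Node> graph) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            Node source = graph.computeIfAbsent(parts[0], id -> new Node());
            graph.computeIfAbsent(parts[1], id -> new Node());
            source.edges.put(parts[1], Double.parseDouble(parts[2]));
        }
    }

    /** Pass 2: each line is "vertexId setFlag". Only the value is set; edges are untouched. */
    static void readVertexValues(List<String> lines, Map<String, Node> graph) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            graph.computeIfAbsent(parts[0], id -> new Node()).setFlag = parts[1];
        }
    }

    public static void main(String[] args) {
        Map<String, Node> graph = new LinkedHashMap<>();
        readEdges(List.of("u1 i1 4.0", "u1 i2 2.5", "u2 i1 3.0"), graph);
        readVertexValues(List.of("u1 L", "u2 L", "i1 R", "i2 R"), graph);
        graph.forEach((id, node) ->
            System.out.println(id + " set=" + node.setFlag + " edges=" + node.edges));
    }
}
```

The point of the split is that the edge file never needs to carry vertex data: a vertex that appears only as an edge endpoint still gets created, and its value arrives from the second file.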

Alessandro

From: Han JU <ju.han.fe...@gmail.com>
Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
Date: Wednesday, May 15, 2013 9:00 AM
To: "user@giraph.apache.org" <user@giraph.apache.org>
Subject: Re: Questions on input/output format

Thanks Maria.

For the input part, in fact what I want to load is a bipartite graph, so nodes 
are in two separate sets. If I use TextEdgeInputFormat, how could I load data 
for the nodes? (for example a flag indicating in which set the node is).

On the website it says: "In the second case, edges will be read by means of an 
EdgeInputFormat. If there is additional data for the vertices, it will be read 
separately by a VertexValueInputFormat." So it seems to me that there should be 
two separate reads: the first one reads all the edges of the bipartite graph, 
and the second one reads the nodes with their data. But I can't find any 
examples of how to do this.




2013/5/15 Maria Stylianou <mars...@gmail.com>
The InputFormat is the code needed to read the input file. So you cannot have 
two InputFormats; you should choose one of the two.
From my understanding, TextEdgeInputFormat is more suitable for you, as it 
takes exactly the format of your input file: node1 node2 edgeValue
The TextVertexInputFormat reads files with the format:
nodeId nodeValue {list of edge values}

As for the output format, if you want to print several parameters/results from 
your code, I would suggest creating your own output format that extends 
TextVertexOutputFormat; in convertVertexToLine() you decide what is printed 
for each vertex.
For example, suppose each vertex computes an error that you can retrieve 
through a public method getError(). In convertVertexToLine(), you can have
int error = ((yourMainCodeName) vertex).getError();

and then you shape the line to be printed from each vertex, for example:
Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: " + error);
return line;
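
To make the line-shaping step concrete without depending on Giraph, here is a small stand-alone sketch. VertexLineSketch and formatVertexLine are hypothetical names used only for this example; in a real output format the same string building would sit inside your convertVertexToLine() override, with the result wrapped in a Text.

```java
/** Stand-alone sketch of shaping one output line per vertex; not a Giraph class. */
public class VertexLineSketch {

    /** Builds the text that an output format could emit for one vertex. */
    static String formatVertexLine(String vertexId, int error) {
        return "vertexId: " + vertexId + ", error: " + error;
    }

    public static void main(String[] args) {
        // prints "vertexId: 42, error: 3"
        System.out.println(formatVertexLine("42", 3));
    }
}
```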

I hope I didn't make it more complicated :)
Cheers,

On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.fe...@gmail.com> wrote:
Hi,

Some questions:

  - My input file is a text file with edges: node1 node2 edgeValue. I figured 
out that I should use TextEdgeInputFormat and TextVertexValueInputFormat. 
But how do these two fit together? Should I prepare another file that 
contains only the node information for VertexValueInputFormat?

  - If the input file is a sequence file, how should I implement a 
SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they already exist?

  - For the output part, after the calculation terminates, every vertex needs 
to output many lines. This could be big (for one dataset the output size is 
400GB). I found only TextVertexOutputFormat, but it seems to output a single 
line per vertex. How should I achieve this?

Thanks a lot!

--
JU Han

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
     GI06 - Fouille de Données et Décisionnel

+33 0619608888



--
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com



