Hi Han,

You are correct: if you load the graph with an EdgeInputFormat but also need to load additional data for the vertices, use a VertexValueInputFormat alongside it. You can see an example in TestEdgeInput.
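
To make the "two separate reads" idea concrete, here is a minimal plain-Java sketch. It deliberately does not use the Giraph API: the class BipartiteLoadSketch, its methods, and the "nodeId side" layout of the vertex file are all assumptions made up for illustration. It only shows the shape of the data flow, with one pass over the edge lines and a second pass over the per-vertex lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names are Giraph classes.
class BipartiteLoadSketch {

  /** One parsed edge line of the form "node1 node2 edgeValue". */
  static final class Edge {
    final String source;
    final String target;
    final double value;

    Edge(String source, String target, double value) {
      this.source = source;
      this.target = target;
      this.value = value;
    }
  }

  /** First read: the edge file, one "node1 node2 edgeValue" per line. */
  static List<Edge> parseEdges(List<String> lines) {
    List<Edge> edges = new ArrayList<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      String[] f = line.trim().split("\\s+");
      edges.add(new Edge(f[0], f[1], Double.parseDouble(f[2])));
    }
    return edges;
  }

  /** Second read: the vertex file, one "nodeId side" per line (assumed layout). */
  static Map<String, Integer> parseVertexValues(List<String> lines) {
    Map<String, Integer> side = new HashMap<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        continue; // skip blank lines
      }
      String[] f = line.trim().split("\\s+");
      side.put(f[0], Integer.parseInt(f[1]));
    }
    return side;
  }

  /** True if every edge joins two known vertices that lie on different sides. */
  static boolean isBipartite(List<Edge> edges, Map<String, Integer> side) {
    for (Edge e : edges) {
      Integer s = side.get(e.source);
      Integer t = side.get(e.target);
      if (s == null || t == null || s.equals(t)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Edge> edges = parseEdges(
        Arrays.asList("u1 i1 4.0", "u1 i2 2.5", "u2 i1 5.0"));
    Map<String, Integer> side = parseVertexValues(
        Arrays.asList("u1 0", "u2 0", "i1 1", "i2 1"));
    System.out.println(edges.size() + " edges, " + side.size()
        + " vertices, bipartite=" + isBipartite(edges, side));
  }
}
```

In Giraph itself the two parsers would correspond to the edge input format and the vertex value input format, each pointed at its own input; the sketch only mirrors that split.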
Alessandro

From: Han JU <ju.han.fe...@gmail.com>
Reply-To: user@giraph.apache.org
Date: Wednesday, May 15, 2013 9:00 AM
To: user@giraph.apache.org
Subject: Re: Questions on input/output format

Thanks Maria.

For the input part, what I actually want to load is a bipartite graph, so the nodes fall into two separate sets. If I use TextEdgeInputFormat, how can I load data for the nodes (for example, a flag indicating which set a node belongs to)?

The website says:

    In the second case, edges will be read by means of an EdgeInputFormat. If there is additional data for the vertices, it will be read separately by a VertexValueInputFormat.

So it seems to me that there should be two separate reads: the first reads all the edges of the bipartite graph, and the second reads the nodes with their data. But I can't find any examples of how to do this.

2013/5/15 Maria Stylianou <mars...@gmail.com>

The InputFormat is the code needed to read the input file. So you cannot have two InputFormats; you should choose one of the two.

From my understanding, TextEdgeInputFormat is more suitable for you, as it takes exactly the format of your input file:

    node1 node2 edgeValue

The TextVertexInputFormat reads files with the format:

    nodeId nodeValue {list with edges values}

As for the output format, if you want to print several parameters or results from your code, I would suggest creating your own output format that extends TextVertexOutputFormat; in convertVertexToLine() you can decide what is printed for each vertex. For example, suppose each vertex calculates an error and you can retrieve it through the public method getError().
In convertVertexToLine(), you can have

    int error = ((yourMainCodeName) vertex).getError();

and then you shape the line to be printed for each vertex, for example:

    Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: " + error);
    return line;

I hope I didn't make it more complicated :)

Cheers,

On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.fe...@gmail.com> wrote:

Hi,

Some questions:

- My input file is a text file with edges: node1 node2 edgeValue. I figured out that I should use TextEdgeInputFormat and TextVertexValueInputFormat. But how do these two fit together? Should I prepare another file that contains only the node information for the VertexValueInputFormat?
- If the input file is a sequence file, how should I implement a SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they already exist?
- For the output part, after the computation terminates, every vertex needs to output many lines. This could be big (for one dataset the output size is 400 GB). I found only TextVertexOutputFormat, but it seems to output a single line per vertex. How can I achieve this?

Thanks a lot!

--
JU Han
Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel
+33 0619608888

--
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com