Thank you, Alessandro. I've learned a lot from the test cases.
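For anyone finding this thread later, here is a minimal sketch of the "two separate reads" idea discussed in the quoted messages: one pass over the edge lines (node1 node2 edgeValue) and a second pass over a per-vertex file carrying the flag that says which side of the bipartite graph a node belongs to. It is plain Java and deliberately uses no Giraph classes. The class name BipartiteTwoPassLoad, the helper names, and the "nodeId side" layout of the second file are assumptions made for illustration only. In Giraph itself the two passes would be handled by an EdgeInputFormat and a VertexValueInputFormat, as in TestEdgeInput.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, no Giraph classes. All names are hypothetical.
final class BipartiteTwoPassLoad {

    /** One parsed edge line of the form "node1 node2 edgeValue". */
    static final class Edge {
        final long source;
        final long target;
        final double value;

        Edge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** First read: parse every non-blank line as "node1 node2 edgeValue". */
    static List<Edge> readEdges(List<String> lines) {
        List<Edge> edges = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            edges.add(new Edge(Long.parseLong(tokens[0]),
                               Long.parseLong(tokens[1]),
                               Double.parseDouble(tokens[2])));
        }
        return edges;
    }

    /** Second read: parse "nodeId side" lines (assumed layout; side is 0 or 1). */
    static Map<Long, Integer> readVertexValues(List<String> lines) {
        Map<Long, Integer> sideByVertex = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = trimmed.split("\\s+");
            sideByVertex.put(Long.parseLong(tokens[0]), Integer.parseInt(tokens[1]));
        }
        return sideByVertex;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the two input files.
        List<String> edgeLines = Arrays.asList("1 10 0.5", "2 10 1.5", "2 11 2.0");
        List<String> vertexLines = Arrays.asList("1 0", "2 0", "10 1", "11 1");

        List<Edge> edges = readEdges(edgeLines);
        Map<Long, Integer> side = readVertexValues(vertexLines);

        // Join the two reads: print each edge with the side of both endpoints.
        for (Edge e : edges) {
            System.out.println(e.source + " (side " + side.get(e.source) + ") -> "
                    + e.target + " (side " + side.get(e.target) + "), value " + e.value);
        }
    }
}
```

The point of keeping the two parsers separate is that the edge file never needs to mention vertex data, which mirrors the split between the edge input and the vertex value input.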

2013/5/15 Maria Stylianou <mars...@gmail.com>

> Cool, I didn't know that :) So in the command line we have the -eif for
> the edgeInputFormat and -vif for the vertexInputFormat?
> Keep us updated how it works and what other difficulties you may have!
>
> On Wed, May 15, 2013 at 6:36 PM, Alessandro Presta <alessan...@fb.com> wrote:
>
>> Hi Han,
>>
>> You are correct: if you are loading the graph with an EdgeInputFormat,
>> but also need to load additional data for vertices, you want to use a
>> VertexValueInputFormat.
>> You can see an example in TestEdgeInput.
>>
>> Alessandro
>>
>> From: Han JU <ju.han.fe...@gmail.com>
>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Date: Wednesday, May 15, 2013 9:00 AM
>> To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Subject: Re: Questions on input/output format
>>
>> Thanks Maria.
>>
>> For the input part, in fact what I want to load is a bipartite graph,
>> so nodes are in two separate sets. If I use TextEdgeInputFormat, how could
>> I load data for the nodes? (for example a flag indicating in which set the
>> node is).
>>
>> On the website it says: In the second case, edges will be read by means
>> of an EdgeInputFormat. If there is additional data for the vertices, it
>> will be read separately by a VertexValueInputFormat. So it seems to me
>> that there should be two separate reads: the first one reads all the edges
>> of the bipartite graph, and the second one reads the nodes with their data.
>> But I can't find any examples of how to do this.
>>
>> 2013/5/15 Maria Stylianou <mars...@gmail.com>
>>
>>> The InputFormat is the code needed to read the input file. So, you
>>> cannot have two InputFormats, you should choose one of the two.
>>> From my understanding, TextEdgeInputFormat is more suitable for you as
>>> it takes exactly the format of your input file: node1 node2 edgeValue
>>> The TextVertexInputFormat reads files with the format:
>>> nodeId nodeValue {list with edges values}
>>>
>>> As for the outputFormat, if you want to print several
>>> parameters/results from your code, then I would suggest creating your own
>>> outputFormat which will extend the TextVertexOutputFormat, and in
>>> the convertVertexToLine() you can say what is to be printed from each vertex.
>>> For example, you have this error calculated by each vertex and you can
>>> retrieve this error from the public method getError(). In
>>> the convertVertexToLine(), you can have
>>> int error = ((yourMainCodeName) vertex).getError();
>>>
>>> and then you shape the line to be printed from each vertex, for
>>> example:
>>> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: "
>>> + error);
>>> return new Text(line);
>>>
>>> I hope I didn't make it more complicated :)
>>> Cheers,
>>>
>>> On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Some questions:
>>>>
>>>> - My input file is a text file with edges: node1 node2 edgeValue. I
>>>> figured out that I should use TextEdgeInputFormat and
>>>> TextVertexValueInputFormat. But how do these two things fit together?
>>>> Should I prepare another file that contains only the node information for
>>>> VertexValueInputFormat?
>>>>
>>>> - If the input file is a sequence file, how should I implement a
>>>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they exist
>>>> already?
>>>>
>>>> - For the output part, what I need is that after the calculation
>>>> terminates, every vertex outputs many lines. This could be big (for
>>>> one dataset the output size is 400GB). I found only the TextVertexOutputFormat
>>>> but it seems to output a single line per vertex. How should I achieve this?
>>>>
>>>> Thanks a lot!
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> Software Engineer Intern @ KXEN Inc.
>>>> UTC - Université de Technologie de Compiègne
>>>> *GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>
>>> --
>>> Maria Stylianou
>>> Intern at Telefonica, Barcelona, Spain
>>> marsty5.wordpress.com

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888