Cool, I didn't know that :) So on the command line we have -eif for the edgeInputFormat and -vif for the vertexInputFormat? Keep us updated on how it goes and what other difficulties you run into!
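In case it helps anyone following the thread, here is a minimal plain-Java sketch of the "two separate reads" idea for a bipartite graph: one pass over the edge lines (node1 node2 edgeValue), then a second pass over a vertex file that sets a per-node flag. It deliberately uses no Giraph classes; in Giraph the two passes would be the job of an EdgeInputFormat and a VertexValueInputFormat. The class and method names, the "nodeId side" layout of the vertex file, and the "unknown" default are all my own assumptions for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only: none of these types are Giraph classes. */
class TwoPassLoadSketch {

    /** Hypothetical in-memory vertex: a set flag plus weighted out-edges. */
    static final class Node {
        String side = "unknown";                          // filled in by pass 2
        final Map<String, Double> edges = new LinkedHashMap<>();
    }

    /** Pass 1: lines of the form "node1 node2 edgeValue". */
    static void readEdges(List<String> lines, Map<String, Node> graph) {
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length != 3) {
                continue;                                 // skip blank or malformed lines
            }
            graph.computeIfAbsent(t[0], k -> new Node())
                 .edges.put(t[1], Double.parseDouble(t[2]));
            graph.computeIfAbsent(t[1], k -> new Node()); // make sure the target exists
        }
    }

    /** Pass 2: lines of the form "nodeId side" (an assumed layout). */
    static void readVertexValues(List<String> lines, Map<String, Node> graph) {
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length != 2) {
                continue;                                 // skip blank or malformed lines
            }
            graph.computeIfAbsent(t[0], k -> new Node()).side = t[1];
        }
    }

    public static void main(String[] args) {
        Map<String, Node> graph = new LinkedHashMap<>();
        readEdges(Arrays.asList("u1 i1 4.0", "u1 i2 2.5", "u2 i1 1.0"), graph);
        readVertexValues(Arrays.asList("u1 L", "u2 L", "i1 R", "i2 R"), graph);
        for (Map.Entry<String, Node> e : graph.entrySet()) {
            System.out.println(e.getKey() + " side=" + e.getValue().side
                    + " edges=" + e.getValue().edges);
        }
    }
}
```

The point of the sketch is only that the second read fills in values for vertices that the first read already created, so the two files can be prepared independently.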
On Wed, May 15, 2013 at 6:36 PM, Alessandro Presta <alessan...@fb.com> wrote:

> Hi Han,
>
> You are correct: if you are loading the graph with an EdgeInputFormat,
> but also need to load additional data for vertices, you want to use a
> VertexValueInputFormat.
> You can see an example in TestEdgeInput.
>
> Alessandro
>
> From: Han JU <ju.han.fe...@gmail.com>
> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
> Date: Wednesday, May 15, 2013 9:00 AM
> To: "user@giraph.apache.org" <user@giraph.apache.org>
> Subject: Re: Questions on input/output format
>
> Thanks Maria.
>
> For the input part, in fact what I want to load is a bipartite graph, so
> nodes are in two separate sets. If I use TextEdgeInputFormat, how could I
> load data for the nodes? (for example a flag indicating in which set the
> node is).
>
> On the website it says: In the second case, edges will be read by means
> of an EdgeInputFormat. If there is additional data for the vertices, it
> will be read separately by a VertexValueInputFormat. So it seems to me
> that there should be two separate reads: the first one reads all the edges
> of the bipartite graph, and the second one reads the nodes with their data.
> But I can't find any examples of how to do this.
>
> 2013/5/15 Maria Stylianou <mars...@gmail.com>
>
>> The InputFormat is the code needed to read the input file. So, you
>> cannot have two InputFormats, you should choose one of the two.
>> From my understanding, TextEdgeInputFormat is more suitable for you as it
>> takes exactly the format of your input file: node1 node2 edgeValue
>> The TextVertexInputFormat reads files with the format:
>> nodeId nodeValue {list with edges values}
>>
>> As for the outputFormat, if you want to print several
>> parameters/results from your code, then I would suggest creating your own
>> outputFormat which will extend the TextVertexOutputFormat, and in
>> the convertVertexToLine() you can say what is to be printed from each vertex.
>> For example you have this error calculated by each vertex and you can
>> retrieve this error from the public method getError(). In
>> the convertVertexToLine(), you can have
>> int error = ((yourMainCodeName) vertex).getError();
>>
>> and then you shape the line to be printed from each vertex, for example:
>> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: "
>> + error);
>> return line;
>>
>> I hope I didn't make it more complicated :)
>> Cheers,
>>
>> On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Some questions:
>>>
>>> - My input file is a text file with edges: node1 node2 edgeValue. I
>>> figured out that I should use TextEdgeInputFormat and
>>> TextVertexValueInputFormat. But how do these two things fit together?
>>> Should I prepare another file that contains only the node information for
>>> VertexValueInputFormat?
>>>
>>> - If the input file is a sequence file, how should I implement a
>>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they exist already?
>>>
>>> - For the output part, what I need is that after the calculation
>>> terminates, every vertex needs to output many lines. This could be big (for
>>> one dataset the output size is 400GB). I found only the TextVertexOutputFormat
>>> but it seems to output a single line per vertex. How should I achieve this?
>>>
>>> Thanks a lot!
>>>
>>> --
>>> *JU Han*
>>>
>>> Software Engineer Intern @ KXEN Inc.
>>> UTC - Université de Technologie de Compiègne
>>> *GI06 - Fouille de Données et Décisionnel*
>>>
>>> +33 0619608888
>>>
>>
>> --
>> Maria Stylianou
>> Intern at Telefonica, Barcelona, Spain
>>
>> marsty5.wordpress.com
>>
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC - Université de Technologie de Compiègne
> *GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>

--
Maria Stylianou
Intern at Telefonica, Barcelona, Spain
marsty5.wordpress.com