Thanks Maria. For the input part, what I actually want to load is a bipartite graph, so the nodes fall into two separate sets. If I use TextEdgeInputFormat, how could I load data for the nodes (for example, a flag indicating which set a node belongs to)?
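
To make the question concrete, here is a rough sketch in plain Java (no Giraph classes at all) of the two reads I have in mind: one pass over the edge file and one pass over a separate file of node flags. The second file's layout ("nodeId L" or "nodeId R"), the class name BipartiteInputSketch and the method names are all my own assumptions for illustration, not anything taken from Giraph.

```java
import java.util.HashMap;
import java.util.Map;

public class BipartiteInputSketch {

    /** One parsed line of the edge file: "node1 node2 edgeValue". */
    public static final class Edge {
        public final long source;
        public final long target;
        public final double value;

        Edge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /** Parses a whitespace-separated edge line such as "1 101 0.5". */
    public static Edge parseEdgeLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Expected 'node1 node2 edgeValue': " + line);
        }
        return new Edge(Long.parseLong(tokens[0]),
                        Long.parseLong(tokens[1]),
                        Double.parseDouble(tokens[2]));
    }

    /**
     * Parses a line of the (assumed) node file, "nodeId side", where side is
     * L or R, and records whether the node belongs to the left set.
     */
    public static void parseVertexLine(String line, Map<Long, Boolean> isLeft) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 2 || !(tokens[1].equals("L") || tokens[1].equals("R"))) {
            throw new IllegalArgumentException("Expected 'nodeId L|R': " + line);
        }
        isLeft.put(Long.parseLong(tokens[0]), tokens[1].equals("L"));
    }

    public static void main(String[] args) {
        // First read: the edges of the bipartite graph.
        String[] edgeLines = {"1 101 0.5", "2 101 1.0"};
        // Second read: one set flag per node (hypothetical file layout).
        String[] vertexLines = {"1 L", "2 L", "101 R"};

        Map<Long, Boolean> isLeft = new HashMap<>();
        for (String line : vertexLines) {
            parseVertexLine(line, isLeft);
        }

        // Join each edge with the flags of its two endpoints.
        for (String line : edgeLines) {
            Edge e = parseEdgeLine(line);
            System.out.println(e.source + (isLeft.get(e.source) ? " (left)" : " (right)")
                + " -> " + e.target + (isLeft.get(e.target) ? " (left)" : " (right)")
                + " value=" + e.value);
        }
    }
}
```

What I don't see is how to express this second read with Giraph's own input formats.
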

On the website it says: "In the second case, edges will be read by means of an EdgeInputFormat. If there is additional data for the vertices, it will be read separately by a VertexValueInputFormat."

So it seems to me that there should be two separate reads: the first one reads all the edges of the bipartite graph, and the second one reads the nodes with their data. But I can't find any examples of how to do this.

2013/5/15 Maria Stylianou <mars...@gmail.com>

> The InputFormat is the code needed to read the input file. So, you cannot
> have two InputFormats, you should choose one of the two.
> From my understanding, TextEdgeInputFormat is more suitable for you as it
> takes exactly the format of your input file: node1 node2 edgeValue
> The TextVertexInputFormat reads files with the format:
> nodeId nodeValue {list with edges values}
>
> As for the outputFormat, if you want to print several parameters/results
> from your code, then I would suggest creating your own outputFormat which
> will extend the TextVertexOutputFormat, and in the convertVertexToLine()
> you can say what should be printed from each vertex.
> For example you have this error calculated by each vertex and you can
> retrieve this error from the public method getError(). In
> the convertVertexToLine(), you can have
> int error = ((yourMainCodeName) vertex).getError();
>
> and then you shape the line to be printed from each vertex, for example:
> Text line = new Text("vertexId: " + vertex.getId().toString() + ", error: " +
> error);
> return new Text(line);
>
> I hope I didn't make it more complicated :)
> Cheers,
>
> On Wed, May 15, 2013 at 12:27 PM, Han JU <ju.han.fe...@gmail.com> wrote:
>
>> Hi,
>>
>> Some questions:
>>
>> - My input file is a text file with edges: node1 node2 edgeValue. I
>> figured out that I should use TextEdgeInputFormat and
>> TextVertexValueInputFormat. But how do these two things fit together?
>> Should I prepare another file that contains only the node information for
>> VertexValueInputFormat?
>>
>> - If the input file is a sequence file, how should I implement a
>> SequenceEdgeInputFormat or SequenceVertexInputFormat? Or do they exist already?
>>
>> - For the output part, what I need is that after the calculation
>> terminates, every vertex outputs many lines. This could be big (for
>> one dataset the output size is 400GB). I found only the TextVertexOutputFormat
>> but it seems to output a single line per vertex. How should I achieve this?
>>
>> Thanks a lot!
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC - Université de Technologie de Compiègne
>> *GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
> --
> Maria Stylianou
> Intern at Telefonica, Barcelona, Spain
> marsty5.wordpress.com
>

--
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888