Hi, I managed to fix it even if I'm still not entirely sure what happened. The fix is to make a new Text object every time a Text is required as input (Text does not implement Cloneable). I guess it
So instead of: Text candidate = e.getTargetVertexId(); ... vertex.setValue(candidate)) The working version is: Text candidate = e.getTargetVertexId(); ... vertex.setValue(new Text(candidate.toString())) @Override public void compute(Vertex<Text, Text, NullWritable> vertex, Iterable<Text> inmessage) throws IOException { boolean changed = false; // first superstep && setup if (getSuperstep() == 0) { //initialize value vertex.setValue(new Text(vertex.getId().toString())); //cheating by checking the neighbors ID's (cuts down 1 iteration) for (Edge<Text, NullWritable> e : vertex.getEdges()) { Text candidate = e.getTargetVertexId(); if (candidate.compareTo(vertex.getValue()) < 0) { changed = true; vertex.setValue(new Text(candidate.toString())); } } } // other superstep else { // read all messages and compare with own state for (Text message : inmessage) { if (message.compareTo(vertex.getValue()) < 0) { changed = true; vertex.setValue(new Text(message.toString())); } } } // if state has changed send a message to all neighbors if (changed && getSuperstep() < limiter) { sendMessageToAllEdges(vertex, new Text(vertex.getValue().toString())); } vertex.voteToHalt(); } On Mon, Mar 3, 2014 at 2:19 PM, Sebastian Schelter <s...@apache.org> wrote: > Hi Martin, > > I'm not sure wether we require InputFormats to be threadsafe. Can someone > answer that question? > > Maybe thats the reason you see this behavior. > > --sebastian > > > > On 03/03/2014 10:05 AM, Martin Neumann wrote: > >> I checked the input just creating the graph and comparing it. While I cant >> say the graph is correct (its to big) its at least consistent. >> >> So the only things where the different output can come from is the >> connected component part (see code further down). I'm completely stomped, >> the code is basically the example code Giraph ships with all I changed was >> replacing the IntWriteable Id's with Text. >> >> Anyone has any Idea what the problem could be, I'm running out of Idea's. >> >> cheers Martin >> >> >> >> @Override >> public void compute(Vertex<Text, Text, NullWritable> vertex, >> Iterable<Text> inmessage) throws IOException { >> boolean changed = false; >> >> // first superstep && setup >> if (getSuperstep() == 0) { >> //initialize value >> vertex.setValue(vertex.getId() >> ); >> //cheating by checking the neighbors ID's (cuts down 1 >> iteration) >> for (Edge<Text, NullWritable> e : vertex.getEdges()) { >> Text candidate = e.getTargetVertexId(); >> if (candidate.compareTo(vertex.getValue()) < 0) { >> changed = true; >> vertex.setValue(candidate); >> } >> } >> } >> >> // other superstep >> else { >> // read all messages and compare with own state >> for (Text message : inmessage) { >> if (message.compareTo(vertex.getValue()) < 0) { >> changed = true; >> vertex.setValue(message); >> } >> } >> } >> >> // if state has changed send a message to all neighbors >> if (changed && getSuperstep() < limiter) { >> sendMessageToAllEdges(vertex, vertex.getValue()); >> } >> vertex.voteToHalt(); >> } >> >> >> >> On Mon, Mar 3, 2014 at 9:56 AM, Martin Neumann <mneum...@spotify.com> >> wrote: >> >> Hey, >>> >>> sorry, yes I intended to send it to the mailing-list, thanks for the hint >>> (I repost it to the list). Guess I should stop working in the middle of >>> the >>> night. :-) >>> >>> I tested the input format and its output is consistent, I always get the >>> same graph when running it. So the bug has to be in the connected >>> component >>> part since that one gives me different outputs for the same input graph. >>> >>> cheers Martin >>> >>> >>> On Mon, Mar 3, 2014 at 8:06 AM, Sebastian Schelter <s...@apache.org> >>> wrote: >>> >>> Martin, >>>> >>>> can you write a MapReduce job that creates your graph and run it with a >>>> simpler inputformat? >>>> >>>> I really suspect that the bug lies somewhere in your input format. >>>> >>>> --sebastian >>>> >>>> >>>> On 03/02/2014 09:48 PM, Martin Neumann wrote: >>>> >>>> I checked the input just creating the graph and comparing it. While I >>>>> cant >>>>> say the graph is correct (for its to big) its at least consistent. >>>>> >>>>> So the only things where the different output can come from is the >>>>> connected component part (see code in the first mail). I'm completely >>>>> stomped, the code is basically the example code Giraph ships with all I >>>>> changed was replacing the IntWriteable Id's with Text. >>>>> >>>>> Anyone has any Idea what the problem could be, I'm running out of >>>>> Idea's. >>>>> >>>>> cheers Martin >>>>> >>>>> >>>>> On Thu, Feb 27, 2014 at 4:59 PM, Sebastian Schelter <s...@apache.org> >>>>> wrote: >>>>> >>>>> Hi Martin >>>>> >>>>>> >>>>>> I don't think that there are problems with comparing and sorting Text >>>>>> writables as Hadoop is basically a big external sorting system. >>>>>> >>>>>> I'm not sure I understand your edge input reader, it looks very >>>>>> complex, >>>>>> maybe there's a bug somewhere. You could try to preprocess your data >>>>>> using >>>>>> Hadoop so that you can use a simple VertexInputFormat and see if your >>>>>> problems still occur. >>>>>> >>>>>> --sebastian >>>>>> >>>>>> >>>>>> On 02/27/2014 04:41 PM, Martin Neumann wrote: >>>>>> >>>>>> Hm >>>>>> >>>>>>> >>>>>>> I ran the job 5 times and made a diff between the outputs and they >>>>>>> are >>>>>>> not >>>>>>> the same. I cant find anything in the code that could lead to this >>>>>>> behaviour. >>>>>>> >>>>>>> The only idea where to look a the moment would be the identifier. Has >>>>>>> anyone experience with String identifier? >>>>>>> Is a possible that there are problems with comparing and sorting >>>>>>> TextWritables? >>>>>>> >>>>>>> cheers Martin >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >