Hi,
I managed to fix it even if I'm still not entirely sure what happened.
The fix is to make a new Text object every time a Text is required as input
(Text does not implement Cloneable). I guess it
So instead of:
Text candidate = e.getTargetVertexId();
...
vertex.setValue(candidate);
The
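For anyone finding this thread later: one plausible explanation (an assumption, not something confirmed here) is that the framework hands out the same mutable object repeatedly and overwrites it, so a stored reference silently changes. The sketch below shows that effect with a hypothetical stand-in class, MutableId, instead of the real Text, so it runs without Hadoop on the classpath. With Hadoop's Text the copy would presumably be made with its copy constructor, new Text(e.getTargetVertexId()).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable Writable such as Hadoop's Text.
final class MutableId {
    private String value = "";

    MutableId() {}

    // Copy constructor: takes a snapshot of the other object's current value.
    MutableId(MutableId other) {
        this.value = other.value;
    }

    void set(String v) {
        this.value = v;
    }

    @Override
    public String toString() {
        return value;
    }
}

final class ReuseDemo {
    // Stores the reference itself. The "reader" reuses one instance,
    // so every stored entry ends up showing the last value that was set.
    static List<String> keepReferences(String[] ids) {
        MutableId reused = new MutableId();
        List<MutableId> kept = new ArrayList<>();
        for (String id : ids) {
            reused.set(id);
            kept.add(reused);
        }
        return render(kept);
    }

    // Stores a fresh copy each time, so later overwrites do not affect it.
    static List<String> copyEachTime(String[] ids) {
        MutableId reused = new MutableId();
        List<MutableId> kept = new ArrayList<>();
        for (String id : ids) {
            reused.set(id);
            kept.add(new MutableId(reused));
        }
        return render(kept);
    }

    private static List<String> render(List<MutableId> ids) {
        List<String> out = new ArrayList<>();
        for (MutableId id : ids) {
            out.add(id.toString());
        }
        return out;
    }
}
```

Calling ReuseDemo.keepReferences on the ids a, b, c yields the last id three times, while ReuseDemo.copyEachTime preserves all three. Whether the results then differ between runs would depend on when the shared object is overwritten.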
Hi Martin,
I'm not sure whether we require InputFormats to be thread-safe. Can
someone answer that question?
Maybe that's the reason you see this behavior.
--sebastian
On 03/03/2014 10:05 AM, Martin Neumann wrote:
I checked the input by just creating the graph and comparing it. While I can't
say the graph is correct (it's too big), it's at least consistent.
So the only place the different output can come from is the
connected component part (see code further down). I'm completely stumped,
the code is basical
Hi Martin
I don't think that there are problems with comparing and sorting Text
writables as Hadoop is basically a big external sorting system.
I'm not sure I understand your edge input reader, it looks very complex,
maybe there's a bug somewhere. You could try to preprocess your data
using
Hm
I ran the job 5 times and made a diff between the outputs, and they are not
the same. I can't find anything in the code that could lead to this
behaviour.
The only idea of where to look at the moment would be the identifier. Does
anyone have experience with String identifiers?
Is it possible that there are
The data I have as input is not in a Graph-Format so I use an
EdgeInputFormat to create a Graph. It's also deterministic, so the same Graph
should be built from the same input.
Each line in the input is a set of connected vertices.
I create edges in a way that they form a star around the vertex with
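To make the star construction concrete, here is a minimal sketch in plain Java, not the actual EdgeInputFormat code. The message above is cut off before it names which vertex is the hub, so this sketch simply takes the first id on the line. That choice, the whitespace-separated line format, and emitting each edge in both directions are all assumptions. StarEdges and fromLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class StarEdges {
    // Returns {source, target} pairs connecting the hub to every other
    // vertex on the line, in both directions.
    static List<String[]> fromLine(String line) {
        // Assumption: ids on a line are separated by whitespace.
        String[] ids = line.trim().split("\\s+");
        List<String[]> edges = new ArrayList<>();
        if (ids.length == 0 || ids[0].isEmpty()) {
            return edges; // blank line, nothing to emit
        }
        // Assumption: the first id on the line acts as the hub.
        String hub = ids[0];
        for (int i = 1; i < ids.length; i++) {
            edges.add(new String[] {hub, ids[i]});
            edges.add(new String[] {ids[i], hub});
        }
        return edges;
    }
}
```

Because the result depends only on the line's content, the same input line always produces the same edges in the same order.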
Hi Martin,
You are right, this should not happen, your code looks correct. One way
to check the output would be to simply count the number of vertices per
component and see if that number stays stable.
Do you supply all vertices in your input data or are some vertices
created during the computa
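The per-component count suggested above could be done offline on the job output with something like the following sketch. It assumes each output line has the form vertexId, a tab, then componentId. That format and the names ComponentSizes, count and sortedSizes are hypothetical, so adjust them to whatever your output format actually writes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ComponentSizes {
    // Counts how many vertices carry each component id.
    // Assumed line format: "<vertexId>\t<componentId>".
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> sizes = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            sizes.merge(parts[1], 1L, Long::sum);
        }
        return sizes;
    }

    // Component sizes in ascending order, ignoring the component labels.
    // Useful for comparing two runs even if the labels themselves differ.
    static List<Long> sortedSizes(List<String> lines) {
        List<Long> sizes = new ArrayList<>(count(lines).values());
        Collections.sort(sizes);
        return sizes;
    }
}
```

If sortedSizes returns equal lists for two runs, the component structure is at least consistent in size between them. If it does not, the runs really did produce different components.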
Hej,
I have modified the connected component example to fit my input data. I
expect it to be deterministic.
But when I run it multiple times, it takes a different number of
supersteps. This only happens on the complete dataset and not on my small test
dataset.
(So I cannot check the output for c