Hi,
I am using a mini-cluster of three machines and have experimented on them
with several different (sometimes strange) reduce settings (from a single
reduce per machine up to 10 per machine).
...and the result is (basically) always the same, i.e. the process gets
stuck (or at least becomes very slow).
It depends on the uniqueness of your input data and maybe on how you
implemented concatenateValues.
You are collecting twice for each line, once on the subject and once on the
object, and each time the complete original line goes out as the value.
If you have many rows sharing the same subjects and objects, you'll end up
with very long value lists for those keys, and the reduce that concatenates
them can take a very long time.
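One thing worth checking: if concatenateValues appends with plain String "+"
in a loop, the cost is quadratic in the number of values for a key, which
alone can make the reduce on a popular key look stuck. Below is only a rough
sketch of what such a reducer might look like with the old mapred API (the
class name is mine, not your actual code), using a StringBuilder so the
append stays linear and calling reporter.progress() so a long-running key
is not killed for inactivity:

  import java.io.IOException;
  import java.util.Iterator;

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reducer;
  import org.apache.hadoop.mapred.Reporter;

  // Hypothetical reducer that concatenates all triples seen for one key.
  public class ConcatenateReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // StringBuilder keeps the total append cost linear in the output size;
      // repeated String "+" here would be quadratic in the number of values.
      StringBuilder sb = new StringBuilder();
      while (values.hasNext()) {
        sb.append(values.next().toString()).append('\n');
        // Tell the framework we are alive while grinding through a very
        // long value list for a skewed key.
        reporter.progress();
      }
      output.collect(key, new Text(sb.toString()));
    }
  }

Even with that fix, a single key shared by millions of triples still turns
into one enormous value in a single reduce call, so skewed data will always
make a few reduces run much longer than the rest.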
On Feb 7, 2008, at 10:35 AM, Holger Stenzhorn wrote:
Hello,
I am creating a small MapReduce application that works on large RDF
dataset files in triple format (i.e. one RDF triple per line,
"<subject> <predicate> <object> .").
In the mapper class I split up the triples into subject and object and
then collect each subject/object as key plus the related complete triple
as value.
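For context, a minimal sketch of roughly what such a mapper could look like
with the old mapred API (the class name and the whitespace-based parsing are
assumptions on my part, not the poster's actual code):

  import java.io.IOException;

  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  // Hypothetical mapper: emits each triple twice, keyed once by its
  // subject and once by its object, with the full line as the value.
  public class TripleMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      // Assumes one N-Triples line: "<subject> <predicate> <object> ."
      String[] parts = line.toString().split("\\s+", 3);
      if (parts.length < 3) {
        return; // skip malformed lines
      }
      String subject = parts[0];
      String object = parts[2].replaceAll("\\s*\\.\\s*$", "");
      output.collect(new Text(subject), line);
      output.collect(new Text(object), line);
    }
  }

Since every input line is emitted twice with the whole triple as value, the
shuffle carries roughly twice the input volume, and any subject or object
that occurs very often becomes a single heavy key on the reduce side.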