Re: Improving performance for large values in reduce

2008-02-08 Thread Holger Stenzhorn
Hi, I am using a mini-cluster of three machines and on them experimented with several different (sometimes strange) reduce settings (from one single reduce per machine to 10 per machine). ...and the result is (basically) always the same, i.e. the process gets stuck (or at least very slow)
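For reference, with the 2008-era Hadoop API the per-job reduce count described above is set on the JobConf. A minimal sketch (the job class name is hypothetical; the method and property are the old `org.apache.hadoop.mapred` API):

```java
import org.apache.hadoop.mapred.JobConf;

// Sketch only: assumes the old org.apache.hadoop.mapred API (Hadoop 0.15/0.16 era).
JobConf conf = new JobConf(MyTripleJob.class);

// Rule of thumb from the Hadoop docs of the time: roughly 1-2 reduce
// tasks per core across the cluster, e.g. 2 reduces x 3 machines here.
conf.setNumReduceTasks(6);

// Equivalently via the configuration property: mapred.reduce.tasks=6
```

Note that tuning the reduce count alone does not help if a single key carries most of the data, since one reduce task still receives all values for that key.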

Re: Improving performance for large values in reduce

2008-02-07 Thread Nathan Wang
It depends on the uniqueness of your input data and maybe on how you implemented concatenateValues. Since you're collecting twice for each line, on both subject and object, you're then concatenating the original line twice again. If you have many rows with the same subjects and objects, you'll end up w
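The blowup Nathan describes can be simulated in plain Java (this is not the poster's actual code; the data and names are illustrative). Each line is emitted under both its subject key and its object key, so a "hot" subject shared by many triples collects all of them into one reduce group, and concatenating that group builds one very large in-memory value:

```java
import java.util.*;

public class ValueBlowupDemo {
    // Hypothetical input: n triples that all share one subject "<s>".
    static List<String> sampleLines(int n) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lines.add("<s> <p" + i + "> <o" + i + "> .");
        }
        return lines;
    }

    // Simulated map + shuffle: each line is emitted twice, once under its
    // subject and once under its object, then grouped by key.
    static Map<String, List<String>> group(List<String> lines) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ");
            grouped.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(line);
            grouped.computeIfAbsent(parts[2], k -> new ArrayList<>()).add(line);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped = group(sampleLines(1000));

        // Simulated reduce: concatenating every value for the hot key "<s>"
        // builds one huge string out of all 1000 triples at once.
        StringBuilder concatenated = new StringBuilder();
        for (String v : grouped.get("<s>")) {
            concatenated.append(v).append('\n');
        }
        System.out.println("values for <s>: " + grouped.get("<s>").size());
        System.out.println("concatenated chars: " + concatenated.length());
    }
}
```

If the data had mostly unique subjects and objects, each group would stay small; it is the repeated keys that make the concatenated value grow with the input.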

Re: Improving performance for large values in reduce

2008-02-07 Thread Arun C Murthy
On Feb 7, 2008, at 10:35 AM, Holger Stenzhorn wrote: Hello, I am creating a small MapReduce application that works on large RDF dataset files in triple format (i.e. one RDF triple per line, "<subject> <predicate> <object> ."). In the mapper class I split up the triples into subject and object and then collect each

Improving performance for large values in reduce

2008-02-07 Thread Holger Stenzhorn
Hello, I am creating a small MapReduce application that works on large RDF dataset files in triple format (i.e. one RDF triple per line, "<subject> <predicate> <object> ."). In the mapper class I split up the triples into subject and object and then collect each subject/object as key plus the related complete triple as
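The map step described above can be sketched in plain Java (this is not Holger's actual code, and it omits the Hadoop Mapper/OutputCollector plumbing; names are illustrative). One input line produces two (key, value) pairs, the full triple keyed once by subject and once by object:

```java
public class TripleMapSketch {
    // Split one N-Triples line ("<subject> <predicate> <object> .")
    // and emit the full line under both the subject key and the
    // object key, mirroring the map step described in the post.
    public static String[][] mapLine(String line) {
        String[] parts = line.trim().split("\\s+");
        String subject = parts[0];
        String object = parts[2];
        return new String[][] {
            { subject, line },  // collect(subject, triple)
            { object, line }    // collect(object, triple)
        };
    }

    public static void main(String[] args) {
        String line = "<http://ex.org/a> <http://ex.org/knows> <http://ex.org/b> .";
        for (String[] kv : mapLine(line)) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Because every triple is emitted twice with the whole line as the value, the shuffled data is at least double the input size before the reduce-side concatenation even starts, which is where the thread's performance concern comes from.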