Re: question about org.apache.hadoop.mapred.join

2012-04-11 Thread Koert Kuipers
thanks for that answer. makes sense.
koert
On Tue, Apr 10, 2012 at 1:33 PM, Chris Douglas wrote:
> Your understanding is correct. The framework doesn't do anything to
> align input splits across datasets. In the situation you describe-
> where one can't seek among key groups in the input data- i

Re: question about org.apache.hadoop.mapred.join

2012-04-10 Thread Chris Douglas
Your understanding is correct. The framework doesn't do anything to align input splits across datasets. In the situation you describe, where one can't seek among key groups in the input data, it often makes sense to disable splitting of the individual files by setting the min split size to Integer.
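To illustrate why a very large minimum split size disables splitting, here is a minimal sketch in plain Java. It assumes the split size is derived as max(minSize, min(goalSize, blockSize)), which is my reading of the old-API FileInputFormat; the class and method names are hypothetical, the slop factor is ignored, and Long.MAX_VALUE is only an illustrative value (the constant named in the message above is cut off).

```java
public class SplitSizeSketch {
    // Assumed formula: the split size is the block size capped by the goal size,
    // but never smaller than the configured minimum split size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified count of splits for one non-empty file (no slop factor).
    // Written without "fileLength + splitSize" to avoid overflow for huge split sizes.
    static long countSplits(long fileLength, long splitSize) {
        return fileLength / splitSize + (fileLength % splitSize == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        long fileLength = 1L << 30; // 1 GiB file (illustrative)
        long blockSize = 1L << 26;  // 64 MiB block (illustrative)
        long goalSize = 1L << 28;   // total size / requested number of maps (illustrative)

        long defaultSplit = computeSplitSize(goalSize, 1L, blockSize);
        long hugeMinSplit = computeSplitSize(goalSize, Long.MAX_VALUE, blockSize);

        System.out.println("splits with default min size: " + countSplits(fileLength, defaultSplit));
        System.out.println("splits with huge min size:    " + countSplits(fileLength, hugeMinSplit));
    }
}
```

With the minimum larger than any file, each file ends up in a single split, so split boundaries can no longer fall in the middle of a key group.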

question about org.apache.hadoop.mapred.join

2012-04-10 Thread Koert Kuipers
I read about CompositeInputFormat and how it allows one to join two datasets together as long as those datasets were sorted and partitioned the same way. OK, I think I get it, but something bothers me. It is suggested that two datasets are "sorted and partitioned the same way" if they were both outp
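For readers unfamiliar with the idea, the reason matching sort order and partitioning matter can be sketched with a plain merge join. This is not CompositeInputFormat's implementation, only a hypothetical stand-alone illustration that assumes each side holds one aligned partition, sorted by key, with at most one record per key.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
    // Inner-joins two key-sorted lists of {key, value} pairs in a single pass.
    // Assumes keys are unique within each list and both lists use the same ordering.
    static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key is smaller: it has no match on the right
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no match on the left
            } else {
                out.add(left.get(i)[0] + "\t" + left.get(i)[1] + "\t" + right.get(j)[1]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = new ArrayList<>();
        left.add(new String[] {"a", "1"});
        left.add(new String[] {"b", "2"});
        left.add(new String[] {"d", "4"});

        List<String[]> right = new ArrayList<>();
        right.add(new String[] {"b", "x"});
        right.add(new String[] {"c", "y"});
        right.add(new String[] {"d", "z"});

        for (String row : innerJoin(left, right)) {
            System.out.println(row);
        }
    }
}
```

The single pass only finds every match if a given key can appear in the two inputs being read together, which is why the partitions (and, as the reply notes, the splits) on both sides have to line up.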