Thanks for that answer. Makes sense.
koert
On Tue, Apr 10, 2012 at 1:33 PM, Chris Douglas wrote:
Your understanding is correct. The framework doesn't do anything to
align input splits across datasets. In the situation you describe,
where one can't seek among key groups in the input data, it often
makes sense to disable splitting of the individual files by setting
the min split size to Integer.
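
To illustrate why a very large min split size keeps each file in a single split, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop code: it assumes the split size is chosen as max(minSize, min(maxSize, blockSize)), and the class name, method names, and the use of Long.MAX_VALUE as the "very large" value are all illustrative assumptions (the exact value in the message above is cut off).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a file-based input format
// might carve one file into splits. Not the framework's actual code.
public class SplitSketch {

    // Assumed rule: the min split size wins over the block size when it is larger.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length in bytes of each split produced for a single file.
    static List<Long> splitLengths(long fileLength, long blockSize,
                                   long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while more than one split's worth remains.
        while (remaining > splitSize) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        // Whatever is left becomes the final (possibly only) split.
        if (remaining > 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Small min split size: a 200 MB file with 64 MB blocks is cut up.
        System.out.println(splitLengths(200 * mb, 64 * mb, 1L, Long.MAX_VALUE).size());
        // Huge min split size: the whole file stays in one split.
        System.out.println(splitLengths(200 * mb, 64 * mb, Long.MAX_VALUE, Long.MAX_VALUE).size());
    }
}
```

With one split per file, each map task sees a whole file, so key groups that cannot be located by seeking are never cut in the middle.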
I read about CompositeInputFormat and how it allows one to join two
datasets together as long as those datasets were sorted and partitioned the
same way.
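
For readers unfamiliar with this kind of join, here is a minimal sketch of the underlying idea: a merge join over two inputs that are already sorted by key. It is not CompositeInputFormat's implementation. The class name, the String[] {key, value} record layout, and the assumption that keys are unique within each input are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an inner merge join over one pair of
// matching partitions. Each record is a String[] {key, value}.
public class MergeJoinSketch {

    // Both inputs must be sorted by key; keys are assumed unique per input.
    static List<String> join(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++;            // left key is smaller: no match, advance left
            } else if (cmp > 0) {
                j++;            // right key is smaller: no match, advance right
            } else {
                // Same key on both sides: emit the joined record.
                out.add(left.get(i)[0] + ":" + left.get(i)[1] + "," + right.get(j)[1]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
            new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"d", "4"});
        List<String[]> right = Arrays.asList(
            new String[] {"b", "x"}, new String[] {"c", "y"}, new String[] {"d", "z"});
        System.out.println(join(left, right));
    }
}
```

The single forward pass only works if both sides contain the same key range in the same order, which is why the two datasets have to be sorted and partitioned the same way.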
OK, I think I get it, but something bothers me. It is suggested that two
datasets are "sorted and partitioned the same way" if they were both
outp