Hi,
A script with multiple load operators starts the same number of streams. Some
of them are merged later (e.g. by a join) and some are not. How do we know
which MR operator each of these loads should be placed in? For example, take
a script like this:
a = load file1
b = load file2
..
dump

If we join a and b between the loads and the dump, the two loads (a and b)
should be placed in the same MR operator. If we sort a and b independently,
the two loads should be placed in separate MR operators. How do we identify
whether two streams are correlated or not?
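
To make the question concrete, here is a rough sketch of one way it might be
done: treat the script as a graph of operators and group the loads by
connected component, so that two loads are "correlated" exactly when some
later operator consumes both streams. The plan representation (a dict mapping
each operator to the list of its inputs) and the name group_loads are made up
for illustration, and this only answers "correlated or not", not the final
placement into MR operators.

```python
# Hypothetical sketch: group load operators by whether their streams ever merge.
# The plan format (operator name -> list of input operator names) is an
# assumption for illustration, not an existing data structure.

def group_loads(plan):
    """Return the sets of load operators whose streams are connected."""
    parent = {op: op for op in plan}

    def find(op):
        # Follow parent links to the representative, halving the path as we go.
        while parent[op] != op:
            parent[op] = parent[parent[op]]
            op = parent[op]
        return op

    def union(x, y):
        parent[find(x)] = find(y)

    # Every edge input -> operator puts both ends in the same component.
    for op, inputs in plan.items():
        for src in inputs:
            union(op, src)

    # An operator with no inputs is treated as a load.
    groups = {}
    for op, inputs in plan.items():
        if not inputs:
            groups.setdefault(find(op), set()).add(op)
    return sorted(groups.values(), key=sorted)


# The two cases above: a join merges the streams, independent sorts do not.
joined = {"a": [], "b": [], "join": ["a", "b"], "dump": ["join"]}
independent = {"a": [], "b": [], "sort_a": ["a"], "sort_b": ["b"]}
print(group_loads(joined))
print(group_loads(independent))
```

With the join, a and b end up in one group; with the independent sorts, each
load is in a group of its own.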

A further question: can we specify a directory so that load will read all
the files in that directory? Since each reducer of an MR job produces a
single file, what do we do when the subsequent MR job needs to read all of
these files?
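
As a rough illustration of the behaviour I am asking about, load could expand
a directory path into the files it contains. The sketch below uses the local
filesystem and a made-up helper name (expand_load_path). Skipping names that
start with "_" or "." is an assumption, meant to leave out bookkeeping entries
that are not reducer output.

```python
import os


def expand_load_path(path):
    """Return the list of files a load of `path` would read (hypothetical helper)."""
    # A plain file is read as-is.
    if os.path.isfile(path):
        return [path]
    # A directory expands to the regular files directly inside it,
    # skipping entries whose names start with "_" or "." (assumed hidden).
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if not name.startswith(("_", "."))
        and os.path.isfile(os.path.join(path, name))
    )
```

With something like this, the second MR job could be pointed at the output
directory of the first and pick up one input file per reducer.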

Thanks,
-Gang



