Got it, thanks Prashant!
On Wed, Jul 24, 2013 at 10:41 PM, Prashant Kommireddi prash1...@gmail.com wrote:
PigStorage uses tab as the field delimiter by default. Is 1.txt tab-delimited?
If not, you would need to pass a space as the delimiter to the constructor
when loading: PigStorage(' ').
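A minimal sketch of such a load, assuming 1.txt is space-delimited and has three fields per line (the path and schema here are illustrative, not confirmed):

```pig
-- Assumption: each line of /1.txt holds three space-separated fields.
-- PigStorage(' ') overrides the default tab delimiter.
A = LOAD '/1.txt' USING PigStorage(' ') AS (x:int, y:chararray, z:chararray);
DUMP A;
```

If the file really is tab-delimited, a plain PigStorage() with no argument should be enough.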
Hello Keren,
There is nothing wrong with this. A dataset in Hadoop is usually one folder,
not one file. Pig is doing what it is supposed to do and performing a
union of both files. You would have seen the contents of both files
together when running dump C.
Since this is a map-only job, you could try something like this:
A = load '/1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '/1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
D = group C by 1;
E = foreach D generate flatten(C);
store E into '/dir';
Warm
If each job (and its child tasks) is running in its own JVM, then this should
not be a problem.
Regards,
Shahab
On Thu, Jul 25, 2013 at 2:46 PM, Huy Pham pha...@yahoo-inc.com wrote:
Hi All,
I am writing a class (called Parser) with a couple of static functions
because I don't want millions of