Re: output

2013-07-25 Thread Keren Ouaknine
Got it, thanks Prashant! On Wed, Jul 24, 2013 at 10:41 PM, Prashant Kommireddi prash1...@gmail.com wrote: PigStorage uses tab as the field delimiter by default. Is 1.txt tab-delimited? If not, you would need to specify space as the delimiter in the constructor when loading: PigStorage(' ').
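To see why the delimiter matters, here is a minimal Java sketch of the effect. It is only an illustration and not how PigStorage is implemented; the class DelimiterDemo, the method splitFields, and the sample record are all made up for this example. It shows that a space-separated line split on tab comes back as a single field, while splitting on space yields the expected three.

```java
import java.util.regex.Pattern;

class DelimiterDemo {
    // Split one input line on a literal delimiter.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static String[] splitFields(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "1 a b"; // hypothetical space-separated record

        // Splitting on tab finds no delimiter, so the whole line is one field.
        System.out.println(splitFields(line, "\t").length); // prints 1

        // Splitting on space gives the three intended fields.
        System.out.println(splitFields(line, " ").length);  // prints 3
    }
}
```

This is the same symptom one would expect when a space-separated file is loaded with the default delimiter: the first declared column receives the whole line and the remaining columns come back empty.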

Re: union

2013-07-25 Thread Mohammad Tariq
Hello Keren, There is nothing wrong with this. One dataset in Hadoop is usually one folder, not one file. Pig is doing what it is supposed to do and performing a union on both files. You would have seen the content of both files together when doing dump C. Since this is a map-only job,

Re: union

2013-07-25 Thread Mohammad Tariq
You could try something like this:
A = load '/1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '/1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
D = group C by 1;
E = foreach D generate flatten(C);
store E into '/dir';
Warm

Re: Is it safe to have static methods in Hadoop Framework

2013-07-25 Thread Shahab Yunus
If each job (its child tasks) is running in its own JVM then this should not be a problem. Regards, Shahab On Thu, Jul 25, 2013 at 2:46 PM, Huy Pham pha...@yahoo-inc.com wrote: Hi All, I am writing a class (called Parser) with a couple of static functions because I don't want millions of
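As a rough illustration of the kind of class being discussed, here is a minimal sketch of a Parser with static methods. The method names parseFields and parseId and the record layout are assumptions made for this example, not taken from the original post. The methods use only their arguments and local variables, so there is no shared mutable state to worry about; static fields that are written to would need more care.

```java
import java.util.ArrayList;
import java.util.List;

final class Parser {
    // No instances needed: the class only groups static helpers.
    private Parser() {}

    // Stateless helper: splits a record on a single-character delimiter.
    // Only the arguments and local variables are touched.
    static List<String> parseFields(String record, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < record.length(); i++) {
            if (record.charAt(i) == delimiter) {
                fields.add(record.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(record.substring(start)); // last (or only) field
        return fields;
    }

    // Hypothetical convenience method: reads the first field as an int.
    static int parseId(String record, char delimiter) {
        return Integer.parseInt(parseFields(record, delimiter).get(0).trim());
    }

    public static void main(String[] args) {
        System.out.println(parseFields("1\ta\tb", '\t').size()); // prints 3
        System.out.println(parseId("42\tx", '\t'));              // prints 42
    }
}
```

Calling these from a map or reduce method avoids creating a parser object per record, which appears to be the motivation in the question.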