You could try something like this:

A = load '/1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '/1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
D = group C by 1;
E = foreach D generate flatten(C);
store E into '/dir';

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jul 25, 2013 at 12:52 PM, Mohammad Tariq <donta...@gmail.com> wrote:
> Hello Keren,
>
> There is nothing wrong here. One dataset in Hadoop is usually one folder,
> not one file. Pig is doing what it is supposed to do and performing a
> union on both files. You would have seen the contents of both files
> together when you did "DUMP C;".
>
> Since this is a map-only job and 2 mappers are generated, you get 2
> separate files, which together form one complete dataset. If you want
> just one file, you need to force a reduce phase so that all the results
> are collected into a single output file.
>
> HTH
>
> Warm Regards,
> Tariq
> cloudfront.blogspot.com
>
>
> On Thu, Jul 25, 2013 at 11:31 AM, Keren Ouaknine <ker...@gmail.com> wrote:
>> Hi,
>>
>> According to Pig's documentation on union, two relations whose schemas
>> match (same length, and types that can be implicitly cast) can be
>> concatenated (see http://pig.apache.org/docs/r0.11.1/basic.html#union).
>>
>> However, when I try:
>>
>> A = load '1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
>> B = load '1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
>> C = union A, B;
>> describe C;
>> DUMP C;
>> store C into '/home/kereno/Documents/pig-0.11.1/workspace/res';
>>
>> with:
>>
>> ~/Documents/pig-0.11.1/workspace 130$ more 1.txt 1_ext.txt
>> ::::::::::::::
>> 1.txt
>> ::::::::::::::
>> 1 a aleph
>> 2 b bet
>> 3 g gimel
>> ::::::::::::::
>> 1_ext.txt
>> ::::::::::::::
>> 0 a alpha
>> 0 b beta
>> 0 g gimel
>>
>> I get as a result:
>>
>> ~/Documents/pig-0.11.1/workspace 0$ more res/part-m-0000*
>> ::::::::::::::
>> res/part-m-00000
>> ::::::::::::::
>> 0 a alpha
>> 0 b beta
>> 0 g gimel
>> ::::::::::::::
>> res/part-m-00001
>> ::::::::::::::
>> 1 a aleph
>> 2 b bet
>> 3 g gimel
>>
>> whereas I was expecting something like:
>>
>> 0 a alpha
>> 0 b beta
>> 0 g gimel
>> 1 a aleph
>> 2 b bet
>> 3 g gimel
>>
>> [all together]
>>
>> I understand that two files would be generated for non-matching schemas,
>> but why for a union with matching schemas?
>>
>> Thanks,
>> Keren
>>
>> --
>> Keren Ouaknine
>> Web: www.kereno.com
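
[Editor's note: the group-by-constant trick above is one way to force the reduce. A minimal alternative sketch, reusing the paths from the thread and assuming Pig 0.11 behavior: an ORDER statement also introduces a reduce phase, and an explicit PARALLEL of 1 pins it to a single reducer, so exactly one part file is written. The positional reference $0 is used because a union of relations with differently named fields may lose the field names.]

```pig
-- Sketch: same effect as group-by-constant, via ORDER ... PARALLEL 1.
A = load '/1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '/1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
-- ORDER forces a reduce phase; PARALLEL 1 forces a single reducer,
-- so the output directory contains a single part-r-00000 file.
D = order C by $0 PARALLEL 1;
store D into '/dir';
```

If a single local file is all that is needed, merging the part files after the fact with `hadoop fs -getmerge` avoids the extra reduce entirely.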