Hello Keren,

There is nothing wrong here. One dataset in Hadoop is usually one folder, not one file. Pig is doing what it is supposed to do and performing a union of both files; you would have seen the content of both files together when doing dump C.
Since this is a map-only job and 2 mappers are generated, you get 2 separate files, which together are actually one complete dataset. If you want just one file, you need to force a reduce so that all the results end up in a single output file.

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com

On Thu, Jul 25, 2013 at 11:31 AM, Keren Ouaknine <ker...@gmail.com> wrote:
> Hi,
>
> According to Pig's documentation on union, two relations which have the same
> schema (the same length, and types that can be implicitly cast) can be
> concatenated (see http://pig.apache.org/docs/r0.11.1/basic.html#union).
>
> However, when I try with:
>
> A = load '1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
> B = load '1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
> C = union A, B;
> describe C;
> DUMP C;
> store C into '/home/kereno/Documents/pig-0.11.1/workspace/res';
>
> with:
>
> ~/Documents/pig-0.11.1/workspace 130$ more 1.txt 1_ext.txt
> ::::::::::::::
> 1.txt
> ::::::::::::::
> 1 a aleph
> 2 b bet
> 3 g gimel
> ::::::::::::::
> 1_ext.txt
> ::::::::::::::
> 0 a alpha
> 0 b beta
> 0 g gimel
>
> I get as a result:
>
> ~/Documents/pig-0.11.1/workspace 0$ more res/part-m-0000*
> ::::::::::::::
> res/part-m-00000
> ::::::::::::::
> 0 a alpha
> 0 b beta
> 0 g gimel
> ::::::::::::::
> res/part-m-00001
> ::::::::::::::
> 1 a aleph
> 2 b bet
> 3 g gimel
>
> whereas I was expecting something like:
>
> 0 a alpha
> 0 b beta
> 0 g gimel
> 1 a aleph
> 2 b bet
> 3 g gimel
>
> [all together]
>
> I understand that two files would be generated for non-matching schemas, but
> why for a union with a matching schema?
>
> Thanks,
> Keren
>
> --
> Keren Ouaknine
> Web: www.kereno.com
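One way to force that reduce (a sketch only, not the only option; the relation name D and the res_single output path are illustrative, not from the original script) is to add an ORDER with a single reducer before the STORE:

```pig
A = load '1.txt' using PigStorage(' ') as (x:int, y:chararray, z:chararray);
B = load '1_ext.txt' using PigStorage(' ') as (a:int, b:chararray, c:chararray);
C = union A, B;
-- ORDER introduces a reduce phase, and PARALLEL 1 forces a single reducer,
-- so the job writes one part-r-00000 file instead of one file per mapper.
-- $0 is used because the field names of A and B differ, so the unioned
-- schema's first field has no name to reference.
D = order C by $0 parallel 1;
store D into '/home/kereno/Documents/pig-0.11.1/workspace/res_single';
```

Alternatively, if you only need the merged data locally, hadoop fs -getmerge on the output directory concatenates all the part files into one local file without changing the Pig script at all.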