The error means the two join keys have different types: user_pet_id is a chararray, while id is declared as int. Change one of them so that both keys have the same type.
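A minimal sketch of one way to do that, assuming the rest of your script stays as posted: load the pet id as chararray so it matches user_pet_id. The relation names are the ones from your script. The cast in the commented-out alternative is an assumption that every id in peoples.txt is numeric.

```pig
-- Option 1: declare the pet id as chararray so both join keys are chararray.
D = LOAD 'pets.txt' USING PigStorage(';') AS (id : chararray, type : chararray, race : chararray);

reqd_op = JOIN C BY user_pet_id, D BY id PARALLEL 5;
DUMP reqd_op;

-- Option 2 (alternative, assumes all ids are numeric): keep id as int in D
-- and cast the other key to int before joining.
-- C_int = FOREACH C GENERATE name, (int)user_pet_id AS user_pet_id;
-- reqd_op = JOIN C_int BY user_pet_id, D BY id PARALLEL 5;
```

Running DESCRIBE on C and D before the JOIN shows the type of each key, which makes this kind of mismatch easy to spot.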

Yong

2011/5/10 Vincent <[email protected]>

> According to your advices I wrote the following:
>
> A = LOAD 'peoples.txt' USING PigStorage(';') AS (name : chararray, pets_ids : chararray);
>
> B = foreach A GENERATE name, STRSPLIT(pets_ids, ',') AS pets_ids_separated;
> DUMP B;
> DESCRIBE B;
>
> C = FOREACH B GENERATE name, FLATTEN(TOBAG(pets_ids_separated)) AS user_pet_id;
> DUMP C;
> DESCRIBE C;
>
> D = LOAD 'pets.txt' USING PigStorage(';') AS (id : int, type : chararray, race: chararray);
>
> reqd_op = JOIN C BY user_pet_id, D BY id PARALLEL 5;
> DUMP reqd_op;
>
> But I have the following error:
> 2011-05-10 15:30:04,036 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1107: Cannot merge join keys, incompatible types
> Details at logfile: /local/tmp/test/pig_expand/pig_1305026987213.log
>
> Any idea, what it goes wrong here?
>
> Best Regards
>
> Vincent
>
> On Tue, May 10, 2011 at 3:04 PM, Mridul Muralidharan <[email protected]> wrote:
>
> > I am not sure I follow your query related to PARALLEL.
> > The value for parallel is a static value.
> >
> > I was using $MY_PARALLEL as a placeholder to specify what sort of
> > parallelism you need.
> >
> > Typically you will have a default value in the script
> >
> > %default MY_PARALLEL '10'
> >
> > And override it, when required, using command line pig -param
> > MY_PARALLEL=50 ...
> >
> > Regards,
> > Mridul
> >
> > On Tuesday 10 May 2011 04:26 PM, Vincent wrote:
> >
> >> Thanks Mridul for your quick answer!
> >>
> >> According to documentation PARALLEL is setting the number of reduce
> >> tasks. So how can I make it taking an UDF instead? Is there any example
> >> of such functions in SVN/pig0.8 package?
> >>
> >> Best Regards
> >>
> >> Vincent
> >>
> >> On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan
> >> <[email protected]> wrote:
> >>
> >>     Easy option would be to write your own udf which can catch corner
> >>     cases, etc ..
> >>
> >>     But assuming your data strictly follows what you mentioned,
> >>     something like this might help (illustrative only !) :
> >>
> >>     pets = load 'pets.txt' USING PigStorage(';') AS (pet_id:chararray, pet_type:chararray, pet_name:chararray);
> >>
> >>     people = load 'peoples.txt' USING PigStorage(';') AS (user:chararray, ids:chararray);
> >>     people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
> >>     -- STRSPLIT returns a tuple, not a bag : so convert to bag and flatten it.
> >>     people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as (user_pet_id);
> >>
> >>     reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL $MY_PARALLEL;
> >>
> >>     reqd_op should contain what you need ...
> >>
> >>     Regards,
> >>     Mridul
> >>
> >>     On Tuesday 10 May 2011 03:00 PM, Vincent wrote:
> >>
> >>         Hello dear Pig users,
> >>
> >>         I am loading a file with the following format:
> >>
> >>         $ cat peoples.txt
> >>         tom;1234,4567,6
> >>         anna;27894
> >>
> >>         First field is a name, second field is a concatenation of an
> >>         unknown number of pets ids.
> >>
> >>         I would like to JOIN this file with another one:
> >>
> >>         $ cat pets.txt
> >>         1234;dog;cocker
> >>         4567;mouse;usa
> >>         6;cat;persian
> >>         27894;cat;manx
> >>
> >>         Fields are pet's id, pet's type, pet's race.
> >>
> >>         to get the following result:
> >>
> >>         1234;dog;cocker;tom
> >>         4567;mouse;usa;tom
> >>         6;cat;persian;tom
> >>         27894;cat;manx;anna
> >>
> >>         Problem is that I don't know how to convert a tuple of fields
> >>         to lines, i.e. to put the file peoples.txt into the following
> >>         intermediate format:
> >>
> >>         tom,1234
> >>         tom,4567
> >>         tom,6
> >>         anna,27894
> >>
> >>         Thanks in advance for your help!
> >>
> >>         Vincent Hervieux
