Re: Tuple to lines conversion in Pig

勇胡 Tue, 10 May 2011 04:44:08 -0700

The types of the two join keys are different: one is chararray, the other
is int. You have to change them to the same type.
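
As a rough, untested sketch of that fix: assuming user_pet_id in C really is
a chararray (DESCRIBE C should confirm this), one option is to declare the pet
id as chararray when loading pets.txt, so that both join keys have the same
type. The relation and field names are the ones from the script quoted below.

```pig
-- Sketch only: load the pet id as chararray instead of int,
-- so it matches the assumed chararray type of user_pet_id.
D = LOAD 'pets.txt' USING PigStorage(';') AS (id : chararray, type : chararray,
race : chararray);

-- Check the types of both join keys before joining.
DESCRIBE C;
DESCRIBE D;

reqd_op = JOIN C BY user_pet_id, D BY id PARALLEL 5;
DUMP reqd_op;
```

Going the other way, by casting user_pet_id to int, should also make the types
agree, but only if every id in peoples.txt is numeric.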


Yong

2011/5/10 Vincent <[email protected]>

> Following your advice, I wrote the following:
>
> A = LOAD 'peoples.txt' USING PigStorage(';') AS (name : chararray, pets_ids : chararray);
>
> B = FOREACH A GENERATE name, STRSPLIT(pets_ids, ',') AS pets_ids_separated;
> DUMP B;
> DESCRIBE B;
>
> C = FOREACH B GENERATE name, FLATTEN(TOBAG(pets_ids_separated)) AS user_pet_id;
> DUMP C;
> DESCRIBE C;
>
> D = LOAD 'pets.txt' USING PigStorage(';') AS (id : int, type : chararray, race : chararray);
>
> reqd_op = JOIN C BY user_pet_id, D BY id PARALLEL 5;
> DUMP reqd_op;
>
> But I have the following error:
> 2011-05-10 15:30:04,036 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1107: Cannot merge join keys, incompatible types
> Details at logfile: /local/tmp/test/pig_expand/pig_1305026987213.log
>
> Any idea what is going wrong here?
>
> Best Regards
>
> Vincent
>
>
>
> On Tue, May 10, 2011 at 3:04 PM, Mridul Muralidharan
> <[email protected]> wrote:
>
> >
> > I am not sure I follow your query related to PARALLEL.
> > The value for parallel is a static value.
> >
> > I was using $MY_PARALLEL as a placeholder to specify what sort of
> > parallelism you need.
> >
> > Typically you will have a default value in the script
> >
> > %default MY_PARALLEL '10'
> >
> > And override it, when required, using command line pig -param
> > MY_PARALLEL=50 ...
> >
> >
> >
> > Regards,
> > Mridul
> >
> >
> > On Tuesday 10 May 2011 04:26 PM, Vincent wrote:
> >
> >> Thanks Mridul for your quick answer!
> >>
> >> According to the documentation, PARALLEL sets the number of reduce
> >> tasks. So how can I make it take a UDF instead? Is there any example
> >> of such functions in the SVN/pig0.8 package?
> >>
> >> Best Regards
> >>
> >> Vincent
> >>
> >> On Tue, May 10, 2011 at 2:02 PM, Mridul Muralidharan
> >> <[email protected]> wrote:
> >>
> >>
> >>    An easy option would be to write your own UDF, which can catch corner
> >>    cases, etc.
> >>    But assuming your data strictly follows what you mentioned,
> >>    something like this might help (illustrative only!):
> >>
> >>    pets = load 'pets.txt'  USING PigStorage(';') AS (pet_id:chararray,
> >>    pet_type:chararray, pet_name:chararray);
> >>
> >>    people = load 'peoples.txt'  USING PigStorage(';') AS
> >>    (user:chararray, ids:chararray);
> >>    people_t = FOREACH people GENERATE user, STRSPLIT(ids, ',');
> >>    -- STRSPLIT returns a tuple, not a bag, so convert it to a bag and
> >>    -- flatten it.
> >>    people_reqd = FOREACH people_t GENERATE user, FLATTEN(TOBAG($1)) as
> >>    (user_pet_id);
> >>
> >>
> >>    reqd_op = JOIN people_reqd BY user_pet_id, pets BY pet_id PARALLEL
> >>    $MY_PARALLEL;
> >>
> >>
> >>    reqd_op should contain what you need ...
> >>
> >>
> >>
> >>    Regards,
> >>    Mridul
> >>
> >>
> >>
> >>
> >>
> >>    On Tuesday 10 May 2011 03:00 PM, Vincent wrote:
> >>
> >>        Hello dear Pig users,
> >>
> >>        I am loading a file with the following format:
> >>
> >>        $ cat peoples.txt
> >>        tom;1234,4567,6
> >>        anna;27894
> >>
> >>        The first field is a name; the second field is a concatenation of an
> >>        unknown number of pet ids.
> >>
> >>        I would like to JOIN this file with another one:
> >>
> >>        $ cat pets.txt
> >>        1234;dog;cocker
> >>        4567;mouse;usa
> >>        6;cat;persian
> >>        27894;cat;manx
> >>
> >>        Fields are pet's id, pet's type, pet's race.
> >>
> >>        to get the following result:
> >>
> >>        1234;dog;cocker;tom
> >>        4567;mouse;usa;tom
> >>        6;cat;persian;tom
> >>        27894;cat;manx;anna
> >>
> >>        The problem is that I don't know how to convert a tuple of fields
> >>        to lines, i.e. to put the file peoples.txt into the following
> >>        intermediate format:
> >>
> >>        tom,1234
> >>        tom,4567
> >>        tom,6
> >>        anna,27894
> >>
> >>        Thanks in advance for your help!
> >>
> >>
> >>             Vincent Hervieux
> >>
> >>
> >>
> >>
> >
>
