Hi

Thanks for the information.

I looked at the DataFu bag join, but by the look of it, it still requires
joining a single column to another single column.

The only way I found of doing it using just the basic commands is:

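data = LOAD 'data.txt' AS (col1, col2, col3, col4, col5);
data2 = LOAD 'more.txt' AS (col1);
-- Turn the five columns of each row into five single-value rows.
expanded = FOREACH data GENERATE
    FLATTEN(TOBAG(col1, col2, col3, col4, col5)) AS flatField;
-- Join the flattened values against the lookup column in data2.
Joined = JOIN expanded BY flatField, data2 BY col1;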
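
A slightly fuller sketch of the same idea, in case it is useful to anyone else.
It keeps the original columns next to each flattened value, so the matching
rows can be recovered after the join. The aliases (wide, lookup, matched,
deduped), the field name key and the chararray types are my own assumptions,
and I have not run this against real data:

-- Sketch only: aliases, field names and types are assumed.
wide   = LOAD 'data.txt' AS (col1:chararray, col2:chararray, col3:chararray,
                             col4:chararray, col5:chararray);
lookup = LOAD 'more.txt' AS (key:chararray);
-- One output row per (original row, column value) pair.
expanded = FOREACH wide GENERATE col1, col2, col3, col4, col5,
           FLATTEN(TOBAG(col1, col2, col3, col4, col5)) AS flatField;
joined   = JOIN expanded BY flatField, lookup BY key;
-- Drop the helper column, keeping only the original five columns.
matched  = FOREACH joined GENERATE expanded::col1 AS col1,
           expanded::col2 AS col2, expanded::col3 AS col3,
           expanded::col4 AS col4, expanded::col5 AS col5;
-- A row that matched through several columns appears once after DISTINCT.
deduped  = DISTINCT matched;

Note that DISTINCT also collapses rows that were genuine duplicates in the
input, so a row id would be needed if those have to be kept apart.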

I'll look into cloning the Pig code and creating a function that could do
something like this, or possibly a UDF. I was thinking it would have to be a
filter UDF that checks one value against many columns.

Thanks for all your help.

Paul
________________________________
From: Arvind S<mailto:[email protected]>
Sent: ‎28/‎07/‎2015 09:35
To: [email protected]<mailto:[email protected]>
Subject: Re: eqijoin 1 field in dataset to 2 fields in another datasets using OR

check bag joins in
http://datafu.incubator.apache.org/docs/datafu/guide/bag-operations.html

you could bag contents of the 5 or more columns and then join ...  ??

*Cheers !!*
Arvind

On Tue, Jul 28, 2015 at 12:14 PM, paul green <[email protected]> wrote:

> Hi
>
> Thank you for your suggestion. I had thought of using the UNION function
> but thought that a more efficient way to do it would be a great
> feature.
>
>
> Two joins and a union would be okay for two columns but would be less
> efficient if I wanted to check against more columns, for example to see if
> any value from a column in dataset 1 was in columns 2,3,4,5,6 of dataset 2.
>
>
> The only way I could see of doing it would be to do 5 joins and then a
> union. This just feels like a bad way to do a lookup across many columns
> for a single column.
>
> Thanks in advance.
> Paul
> ________________________________
> From: Arvind S<mailto:[email protected]>
> Sent: ‎28/‎07/‎2015 05:04
> To: [email protected]<mailto:[email protected]>
> Subject: Re: eqijoin 1 field in dataset to 2 fields in another datasets
> using OR
>
> Suggestion : you can create a join for each column individually ..and then
> union the result.. ??
>
> http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#UNION
>
> *Cheers !!*
> Arvind
>
> On Tue, Jul 28, 2015 at 1:30 AM, paul green <[email protected]> wrote:
>
> > Hello
> >
> > I use Pig at home (currently version 0.13.0) regularly on data sets
> > that vary between tens of megabytes and tens of gigabytes. I wanted to be
> > able to join two data sets together (ideally filtering). The main problem
> > I am having, and have not found an easy solution for, is that I want to
> > join data set 1 to data set 2 like below.
> >
> > data1.txt
> > id, name, job
> > 0001,john, manager
> > 0002,phil, deputy
> >
> > data2.txt
> > id1, id2, id3, label
> > 0001,0002,0001,useful
> > 0005,0001,0001,useful
> > 0000,0010,0009,not useful
> >
> > Code Proposal
> > datasetA = LOAD 'data1.txt' USING PigStorage(',') AS (fieldA1, fieldA2, fieldA3);
> > datasetB = LOAD 'data2.txt' USING PigStorage(',') AS (fieldB1, fieldB2, fieldB3, fieldB4);
> > joined = JOIN datasetA BY fieldA1,
> >               datasetB BY (fieldB1 OR fieldB2 OR fieldB3);
> > DUMP joined;
> >
> > So essentially I want to join 1 column to n columns in the second data set
> > where they are equal. I am not after a partial join but an exact join. Is
> > there a feature already in the language to do this? If not, would it be
> > possible to request such a feature?
> >
> > Thanks.
> >
>
