Hello,

I regularly use Pig at home (currently version 0.13.0) on data sets that range from tens of megabytes to tens of gigabytes. I want to join two data sets together (ideally with filtering). The main problem I am having, and have not found an easy solution for, is the following.

I want to join data set 1 to data set 2 as shown below.

data1.txt

    id, name, job
    0001,john, manager
    0002,phil, deputy

data2.txt

    id1, id2, id3, label
    0001,0002,0001,useful
    0005,0001,0001,useful
    0000,0010,0009,not useful

Code proposal

    datasetA = LOAD 'data1.txt' USING PigStorage(',') AS (fieldA1, fieldA2, fieldA3);
    datasetB = LOAD 'data2.txt' USING PigStorage(',') AS (fieldB1, fieldB2, fieldB3, fieldB4);
    joined = JOIN datasetA BY fieldA1,
                  datasetB BY (fieldB1 OR fieldB2 OR fieldB3);
    DUMP joined;

So essentially I want to join one column in the first data set to n columns in the second data set, matching a row wherever any of those columns is equal to it. I am not after a partial (substring) match but an exact match. Is there already a feature in the language to do this? If not, would it be possible to request such a feature?

Thanks.
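P.S. To make the semantics I am after concrete, here is a rough sketch in plain Python rather than Pig. The function name join_any and the inlined sample rows (headers dropped, whitespace trimmed) are purely illustrative. It indexes every id1/id2/id3 value, which amounts to flattening those three columns into a single key column and doing an ordinary equi-join, with duplicates removed so that a row repeating the same id (like 0001 twice) is matched only once.

```python
# Sample rows inlined from data1.txt and data2.txt (illustrative only:
# header lines dropped, whitespace after commas trimmed).
DATA1 = [
    ("0001", "john", "manager"),
    ("0002", "phil", "deputy"),
]
DATA2 = [
    ("0001", "0002", "0001", "useful"),
    ("0005", "0001", "0001", "useful"),
    ("0000", "0010", "0009", "not useful"),
]


def join_any(left, right, left_key=0, right_keys=(0, 1, 2)):
    """Pair each left row with every right row in which ANY of the
    right_keys columns exactly equals the left row's key column."""
    # Index: key value -> set of right-row positions containing that value.
    # A set means a row repeating the same id in several columns is kept once.
    index = {}
    for i, row in enumerate(right):
        for k in right_keys:
            index.setdefault(row[k], set()).add(i)

    result = []
    for lrow in left:
        # Sort positions so the output follows the order of the right input.
        for i in sorted(index.get(lrow[left_key], ())):
            result.append(lrow + right[i])
    return result


for row in join_any(DATA1, DATA2):
    print(row)
```

On the sample data this yields three rows: john matches the first two rows of data2.txt, phil matches only the first, and the "not useful" row matches nothing. That is what I would hope `joined` to contain.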

Reply via email to