Hi Pankaj,

Which version of Pig are you using? It works fine for me. I get the
following output as expected:

((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})

I tested Pig 0.9, 0.10, 0.11, and trunk. All worked for me.
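
If the FILTER still misbehaves on your version, one alternative that may be worth trying is the INNER keyword on COGROUP, which drops any group whose bag for that relation is empty. This is only a sketch: it assumes the same input files and schemas as in your script, the alias cogroup_inner is just a name made up for illustration, and I have not run it against your setup.

```pig
-- Same inputs and schemas as in the original script.
first = LOAD 'first' USING PigStorage('|') AS
    (a:chararray, b:chararray, c:chararray, d:chararray);

second = LOAD 'second' USING PigStorage('|') AS
    (aa:chararray, bb:chararray, cc:chararray, dd:chararray);

-- INNER on both relations keeps only the groups whose key occurs in
-- both inputs, so no separate IsEmpty filter is needed.
-- (cogroup_inner is a hypothetical alias used for this sketch.)
cogroup_inner = COGROUP first BY (a, b) INNER, second BY (aa, bb) INNER;

DUMP cogroup_inner;
```

With your sample data this should leave only the group for key (2,22), but please verify it on the Pig version you are running.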

Thanks,
Cheolsoo




On Fri, Jun 14, 2013 at 5:25 AM, Ojha, Pankaj <pankaj.o...@searshc.com> wrote:

> Hi Team,
>
> We are facing an issue when we use the IsEmpty UDF with FILTER.
>
> Scenario:
> We have two input files:
>
> Input File 1: first
> 1|11|111|1111
> 2|22|222|2222
> 3|33|333|3333
> 4|44|444|4444
> 5|55|555|5555
>
> Input File 2: second
> 1|a|aa|aaa
> 2|22|bb|bbb
> 3|c|cc|ccc
> 6|d|dd|ddd
>
>
> Our requirement is that, when these two input files are grouped on their
> first two fields, output is produced only for keys that have data in both
> files; for any other key, nothing should be printed.
> From the above input files, only the key (2,22) qualifies, so the output
> should be:
>
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
>
> To achieve this, we wrote the following code:
>
> first = LOAD 'first' USING PigStorage('|') as
> (a:chararray,b:chararray,c:chararray,d:chararray);
>
> second = LOAD 'second' USING PigStorage('|') as
> (aa:chararray,bb:chararray,cc:chararray,dd:chararray);
>
> cogroup_join = COGROUP first BY (a,b) , second BY (aa,bb);
>
> cogroup_join_filter = FILTER cogroup_join BY NOT IsEmpty(second) AND NOT
> IsEmpty(first);
>
> dump cogroup_join_filter;
>
> But the output of cogroup_join_filter is:
> ((1,a),{},{(1,a,aa,aaa)})
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
> ((3,c),{},{(3,c,cc,ccc)})
> ((6,d),{},{(6,d,dd,ddd)})
>
> In my opinion, IsEmpty should have filtered out every group whose key does
> not appear in both input files, leaving only (2,22).
> But that is not happening.
> Please have a look and share your view on this.
>
> Thanks & Regards,
> Pankaj Ojha
