Hi Pankaj,

Which version of Pig are you using? It works fine for me. I get the following output as expected:
((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})

I tested Pig 0.9, 0.10, 0.11, and trunk. All worked for me.

Thanks,
Cheolsoo

On Fri, Jun 14, 2013 at 5:25 AM, Ojha, Pankaj <pankaj.o...@searshc.com> wrote:

> Hi Team,
>
> We are facing an issue when we use IsEmpty UDF with FILTER
>
> Scenario:
> We have two input files:-
>
> Input File 1: - first
> 1|11|111|1111
> 2|22|222|2222
> 3|33|333|3333
> 4|44|444|4444
> 5|55|555|5555
>
> Input File 2: - second
> 1|a|aa|aaa
> 2|22|bb|bbb
> 3|c|cc|ccc
> 6|d|dd|ddd
>
>
> Our requirement is, on grouping these two input files on the first two
> keys, it should give output only when data is present in both the files for
> a particular key otherwise it should print nothing.
> From the above input files, for key values (2,22), it should only print
> output like below :-
>
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
>
> To achieve this, we wrote the code as below:-
>
> first = LOAD 'first' USING PigStorage('|') as
> (a:chararray,b:chararray,c:chararray,d:chararray);
>
> second = LOAD 'second' USING PigStorage('|') as
> (aa:chararray,bb:chararray,cc:chararray,dd:chararray);
>
> cogroup_join = COGROUP first BY (a,b) , second BY (aa,bb);
>
> cogroup_join_filter = FILTER cogroup_join BY NOT IsEmpty(second) AND NOT
> IsEmpty(first);
>
> dump cogroup_join_filter;
>
> But, the output for the cogroup_join_filter is:
> ((1,a),{},{(1,a,aa,aaa)})
> ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
> ((3,c),{},{(3,c,cc,ccc)})
> ((6,d),{},{(6,d,dd,ddd)})
>
> In my opinion, IsEmpty should have filtered out other values where it does
> not find corresponding key values same in both input file except for (2,22).
> But the same is not happening.
> Please have a look and provide your view on this.
>
> Thanks & Regards,
> Pankaj Ojha
>
> This message, including any attachments, is the property of Sears Holdings
> Corporation and/or one of its subsidiaries. It is confidential and may
> contain proprietary or legally privileged information. If you are not the
> intended recipient, please delete it without reading the contents. Thank
> you.
>
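
For anyone following the thread without a Pig install handy, the behaviour the script is expected to have can be sketched in plain Python. This is only an illustration of the COGROUP-then-FILTER logic, not Pig itself: the helper name `cogroup` and its `key_len` parameter are made up for this sketch, and the rows are simply the two input files from the quoted message split on '|'.

```python
from collections import defaultdict

# Rows of the two input files from the quoted message, split on '|'.
first = [
    ("1", "11", "111", "1111"),
    ("2", "22", "222", "2222"),
    ("3", "33", "333", "3333"),
    ("4", "44", "444", "4444"),
    ("5", "55", "555", "5555"),
]
second = [
    ("1", "a", "aa", "aaa"),
    ("2", "22", "bb", "bbb"),
    ("3", "c", "cc", "ccc"),
    ("6", "d", "dd", "ddd"),
]


def cogroup(left, right, key_len=2):
    """Hypothetical helper: group both inputs on their first key_len fields.

    Every key seen in either input gets one output tuple
    (key, bag_from_left, bag_from_right); a bag is empty when that
    input has no rows for the key.
    """
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[:key_len]][0].append(row)
    for row in right:
        groups[row[:key_len]][1].append(row)
    # Sort by key so the result order is deterministic.
    return [(key, bags[0], bags[1]) for key, bags in sorted(groups.items())]


cogroup_join = cogroup(first, second)

# Counterpart of:
#   FILTER cogroup_join BY NOT IsEmpty(second) AND NOT IsEmpty(first);
# An empty list is falsy, so this keeps only keys present in both inputs.
cogroup_join_filter = [g for g in cogroup_join if g[1] and g[2]]

for group in cogroup_join_filter:
    print(group)
```

With the data above, only the (2,22) group has rows in both bags, which matches the single line of output reported in the reply. Groups such as (1,a) or (6,d) have an empty first bag, so a correctly evaluated filter should drop them.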