Hi Team,

We are facing an issue when we use the IsEmpty UDF with FILTER.
Scenario: We have two input files.

Input File 1 (first):

    1|11|111|1111
    2|22|222|2222
    3|33|333|3333
    4|44|444|4444
    5|55|555|5555

Input File 2 (second):

    1|a|aa|aaa
    2|22|bb|bbb
    3|c|cc|ccc
    6|d|dd|ddd

Our requirement is that, when these two input files are grouped on the first two keys, output is produced only for keys that have data in both files; for any other key, nothing should be printed.

From the above input files, only the key (2,22) is present in both, so the output should be:

    ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})

To achieve this, we wrote the following code:

    first = LOAD 'first' USING PigStorage('|') as (a:chararray,b:chararray,c:chararray,d:chararray);
    second = LOAD 'second' USING PigStorage('|') as (aa:chararray,bb:chararray,cc:chararray,dd:chararray);
    cogroup_join = COGROUP first BY (a,b), second BY (aa,bb);
    cogroup_join_filter = FILTER cogroup_join BY NOT IsEmpty(second) AND NOT IsEmpty(first);
    dump cogroup_join_filter;

But the output of cogroup_join_filter is:

    ((1,a),{},{(1,a,aa,aaa)})
    ((2,22),{(2,22,222,2222)},{(2,22,bb,bbb)})
    ((3,c),{},{(3,c,cc,ccc)})
    ((6,d),{},{(6,d,dd,ddd)})

In my opinion, IsEmpty should have filtered out every key that is not present in both input files, leaving only (2,22). Instead, the groups where the bag for first is empty are still in the output.

Please have a look and share your view on this.

Thanks & Regards,
Pankaj Ojha
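
P.S. To make the expected result unambiguous, here is a minimal Python sketch (not Pig) of the semantics we expect from the COGROUP followed by the FILTER. The helper names parse, cogroup and both_present are made up for this illustration. The sketch only models the intended behaviour on the sample data above; it says nothing about how Pig actually evaluates the script.

```python
from collections import defaultdict


def parse(text):
    """Split pipe-delimited lines into tuples of strings."""
    return [tuple(line.split("|")) for line in text.strip().splitlines()]


def cogroup(first, second, nkeys=2):
    """Model of COGROUP on the first nkeys fields: key -> (bag_first, bag_second)."""
    groups = defaultdict(lambda: ([], []))
    for row in first:
        groups[row[:nkeys]][0].append(row)
    for row in second:
        groups[row[:nkeys]][1].append(row)
    return dict(groups)


def both_present(groups):
    """Model of the intended filter: keep a key only if neither bag is empty."""
    return {key: bags for key, bags in groups.items() if bags[0] and bags[1]}


# Sample data copied from the two input files in this mail.
FIRST = """
1|11|111|1111
2|22|222|2222
3|33|333|3333
4|44|444|4444
5|55|555|5555
"""

SECOND = """
1|a|aa|aaa
2|22|bb|bbb
3|c|cc|ccc
6|d|dd|ddd
"""

grouped = cogroup(parse(FIRST), parse(SECOND))
result = both_present(grouped)

for key, (bag_first, bag_second) in result.items():
    print(key, bag_first, bag_second)
```

On the sample data this keeps only the group for key (2,22), which is the single line we expected from dump cogroup_join_filter.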