ClassCastException when using IsEmpty(DIFF()) 
----------------------------------------------

                 Key: PIG-2144
                 URL: https://issues.apache.org/jira/browse/PIG-2144
             Project: Pig
          Issue Type: Bug
            Reporter: Mitesh Singh Jat


I have the following input in <name>:<nickname> format, from which I want to 
find the records where the name is different from the nickname.
{code:title=input/name_nickname.txt}
Bharat:Bharat
Amita:Amita
Mitesh:Mitesh
Reenu:Anshu
Shikha:Shikhu
Shilpa:Shilpi
{code}

I use the following script to find those records.
{code:title=isEmpty_diff.pig}

A = LOAD 'input/name_nickname.txt' using PigStorage(':');

B = FILTER A BY NOT IsEmpty(DIFF($0, $1));

DUMP B;
{code}


The above Pig script works with older Pig versions (e.g. 0.8.0 (r1043805)) and 
gives the following output:
{code:title=output of isEmpty_diff.pig}
(Reenu,Anshu)
(Shikha,Shikhu)
(Shilpa,Shilpi)
{code}


However, the same script (isEmpty_diff.pig) fails with a ClassCastException on 
Pig 0.9 (e.g. 0.9.0.1105251322 (r1127671)) and on newer builds of Pig 0.8 
(e.g. 0.8.0.1105131316 (r1102885)):
{code:title=ClassCastException}
java.lang.ClassCastException: org.apache.pig.data.DefaultDataBag cannot be cast to java.lang.Boolean
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:75)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:318)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:159)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:184)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:269)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(PONot.java:71)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:256)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:676)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:336)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
{code}


As a workaround, I used the following Pig script, which computes DIFF in a 
separate FOREACH and then filters on the resulting bag.
{code:title=isEmpty_diff2.pig}
A = LOAD 'input/name_nickname.txt' using PigStorage(':');

--B = FILTER A BY NOT IsEmpty(DIFF($0, $1));
B1 = FOREACH A GENERATE $0, $1, DIFF($0, $1);
B2 = FILTER B1 BY NOT IsEmpty($2);
B = FOREACH B2 GENERATE $0, $1;

DUMP B;
{code}
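
For this particular input, where both fields are plain strings rather than 
bags, a direct comparison might also serve as a workaround that avoids 
IsEmpty(DIFF()) entirely. The sketch below is an untested assumption, not part 
of the original report: the file name isEmpty_diff3.pig and the field names 
name and nickname are made up for illustration, and records with null fields 
may be handled differently than with DIFF.
{code:title=isEmpty_diff3.pig (hypothetical alternative)}
-- Assumed schema: both fields are declared as chararray so they compare as strings.
A = LOAD 'input/name_nickname.txt' USING PigStorage(':')
        AS (name:chararray, nickname:chararray);

-- Keep only the records whose name differs from the nickname.
B = FILTER A BY name != nickname;

DUMP B;
{code}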


        
