I am facing a very weird problem with multiplication in Pig. Simplified code snippet:

    A = LOAD 'file_A' AS (colA1 : double, colA2 : double);
    describe A;

*A: {colA1: double,colA2: double}*

    B = LOAD 'file_B' AS (colB1 : double, colB2 : double);
    describe B;

*B: {colB1: double,colB2: double}*

    joined = JOIN A BY (colA1) LEFT OUTER, B BY (colB1) USING 'replicated';
    SPLIT joined INTO split1 IF B::colB1 IS NOT NULL,
                      split2 IF (B::colB1 IS NULL AND A::colA2 == 2),
                      split3 IF (B::colB1 IS NULL AND A::colA2 != 2);
    describe split1;

*split1: {A::colA1: double,A::colA2: double,B::colB1: double,B::colB2: double}*

    D = FOREACH split1 GENERATE (A::colA1 * B::colB1) AS newCol;

*Error:*

    2014-04-24 10:02:30,458 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 0: Exception while executing [Multiply (Name: Multiply[double] - scope-6 Operator Key: scope-6) children: [[POProject (Name: Project[double][1] - scope-3 Operator Key: scope-3) children: null at []], [POCast (Name: Cast[double] - scope-5 Operator Key: scope-5) children: [[ConstantExpression (Name: Constant(3) - scope-4 Operator Key: scope-4) children: null at []]] at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number

Stack trace:

    org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [Multiply (Name: Multiply[double] - scope-6 Operator Key: scope-6) children: [[POProject (Name: Project[double][1] - scope-3 Operator Key: scope-3) children: null at []], [POCast (Name: Cast[double] - scope-5 Operator Key: scope-5) children: [[ConstantExpression (Name: Constant(3) - scope-4 Operator Key: scope-4) children: null at []]] at []]] at []]: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:681)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.genericGetNext(Multiply.java:89)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Multiply.getNextDouble(Multiply.java:104)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:317)
        ... 13 more

I tried the options below, but no luck:

1) Doing addition instead of multiplication; I get a similar error.
2) I verified that multiplication of doubles works with a few sample files.
3) I tried casting both columns to double again before the multiplication.
4) I tried storing the result before the multiplication and loading it back. Still the same error.

I am not sure why it throws a ClassCastException when the schema declares double as the data type. Please let me know if you need any further information or if I am missing something in the simplified snippet above.

Any help is very much appreciated. Thanks
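
For reference, attempts 3 and 4 above can be sketched roughly as follows. This is a reconstruction under assumptions rather than the exact script: the path 'tmp_split1' and the relation names `D3`, `reloaded` and `D4` are hypothetical, and the reload assumes the default tab-delimited storage.

```pig
-- Attempt 3 (sketch): cast both operands explicitly before multiplying.
D3 = FOREACH split1 GENERATE ((double) A::colA1 * (double) B::colB1) AS newCol;

-- Attempt 4 (sketch): store the intermediate relation, load it back
-- with an explicit schema, then multiply.
-- 'tmp_split1' is a hypothetical output path.
STORE split1 INTO 'tmp_split1';
reloaded = LOAD 'tmp_split1'
           AS (colA1 : double, colA2 : double, colB1 : double, colB2 : double);
D4 = FOREACH reloaded GENERATE (colA1 * colB1) AS newCol;
```

Both variants end in the same ClassCastException as the original `D` statement.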