Ken Ellinwood created SPARK-1591:
------------------------------------

             Summary: scala.MatchError executing custom UDTF
                 Key: SPARK-1591
                 URL: https://issues.apache.org/jira/browse/SPARK-1591
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 0.9.1
         Environment: CentOS 5, Hortonworks 1.3.2, Hadoop 1.2.0, Hive 0.11.0, Spark 0.9.1, Shark 0.9.1, sharkserver2, beeline
            Reporter: Ken Ellinwood
            Priority: Minor

My custom UDTF fails to execute in Shark even though it runs fine in Hive.

scala.MatchError: [orange, 1, Black, 419] (of class java.util.ArrayList)
	at scala.runtime.ScalaRunTime$.array_clone(ScalaRunTime.scala:118)
	at shark.execution.UDTFCollector.collect(UDTFOperator.scala:92)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:91)
	at com.mycompany.warehouse.hive.HiveUdtfColorTreeTable.process(HiveUdtfColorTreeTable.java:98)
	at shark.execution.UDTFOperator.explode(UDTFOperator.scala:79)
	at shark.execution.LateralViewJoinOperator$$anonfun$processPartition$1.apply(LateralViewJoinOperator.scala:141)

The code at UDTFOperator.scala, line 92 makes two assumptions that do not hold in my case. First, it claims that it needs to clone the row object. Second, it assumes that all row objects are arrays. In my case the row is a java.util.ArrayList, and it does not need to be cloned because my UDTF already creates a new one for each row. The clone operation fails because my row is not an array.

I changed my implementation to use an array, but we have a non-trivial number of custom UDFs that all work with Hive, and I think they should work in Shark without modification.
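
To illustrate the behavior I would expect, here is a minimal sketch of a row copy that accepts both representations. It is not Shark's actual code: the class name `RowCopy` and the method `copyRow` are hypothetical, and the sketch assumes that a forwarded row is always either an `Object[]` or a `java.util.List`. It has no Hive or Shark dependency, so it only shows the type dispatch, not the collector itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Shark or Hive.
final class RowCopy {

    // Returns a copy of the forwarded row as an Object[].
    // Assumption: rows arrive either as Object[] or as java.util.List.
    static Object[] copyRow(Object row) {
        if (row instanceof Object[]) {
            // Array rows: a shallow clone, as the existing code intends.
            return ((Object[]) row).clone();
        }
        if (row instanceof List) {
            // List rows (for example ArrayList): toArray() already returns a fresh array.
            return ((List<?>) row).toArray();
        }
        String type = (row == null) ? "null" : row.getClass().getName();
        throw new IllegalArgumentException("Unsupported row type: " + type);
    }

    public static void main(String[] args) {
        List<Object> listRow = new ArrayList<>(Arrays.asList("orange", 1, "Black", 419));
        Object[] arrayRow = {"orange", 1, "Black", 419};

        System.out.println(Arrays.toString(copyRow(listRow)));  // prints [orange, 1, Black, 419]
        System.out.println(Arrays.toString(copyRow(arrayRow))); // prints [orange, 1, Black, 419]
    }
}
```

With a dispatch along these lines, a UDTF that forwards an ArrayList and one that forwards an array would both be collected without a MatchError. Whether the copy is needed at all for rows that the UDTF allocates fresh each time is a separate question.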