Hi folks, I have a build of Spark 1.6.1 on which Spark SQL seems to be fully functional except for window functions. For example, I can create a simple external table via Hive:
CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/test/ps';

Ensure that the table points to some valid data, set up Spark SQL to point to the Hive metastore (we're running Hive 1.2.1), and run a basic test:

spark-sql> select * from PSTable;
7239    pts/0   00:24:31   java
9993    pts/9   00:00:00   ps
9994    pts/9   00:00:00   tail
9995    pts/9   00:00:00   sed
9996    pts/9   00:00:00   sed

But when I try to run a window function which I know runs on Hive, I get:

spark-sql> select a.pid, a.time, a.cmd, min(a.time) over (partition by a.cmd order by a.time) from PSTable a;

org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    :
    :
Caused by: java.lang.ClassCastException: org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to com.esotericsoftware.kryo.Kryo
    at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
    at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
    at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)

Any thoughts or ideas would be appreciated!

Regards,
Soam
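P.S. In case it's a useful data point, here is the same query expressed through the DataFrame window API instead of raw SQL. This is only a sketch (not verified on our 1.6.1 build), and it assumes a HiveContext is available as `sqlContext`; since window functions in 1.6 still go through Hive's function wrappers, it may well hit the same Kryo cast, but it would at least tell us whether the SQL parser path is involved:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.min

    // Same window spec as the SQL query: partition by cmd, order by time
    val w = Window.partitionBy("cmd").orderBy("time")

    val df = sqlContext.table("PSTable")
    // "first_time" is just an illustrative alias for the windowed min
    df.select(df("pid"), df("time"), df("cmd"),
              min(df("time")).over(w).as("first_time"))
      .show()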