Hi folks,

I have a build of Spark 1.6.1 on which Spark SQL seems to be fully functional
except for windowing functions. For example, I can create a simple external
table via Hive:

CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/test/ps';
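
The files under /user/test/ps are just comma-delimited ps output, one
process per line, e.g.:

7239,pts/0,00:24:31,java
9993,pts/9,00:00:00,ps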

I ensured that the table points to valid data, set up Spark SQL to point to
the Hive metastore (we're running Hive 1.2.1), and ran a basic test:

spark-sql> select * from PSTable;
7239    pts/0   00:24:31        java
9993    pts/9   00:00:00        ps
9994    pts/9   00:00:00        tail
9995    pts/9   00:00:00        sed
9996    pts/9   00:00:00        sed
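
In case the exact setup matters: the programmatic equivalent in Spark 1.6
would look roughly like the sketch below (the metastore URI is a placeholder
for our real host; normally it is picked up from hive-site.xml on the driver
classpath):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("pstable-test"))
val sqlContext = new HiveContext(sc)
// Placeholder URI; in our cluster this comes from hive-site.xml
sqlContext.setConf("hive.metastore.uris", "thrift://metastore-host:9083")
sqlContext.sql("select * from PSTable").show()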

But when I try to run a windowing function that I know runs on Hive, I get:

spark-sql> select a.pid, a.time, a.cmd, min(a.time) over (partition by
a.cmd order by a.time) from PSTable a;
org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
:
:
Caused by: java.lang.ClassCastException: org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to com.esotericsoftware.kryo.Kryo
        at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
        at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
        at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
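
For what it's worth, the trace points at the Hive function wrapper being
serialized, so here is the DataFrame-API spelling of the same window query
in case that helps narrow things down (a sketch, assuming the sqlContext
from the snippet above or the one spark-shell provides):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.min
import sqlContext.implicits._

// min(time) per cmd, ordered by time -- the same window as the SQL above
val w = Window.partitionBy("cmd").orderBy("time")
sqlContext.table("PSTable")
  .select($"pid", $"time", $"cmd", min($"time").over(w))
  .show()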

Any thoughts or ideas would be appreciated!

Regards,

Soam
