Spark is compiled against a custom fork of Hive 1.2.1 which added shading of Protobuf and removed shading of Kryo. I think what's happening here is that stock Hive 1.2.1 is taking precedence, so the Kryo instance it returns is the shaded/relocated Hive copy rather than the unshaded, stock Kryo that Spark expects here.
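
A quick way to sanity-check this (a rough sketch, untested against your deployment) is to ask the JVM where each of the two Kryo classes is being loaded from, e.g. in spark-shell:

    // Jar providing the unshaded Kryo that Spark links against:
    println(classOf[com.esotericsoftware.kryo.Kryo]
      .getProtectionDomain.getCodeSource.getLocation)

    // Jar providing Hive's relocated copy; this only resolves if a stock
    // Hive 1.2.1 hive-exec jar is on the classpath:
    println(Class.forName("org.apache.hive.com.esotericsoftware.kryo.Kryo")
      .getProtectionDomain.getCodeSource.getLocation)

If the second println points at a hive-exec jar that didn't come from Spark's Hive fork, that would explain the mismatch.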
I just so happen to have a patch which reintroduces the shading of Kryo (motivated by other factors): https://github.com/apache/spark/pull/12215; there's a chance that a backport of this patch might fix this problem. However, I'm a bit curious about how your classpath is set up and why stock 1.2.1's shaded Kryo is being used here (see the classpath probe sketched at the end of this message).

/cc +Marcelo Vanzin <van...@cloudera.com> and +Steve Loughran <ste...@hortonworks.com>, who may know more.

On Wed, Apr 6, 2016 at 6:08 PM Soam Acharya <s...@altiscale.com> wrote:

> Hi folks,
>
> I have a build of Spark 1.6.1 on which Spark SQL seems to be functional
> except for windowing functions. For example, I can create a simple
> external table via Hive:
>
> CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/test/ps';
>
> Ensure that the table is pointing to some valid data, set up Spark SQL to
> point to the Hive metastore (we're running Hive 1.2.1), and run a basic
> test:
>
> spark-sql> select * from PSTable;
> 7239 pts/0 00:24:31 java
> 9993 pts/9 00:00:00 ps
> 9994 pts/9 00:00:00 tail
> 9995 pts/9 00:00:00 sed
> 9996 pts/9 00:00:00 sed
>
> But when I try to run a windowing function which I know runs on Hive, I
> get:
>
> spark-sql> select a.pid, a.time, a.cmd, min(a.time) over (partition by
> a.cmd order by a.time) from PSTable a;
> org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
>     :
>     :
> Caused by: java.lang.ClassCastException:
> org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to
> com.esotericsoftware.kryo.Kryo
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
>     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>
> Any thoughts or ideas would be appreciated!
>
> Regards,
>
> Soam
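
P.S. Regarding the classpath question above: something like the following (an untested sketch; it assumes Java 7/8, where application classloaders are URLClassLoaders, which should hold for a Spark 1.6 deployment) would enumerate the Hive jars visible to the driver:

    // Walk the classloader chain and print every classpath entry that
    // mentions "hive", to spot a stock hive-exec jar shadowing Spark's fork:
    import java.net.URLClassLoader
    def walk(cl: ClassLoader): Unit = cl match {
      case null =>
      case u: URLClassLoader =>
        u.getURLs.filter(_.toString.contains("hive")).foreach(println)
        walk(u.getParent)
      case other => walk(other.getParent)
    }
    walk(getClass.getClassLoader)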