Spark is compiled against a custom fork of Hive 1.2.1 which added shading of Protobuf and removed shading of Kryo. I think what's happening here is that stock Hive 1.2.1 is taking precedence, so the Kryo instance it returns is the shaded/relocated Hive copy rather than the unshaded, stock Kryo that Spark expects here.
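
A quick way to sanity-check this (a rough sketch, untested against your deployment) is to ask the JVM where each of the two Kryo classes is being loaded from, e.g. in spark-shell:

    // Jar providing the unshaded Kryo that Spark links against:
    println(classOf[com.esotericsoftware.kryo.Kryo]
      .getProtectionDomain.getCodeSource.getLocation)

    // Jar providing Hive's relocated copy; this only resolves if a stock
    // Hive 1.2.1 hive-exec jar is on the classpath:
    println(Class.forName("org.apache.hive.com.esotericsoftware.kryo.Kryo")
      .getProtectionDomain.getCodeSource.getLocation)

If the second println points at a hive-exec jar that didn't come from Spark's Hive fork, that would explain the mismatch.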
I just so happen to have a patch which reintroduces the shading of Kryo (motivated by other factors): https://github.com/apache/spark/pull/12215; there's a chance that a backport of this patch might fix this problem. However, I'm a bit curious about how your classpath is set up and why stock 1.2.1's shaded Kryo is being used here (see the classpath probe sketched at the end of this message).

/cc +Marcelo Vanzin <van...@cloudera.com> and +Steve Loughran <ste...@hortonworks.com>, who may know more.

On Wed, Apr 6, 2016 at 6:08 PM Soam Acharya <s...@altiscale.com> wrote:

> Hi folks,
>
> I have a build of Spark 1.6.1 on which Spark SQL seems to be functional
> except for windowing functions. For example, I can create a simple
> external table via Hive:
>
> CREATE EXTERNAL TABLE PSTable (pid int, tty string, time string, cmd string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/test/ps';
>
> Ensure that the table is pointing to some valid data, set up Spark SQL to
> point to the Hive metastore (we're running Hive 1.2.1), and run a basic
> test:
>
> spark-sql> select * from PSTable;
> 7239 pts/0 00:24:31 java
> 9993 pts/9 00:00:00 ps
> 9994 pts/9 00:00:00 tail
> 9995 pts/9 00:00:00 sed
> 9996 pts/9 00:00:00 sed
>
> But when I try to run a windowing function which I know runs on Hive, I
> get:
>
> spark-sql> select a.pid, a.time, a.cmd, min(a.time) over (partition by
> a.cmd order by a.time) from PSTable a;
> org.apache.spark.SparkException: Task not serializable
>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
>     at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
>     :
>     :
> Caused by: java.lang.ClassCastException:
> org.apache.hive.com.esotericsoftware.kryo.Kryo cannot be cast to
> com.esotericsoftware.kryo.Kryo
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.serializePlan(HiveShim.scala:178)
>     at org.apache.spark.sql.hive.HiveShim$HiveFunctionWrapper.writeExternal(HiveShim.scala:191)
>     at java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1458)
>     at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>     at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>
> Any thoughts or ideas would be appreciated!
>
> Regards,
>
> Soam
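
P.S. Regarding the classpath question above: something like the following (an untested sketch; it assumes Java 7/8, where application classloaders are URLClassLoaders, which should hold for a Spark 1.6 deployment) would enumerate the Hive jars visible to the driver:

    // Walk the classloader chain and print every classpath entry that
    // mentions "hive", to spot a stock hive-exec jar shadowing Spark's fork:
    import java.net.URLClassLoader
    def walk(cl: ClassLoader): Unit = cl match {
      case null =>
      case u: URLClassLoader =>
        u.getURLs.filter(_.toString.contains("hive")).foreach(println)
        walk(u.getParent)
      case other => walk(other.getParent)
    }
    walk(getClass.getClassLoader)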