This is a Kryo issue: https://github.com/EsotericSoftware/kryo/issues/124
It has to do with the lengths of the field names, and it is fixed in
Kryo 2.23.
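
For anyone checking which jars they have on the classpath, here is a rough sketch of testing whether a Kryo jar is at or past 2.23. The jar file names are hypothetical examples, and the helper names (kryo_version, has_fix) are made up for illustration. It compares versions as number tuples because a plain string comparison would rank "2.3" above "2.23".

```python
import re

# From the thread: the issue is reported fixed in Kryo 2.23.
FIXED_IN = (2, 23)

def kryo_version(jar_name):
    """Parse (major, minor) from a jar file name like 'kryo-2.21.jar'.

    Returns None when the name does not look like a plain Kryo jar.
    """
    match = re.search(r"kryo-(\d+)\.(\d+)", jar_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def has_fix(jar_name):
    """True if the jar name carries a version at or above FIXED_IN."""
    version = kryo_version(jar_name)
    return version is not None and version >= FIXED_IN

if __name__ == "__main__":
    # Hypothetical jar names, not taken from any particular distribution.
    for name in ["kryo-2.21.jar", "kryo-2.23.0.jar", "some-other-lib-1.0.jar"]:
        print(name, has_fix(name))
```

This only inspects file names, so a Kryo copy bundled inside another jar would not be detected.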

What's weird is that this doesn't break on Hive itself, only when using
SparkSQL. Attached is the full stack trace. It might be the way SparkSQL
interacts with Hive that makes this break.
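
To see where SparkSQL hands off to Hive and Kryo in a trace like the attached one, a small script can report the frames at which the owning library changes. This is only a sketch: the prefix-to-label table and the function names (owner, handoffs) are my own choices, and it assumes frames are printed as "[N] class.method (File:line)", as in the attachment.

```python
import re

# A frame looks like "[23] some.Class.method (File.java:918)"; \s+ also
# spans line breaks, so frames wrapped over several lines still match.
FRAME = re.compile(r"\[(\d+)\]\s+(\S+)\s+\(")

# Package prefixes of interest. The labels are illustrative only.
OWNERS = [
    ("com.esotericsoftware.kryo.", "kryo"),
    ("org.apache.hadoop.hive.", "hive"),
    ("org.apache.spark.", "spark"),
]

def owner(symbol):
    """Map a fully qualified method name to a library label."""
    for prefix, label in OWNERS:
        if symbol.startswith(prefix):
            return label
    return "other"

def handoffs(trace_text):
    """Return (frame_number, symbol, owner) for frames where the owner changes."""
    result = []
    previous = None
    for number, symbol in FRAME.findall(trace_text):
        current = owner(symbol)
        if current != previous:
            result.append((int(number), symbol, current))
            previous = current
    return result

if __name__ == "__main__":
    import sys
    # Usage: python handoffs.py < stacktrace.txt
    for number, symbol, label in handoffs(sys.stdin.read()):
        print(f"[{number}] {label}: {symbol}")
```

Run against the attached trace, the first entries would be the Kryo read path, then Hive's Utilities.deserializeObjectByKryo, then (after the reflection frames) Spark's HiveFunctionWrapper.deserializePlan.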

Breaking the aforementioned collection of structs into smaller structs, or
renaming their fields to be shorter, is an ugly workaround.
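
If you do go the renaming route, it may help to list the field names by length first. The sketch below pulls the names out of a Hive type string and sorts them longest first. The function names are hypothetical, it only handles a single flat struct (which is what the column in this thread has), and it does not encode any length threshold, since none is stated here.

```python
import re

def struct_field_names(type_string):
    """Extract field names from a type string like array<struct<a:int,b:string>>.

    Only a single, non-nested struct is handled.
    """
    match = re.search(r"struct<([^<>]*)>", type_string)
    if match is None:
        return []
    fields = [f for f in match.group(1).split(",") if f.strip()]
    return [f.split(":", 1)[0].strip() for f in fields]

def longest_first(type_string):
    """Return (length, name) pairs, longest names first."""
    names = struct_field_names(type_string)
    return sorted(((len(name), name) for name in names), reverse=True)

if __name__ == "__main__":
    # The column type quoted in the original message.
    column_type = (
        "array<struct<order_by_id:bigint,subscription_id:bigint,"
        "unsubscribe_hash:string,country_id:int,optin_hash:string,"
        "city_part_id:bigint,subscription_type:string,locale:string>>"
    )
    for length, name in longest_first(column_type):
        print(length, name)
```

For the column in this thread, the longest names are subscription_type (17 characters) and unsubscribe_hash (16).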


On Thu, May 28, 2015 at 3:21 PM, yluo <y...@groupon.com> wrote:

> Hi all, I'm using Spark 1.3.1 with Hive 0.13.1. When running a UDF
> accessing a hive struct array the query fails with:
>
> Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
> Serialization trace:
> fieldName (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector$MyField)
> fields (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector)
> listElementObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector)
> argStructArrayOI (com.groupon.hive.udf.filter.StructStringMemberFilterUDF)
>         at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
>         at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:580)
>         at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:558)
>         at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>         at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:918)
>         ... 102 more
>
> Anyone seen anything similar? argStructArrayOI is a Hive
> ListObjectInspector. The field the argStructArrayOI is accessing looks
> like:
>
>
> array<struct<order_by_id:bigint,subscription_id:bigint,unsubscribe_hash:string,country_id:int,optin_hash:string,city_part_id:bigint,subscription_type:string,locale:string>>
>
> The table is a hive table.
>
> Running the same query on Hive works... what's going on here? Any
> suggestions on how to debug this?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/UDF-accessing-hive-struct-array-fails-with-buffer-underflow-from-kryo-tp23078.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


-- 
Thanks,
Yutong
  [1] com.esotericsoftware.kryo.io.Input.require (Input.java:156)
  [2] com.esotericsoftware.kryo.io.Input.readAscii_slow (Input.java:580)
  [3] com.esotericsoftware.kryo.io.Input.readAscii (Input.java:558)
  [4] com.esotericsoftware.kryo.io.Input.readString (Input.java:436)
  [5] com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read (DefaultSerializers.java:157)
  [6] com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read (DefaultSerializers.java:146)
  [7] com.esotericsoftware.kryo.Kryo.readObjectOrNull (Kryo.java:699)
  [8] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:611)
  [9] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
  [10] com.esotericsoftware.kryo.Kryo.readClassAndObject (Kryo.java:729)
  [11] com.esotericsoftware.kryo.serializers.CollectionSerializer.read (CollectionSerializer.java:109)
  [12] com.esotericsoftware.kryo.serializers.CollectionSerializer.read (CollectionSerializer.java:18)
  [13] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
  [14] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
  [15] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
  [16] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
  [17] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
  [18] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
  [19] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
  [20] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
  [21] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
  [22] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:626)
  [23] org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo (Utilities.java:918)
  [24] sun.reflect.NativeMethodAccessorImpl.invoke0 (native method)
  [25] sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
  [26] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
  [27] java.lang.reflect.Method.invoke (Method.java:606)
  [28] org.apache.spark.sql.hive.HiveFunctionWrapper.deserializePlan (Shim13.scala:90)
  [29] org.apache.spark.sql.hive.HiveFunctionWrapper.readExternal (Shim13.scala:131)
  [30] java.io.ObjectInputStream.readExternalData (ObjectInputStream.java:1,837)
  [31] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,796)
  [32] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [33] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [34] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [35] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [36] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [37] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [38] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [39] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [40] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [41] java.io.ObjectInputStream.readArray (ObjectInputStream.java:1,706)
  [42] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,344)
  [43] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [44] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [45] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [46] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [47] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [48] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [49] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [50] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [51] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [52] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [53] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [54] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [55] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [56] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [57] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [58] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [59] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [60] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [61] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [62] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [63] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [64] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [65] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [66] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [67] java.io.ObjectInputStream.readObject (ObjectInputStream.java:370)
  [68] scala.collection.immutable.$colon$colon.readObject (List.scala:362)
  [69] sun.reflect.GeneratedMethodAccessor3.invoke (null)
  [70] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
  [71] java.lang.reflect.Method.invoke (Method.java:606)
  [72] java.io.ObjectStreamClass.invokeReadObject (ObjectStreamClass.java:1,017)
  [73] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,893)
  [74] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [75] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [76] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [77] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [78] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [79] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [80] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
  [81] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
  [82] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
  [83] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
  [84] java.io.ObjectInputStream.readObject (ObjectInputStream.java:370)
  [85] org.apache.spark.serializer.JavaDeserializationStream.readObject (JavaSerializer.scala:62)
  [86] org.apache.spark.serializer.JavaSerializerInstance.deserialize (JavaSerializer.scala:87)
  [87] org.apache.spark.scheduler.ResultTask.runTask (ResultTask.scala:57)
  [88] org.apache.spark.scheduler.Task.run (Task.scala:56)
  [89] org.apache.spark.executor.Executor$TaskRunner.run (Executor.scala:200)
  [90] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,145)
  [91] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
  [92] java.lang.Thread.run (Thread.java:745)
