This is a Kryo issue: https://github.com/EsotericSoftware/kryo/issues/124. It has to do with the lengths of the field names, and it is fixed in Kryo 2.23.
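Since the problem seems tied to field-name length, a quick way to see which names in the struct are the longest is to pull them out of the Hive type string. The following is only a rough sketch: the helper names `struct_field_names` and `fields_by_length` are made up for illustration, the regex assumes plain identifier field names, and which lengths actually trigger the bug is not something I have established, so it only ranks the names.

```python
import re

def struct_field_names(hive_type):
    """Return the field names found in a Hive struct type string."""
    # Assumption: a field name is an identifier that directly follows
    # '<' or ',' and is directly followed by ':'.
    return re.findall(r"[<,]\s*([A-Za-z_][A-Za-z0-9_]*)\s*:", hive_type)

def fields_by_length(hive_type):
    """Return (length, name) pairs, longest name first."""
    return sorted(((len(n), n) for n in struct_field_names(hive_type)),
                  reverse=True)

# Column type copied from the original report.
COLUMN_TYPE = (
    "array<struct<order_by_id:bigint,subscription_id:bigint,"
    "unsubscribe_hash:string,country_id:int,optin_hash:string,"
    "city_part_id:bigint,subscription_type:string,locale:string>>"
)

for length, name in fields_by_length(COLUMN_TYPE):
    print(f"{length:3d}  {name}")
```

The names at the top of the list would be the first candidates to shorten if upgrading Kryo is not an option.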
What's weird is that this doesn't break on Hive itself, only when using SparkSQL. Attached is the full stack trace. It might be the way SparkSQL interacts with Hive that makes this break.

Breaking the aforementioned collection of structs into smaller structs, or renaming their fields to be shorter, is an ugly workaround.

On Thu, May 28, 2015 at 3:21 PM, yluo <y...@groupon.com> wrote:

> Hi all, I'm using Spark 1.3.1 with Hive 0.13.1. When running a UDF accessing
> a hive struct array the query fails with:
>
> Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
> Serialization trace:
> fieldName (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector$MyField)
> fields (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector)
> listElementObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector)
> argStructArrayOI (com.groupon.hive.udf.filter.StructStringMemberFilterUDF)
>     at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
>     at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:580)
>     at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:558)
>     at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
>     at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>     at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>     at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>     at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>     at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>     at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:918)
>     ... 102 more
>
> Anyone seen anything similar? argStructArrayOI is a Hive
> ListObjectInspector. The field the argStructArrayOI is accessing looks
> like:
>
> array<struct<order_by_id:bigint,subscription_id:bigint,unsubscribe_hash:string,country_id:int,optin_hash:string,city_part_id:bigint,subscription_type:string,locale:string>>
>
> The table is a hive table.
>
> Running the same query on Hive works... what's going on here? Any
> suggestions on how to debug this?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/UDF-accessing-hive-struct-array-fails-with-buffer-underflow-from-kryo-tp23078.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
Thanks,
Yutong
[1] com.esotericsoftware.kryo.io.Input.require (Input.java:156)
[2] com.esotericsoftware.kryo.io.Input.readAscii_slow (Input.java:580)
[3] com.esotericsoftware.kryo.io.Input.readAscii (Input.java:558)
[4] com.esotericsoftware.kryo.io.Input.readString (Input.java:436)
[5] com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read (DefaultSerializers.java:157)
[6] com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read (DefaultSerializers.java:146)
[7] com.esotericsoftware.kryo.Kryo.readObjectOrNull (Kryo.java:699)
[8] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:611)
[9] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
[10] com.esotericsoftware.kryo.Kryo.readClassAndObject (Kryo.java:729)
[11] com.esotericsoftware.kryo.serializers.CollectionSerializer.read (CollectionSerializer.java:109)
[12] com.esotericsoftware.kryo.serializers.CollectionSerializer.read (CollectionSerializer.java:18)
[13] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
[14] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
[15] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
[16] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
[17] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
[18] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
[19] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:648)
[20] com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read (FieldSerializer.java:605)
[21] com.esotericsoftware.kryo.serializers.FieldSerializer.read (FieldSerializer.java:221)
[22] com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:626)
[23] org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo (Utilities.java:918)
[24] sun.reflect.NativeMethodAccessorImpl.invoke0 (native method)
[25] sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:57)
[26] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[27] java.lang.reflect.Method.invoke (Method.java:606)
[28] org.apache.spark.sql.hive.HiveFunctionWrapper.deserializePlan (Shim13.scala:90)
[29] org.apache.spark.sql.hive.HiveFunctionWrapper.readExternal (Shim13.scala:131)
[30] java.io.ObjectInputStream.readExternalData (ObjectInputStream.java:1,837)
[31] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,796)
[32] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[33] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[34] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[35] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[36] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[37] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[38] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[39] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[40] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[41] java.io.ObjectInputStream.readArray (ObjectInputStream.java:1,706)
[42] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,344)
[43] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[44] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[45] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[46] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[47] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[48] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[49] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[50] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[51] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[52] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[53] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[54] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[55] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[56] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[57] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[58] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[59] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[60] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[61] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[62] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[63] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[64] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[65] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[66] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[67] java.io.ObjectInputStream.readObject (ObjectInputStream.java:370)
[68] scala.collection.immutable.$colon$colon.readObject (List.scala:362)
[69] sun.reflect.GeneratedMethodAccessor3.invoke (null)
[70] sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
[71] java.lang.reflect.Method.invoke (Method.java:606)
[72] java.io.ObjectStreamClass.invokeReadObject (ObjectStreamClass.java:1,017)
[73] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,893)
[74] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[75] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[76] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[77] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[78] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[79] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[80] java.io.ObjectInputStream.defaultReadFields (ObjectInputStream.java:1,990)
[81] java.io.ObjectInputStream.readSerialData (ObjectInputStream.java:1,915)
[82] java.io.ObjectInputStream.readOrdinaryObject (ObjectInputStream.java:1,798)
[83] java.io.ObjectInputStream.readObject0 (ObjectInputStream.java:1,350)
[84] java.io.ObjectInputStream.readObject (ObjectInputStream.java:370)
[85] org.apache.spark.serializer.JavaDeserializationStream.readObject (JavaSerializer.scala:62)
[86] org.apache.spark.serializer.JavaSerializerInstance.deserialize (JavaSerializer.scala:87)
[87] org.apache.spark.scheduler.ResultTask.runTask (ResultTask.scala:57)
[88] org.apache.spark.scheduler.Task.run (Task.scala:56)
[89] org.apache.spark.executor.Executor$TaskRunner.run (Executor.scala:200)
[90] java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1,145)
[91] java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:615)
[92] java.lang.Thread.run (Thread.java:745)