I've been able to load a different avro file based on GenericRecord with:

    val person = sqlContext.avroFile("/tmp/person.avro")

When I try to call `first()` on it, I get "NotSerializableException" exceptions again:

    person.first()
    ...
    14/11/21 12:59:17 ERROR Executor: Exception in task 0.0 in stage 14.0 (TID 20)
    java.io.NotSerializableException: org.apache.avro.generic.GenericData$Record
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
        at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    ...

Apart from this, I want to transform the records into pairs of (user_id, record). I can do this by specifying the offset of the user_id column with something like:

    person.map(r => (r.getInt(2), r)).take(4)

Is there any way to specify the column name ("user_id") instead of having to know or calculate the offset?
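Ideally something like this would work (untested sketch — I'm assuming the SchemaRDD returned by avroFile exposes its StructType schema, and that registerTempTable works on it like on any other SchemaRDD):

    // Untested sketch: resolve the column's position from the schema once on
    // the driver, instead of hard-coding the offset.
    val userIdIndex = person.schema.fields.map(_.name).indexOf("user_id")
    person.map(r => (r.getInt(userIdIndex), r)).take(4)

    // Or go through SQL, so the column name is used directly:
    person.registerTempTable("person")
    sqlContext.sql("SELECT user_id FROM person").take(4)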
Thanks again

On Fri, Nov 21, 2014 at 11:48 AM, thomas j <beanb...@googlemail.com> wrote:

> Thanks for the pointer Michael.
>
> I've downloaded Spark 1.2.0 from
> https://people.apache.org/~pwendell/spark-1.2.0-snapshot1/ and cloned and
> built the spark-avro repo you linked to.
>
> When I run it against the example avro file linked to in the documentation,
> it works. However, when I try to load my avro file (linked to in my
> original question) I receive the following error:
>
> java.lang.RuntimeException: Unsupported type LONG
>     at scala.sys.package$.error(package.scala:27)
>     at com.databricks.spark.avro.AvroRelation.com$databricks$spark$avro$AvroRelation$$toSqlType(AvroRelation.scala:116)
>     at com.databricks.spark.avro.AvroRelation$$anonfun$5.apply(AvroRelation.scala:97)
>     at com.databricks.spark.avro.AvroRelation$$anonfun$5.apply(AvroRelation.scala:96)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>     ...
>
> If this is useful, I'm happy to try loading the various different avro
> files I have to help battle-test spark-avro.
>
> Thanks
>
> On Thu, Nov 20, 2014 at 6:30 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>> One option (starting with Spark 1.2, which is currently in preview) is to
>> use the Avro library for Spark SQL. This is very new, but we would love to
>> get feedback: https://github.com/databricks/spark-avro
>>
>> On Thu, Nov 20, 2014 at 10:19 AM, al b <beanb...@googlemail.com> wrote:
>>
>>> I've read several posts from people struggling to read avro in Spark, and
>>> the examples I've tried don't work. When I try this solution
>>> (https://stackoverflow.com/questions/23944615/how-can-i-load-avros-in-spark-using-the-schema-on-board-the-avro-files)
>>> I get errors:
>>>
>>> spark java.io.NotSerializableException: org.apache.avro.mapred.AvroWrapper
>>>
>>> How can I read the following sample file in Spark using Scala?
>>>
>>> http://www.4shared.com/file/SxnYcdgJce/sample.html
>>>
>>> Thomas
>>>
>>
>>
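P.S. For reference, the stackoverflow approach I'd been trying looks roughly like the sketch below, and the usual way around the AvroWrapper/GenericRecord serialization error seems to be copying the fields out into plain (serializable) Scala values inside the same map, before anything is collected or shuffled. Untested — the path and the user_id field are from my own data, so substitute yours:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    val records = sc.newAPIHadoopFile(
      "/tmp/sample.avro",
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])

    // AvroKey/GenericRecord are not java-serializable, so extract plain
    // values here; only the resulting tuples ever get serialized by Spark.
    val pairs = records.map { case (key, _) =>
      val r = key.datum()
      (r.get("user_id").toString.toInt, r.toString)
    }
    pairs.take(4)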