It works in 2.0.0-SNAPSHOT.

On Mon, Feb 22, 2016 at 6:24 PM, Michael Armbrust <mich...@databricks.com> wrote:
> I think this will be fixed in 1.6.1. Can you test when we post the first
> RC? (hopefully later today)
>
> On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann
> <daniel.siegm...@teamaol.com> wrote:
>
>> Experimenting with Datasets in Spark 1.6.0, I ran into a serialization
>> error when using case classes containing a Seq member. There is no
>> problem when using Array instead. Nor is there a problem using RDD or
>> DataFrame (even when converting the DataFrame to a Dataset later).
>>
>> Here's an example you can test in the Spark shell:
>>
>> import sqlContext.implicits._
>>
>> case class SeqThing(id: String, stuff: Seq[Int])
>> val seqThings = Seq(SeqThing("A", Seq()))
>> val seqData = sc.parallelize(seqThings)
>>
>> case class ArrayThing(id: String, stuff: Array[Int])
>> val arrayThings = Seq(ArrayThing("A", Array()))
>> val arrayData = sc.parallelize(arrayThings)
>>
>> // Array works fine
>> arrayData.collect()
>> arrayData.toDF.as[ArrayThing]
>> arrayData.toDS
>>
>> // Seq can't convert directly to a Dataset
>> seqData.collect()
>> seqData.toDF.as[SeqThing]
>> seqData.toDS // Serialization exception
>>
>> Is this working as intended? Are there plans to support serializing
>> arbitrary Seq values in Datasets, or must everything be converted to
>> Array?
>>
>> ~Daniel Siegmann
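
For anyone stuck on 1.6.0 until the fix lands, a minimal workaround sketch: map the Seq field to an Array in a helper case class before calling toDS, since Array-typed members serialize fine per the repro above. This assumes SeqThing and seqData from Daniel's shell session are in scope; ArrayView is a hypothetical helper class introduced here, not part of any Spark API.

    // Workaround sketch for Spark 1.6.0: convert the Seq member to an
    // Array before creating the Dataset.
    case class ArrayView(id: String, stuff: Array[Int])

    val dsViaArray = seqData
      .map(t => ArrayView(t.id, t.stuff.toArray)) // Seq[Int] -> Array[Int]
      .toDS                                       // no serialization exception

    dsViaArray.show()

Alternatively, per Daniel's own observation, routing through a DataFrame (seqData.toDF.as[SeqThing]) avoids the exception without changing the case class at all.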