The following should work:

val schema = implicitly[org.apache.spark.sql.Encoder[Course]].schema
spark.read.schema(schema).parquet("data.parquet").as[Course]
Note this will only work for nullable fields (i.e. if you add a primitive like Int you need to make it an Option[Int]).

On Sun, Apr 30, 2017 at 9:12 PM, Mike Wheeler <rotationsymmetr...@gmail.com> wrote:
> Hi Spark Users,
>
> Suppose I have some data (stored in parquet for example) generated as
> below:
>
> package com.company.entity.old
> case class Course(id: Int, students: List[Student])
> case class Student(name: String)
>
> Then usually I can access the data by
>
> spark.read.parquet("data.parquet").as[Course]
>
> Now I want to add a new field `address` to Student:
>
> package com.company.entity.new
> case class Course(id: Int, students: List[Student])
> case class Student(name: String, address: String)
>
> Then obviously running `spark.read.parquet("data.parquet").as[Course]`
> on data generated by the old entity/schema will fail because `address`
> is missing.
>
> In this case, what is the best practice to read data generated with
> the old entity/schema to the new entity/schema, with the missing field
> set to some default value? I know I can manually write a function to
> do the transformation from the old to the new. But it is kind of
> tedious. Any automatic methods?
>
> Thanks,
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
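Putting the suggestion together, here is a minimal sketch of the whole approach. It assumes a local SparkSession and that `address` has been declared as Option[String] in the new entity (per the nullability caveat above); the `withDefault` step and the "unknown" default value are illustrative additions, not part of the original suggestion:

```scala
import org.apache.spark.sql.{Encoder, SparkSession}

// New entities, mirroring the thread. `address` is Option[String] so rows
// written with the old schema (which lacks the column) decode as None.
case class Student(name: String, address: Option[String])
case class Course(id: Int, students: List[Student])

object ReadOldParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-evolution")
      .getOrCreate()
    import spark.implicits._

    // Derive the *new* schema from the Encoder and impose it on the old
    // files; columns absent from the parquet data come back as null/None.
    val schema = implicitly[Encoder[Course]].schema
    val courses = spark.read.schema(schema).parquet("data.parquet").as[Course]

    // If a concrete default is needed rather than None, fill it afterwards:
    val withDefault = courses.map { c =>
      c.copy(students = c.students.map(s =>
        s.copy(address = s.address.orElse(Some("unknown")))))
    }

    withDefault.show()
    spark.stop()
  }
}
```

Reading with an explicit schema this way avoids writing a hand-rolled old-to-new conversion function: the Encoder-derived schema does the column alignment, and only the default-filling (if wanted) is written by hand.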