Hi Spark Users,

Suppose I have some data (stored in Parquet, for example) generated as below:
    package com.company.entity.old

    case class Course(id: Int, students: List[Student])
    case class Student(name: String)

Then I can usually access the data with:

    spark.read.parquet("data.parquet").as[Course]

Now I want to add a new field `address` to Student:

    package com.company.entity.`new`

    case class Course(id: Int, students: List[Student])
    case class Student(name: String, address: String)

Running `spark.read.parquet("data.parquet").as[Course]` on data generated with the old entity/schema will then obviously fail, because `address` is missing.

In this case, what is the best practice for reading data generated with the old entity/schema into the new entity/schema, with the missing field set to some default value? I know I can manually write a function to transform the old entities into the new ones, but that is tedious. Are there any automatic methods?

Thanks,
Mike
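P.S. For concreteness, here is a minimal sketch of the manual transformation I mean. The `OldEntities` and `NewEntities` wrapper objects are hypothetical stand-ins for the two packages above, and the empty-string default for `address` is just an assumption:

    import org.apache.spark.sql.SparkSession

    // Hypothetical stand-ins for com.company.entity.old and com.company.entity.`new`.
    object OldEntities {
      case class Course(id: Int, students: List[Student])
      case class Student(name: String)
    }
    object NewEntities {
      case class Course(id: Int, students: List[Student])
      case class Student(name: String, address: String)
    }

    val spark = SparkSession.builder().appName("migration-sketch").getOrCreate()
    import spark.implicits._

    // Read with the old entity, then map every record by hand,
    // filling the missing field with a default value.
    val migrated = spark.read.parquet("data.parquet")
      .as[OldEntities.Course]
      .map { c =>
        NewEntities.Course(
          id = c.id,
          students = c.students.map(s => NewEntities.Student(name = s.name, address = ""))
        )
      }

This works, but I would have to write and maintain one such mapping for every schema change, which is what I am hoping to avoid.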