Hi Spark Users,
Suppose I have some data (stored in parquet for example) generated as below:
package com.company.entity.old
case class Course(id: Int, students: List[Student])
case class Student(name: String)
I can then read the data with:
spark.read.parquet("data.parquet").as[Course]
Now I want to add a new field `address` to Student:
package com.company.entity.`new`
case class Course(id: Int, students: List[Student])
case class Student(name: String, address: String)
Running `spark.read.parquet("data.parquet").as[Course]` with the new
Course on data written with the old schema then fails, because the
`address` field is missing from the stored Student records.
In this case, what is the best practice for reading data written with
the old entity/schema into the new entity/schema, with the missing
field set to some default value? I know I can write a function by hand
to convert the old entity to the new one, but that is tedious. Is
there an automatic way to do this?
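For reference, the manual conversion I have in mind looks roughly like
the sketch below. It is plain Scala so it runs without a cluster. The
object names OldEntity, NewEntity and Upgrade, and the empty-string
default for `address`, are assumptions made up for this example; they
stand in for the two packages above.

```scala
// Hypothetical stand-ins for the old and new packages; the names
// OldEntity and NewEntity are assumptions made for this sketch.
object OldEntity {
  case class Student(name: String)
  case class Course(id: Int, students: List[Student])
}

object NewEntity {
  case class Student(name: String, address: String)
  case class Course(id: Int, students: List[Student])
}

object Upgrade {
  // Assumed default for the field that old records do not have.
  val DefaultAddress: String = ""

  // Copy the existing field and fill in the default for the new one.
  def student(s: OldEntity.Student): NewEntity.Student =
    NewEntity.Student(s.name, DefaultAddress)

  // Convert a course by converting each nested student.
  def course(c: OldEntity.Course): NewEntity.Course =
    NewEntity.Course(c.id, c.students.map(student))
}
```

With `spark.implicits._` in scope, I would expect to apply it as
`spark.read.parquet("data.parquet").as[OldEntity.Course].map(Upgrade.course)`.
Every nested case class needs its own function like this, which is
the tedious part.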
Thanks,
Mike