Unfortunately there is not an easy way to add nested columns (though I do
think we should implement the API you attempted to use).
You'll have to build the struct manually.
allData.withColumn("student", struct(
  $"student.name",
  coalesce($"student.age", lit(0)) as 'age))
You could automate the construction of the struct from the schema.
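A rough sketch of what that automation might look like (the helper name `withDefaults` is made up for illustration; it walks the struct's schema and coalesces only the fields for which a default is supplied):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType

// Hypothetical helper: rebuild a struct column, substituting a
// caller-supplied default wherever a field is null.
def withDefaults(df: DataFrame, colName: String,
                 defaults: Map[String, Any]): DataFrame = {
  val structType = df.schema(colName).dataType.asInstanceOf[StructType]
  val fields = structType.fields.map { f =>
    defaults.get(f.name) match {
      case Some(d) => coalesce(df(s"$colName.${f.name}"), lit(d)) as f.name
      case None    => df(s"$colName.${f.name}") as f.name
    }
  }
  df.withColumn(colName, struct(fields: _*))
}

// e.g. withDefaults(allData, "student", Map("age" -> 0))
```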
Hi Michael,
Thank you for the suggestions. I am wondering how I can make `withColumn`
handle nested structures?
For example, below is my code to generate the data. I basically add the
`age` field to `Person2`, which is nested in an Array in `Course2`. Then I
want to fill in 0 for `age` when `age` is null.
Oh, and if you want a default other than null:
import org.apache.spark.sql.functions._
df.withColumn("address", coalesce($"address", lit("unknown")))  // substitute your own default
On Mon, May 1, 2017 at 10:29 AM, Michael Armbrust wrote:
The following should work:
val schema = implicitly[org.apache.spark.sql.Encoder[Course]].schema
spark.read.schema(schema).parquet("data.parquet").as[Course]
Note this will only work for nullable fields (i.e. if you add a primitive
like Int you need to make it an Option[Int])
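The Option[Int] point in plain Scala (no Spark needed): rows written before the field existed decode to `None`, and a default can be applied at use time:

```scala
// New version of the class: `age` is optional so that old rows,
// which lack the field, can still decode.
case class Student(name: String, age: Option[Int] = None)

val oldRow = Student("alice")          // pre-upgrade data: no age
val newRow = Student("bob", Some(21))  // post-upgrade data

// Apply a default of 0 where age is missing.
val ages = List(oldRow, newRow).map(_.age.getOrElse(0))
```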
On Sun, Apr 30, 2017
Hi Spark Users,
Suppose I have some data (stored in parquet for example) generated as below:
package com.company.entity.old
case class Course(id: Int, students: List[Student])
case class Student(name: String)
Then usually I can access the data by
spark.read.parquet("data.parquet").as[Course]
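For completeness, a minimal way to generate such a file might look like the following (a sketch in the spark-shell style; the local master and file path are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Build a local session and write sample Course data to Parquet,
// then read it back as a typed Dataset.
val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

case class Student(name: String)
case class Course(id: Int, students: List[Student])

Seq(Course(1, List(Student("alice"), Student("bob"))))
  .toDS()
  .write.mode("overwrite").parquet("data.parquet")

val courses = spark.read.parquet("data.parquet").as[Course]
```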