Re: How to convert Array of Json rows into Dataset of specific columns in Spark 2.2.0?

Jules Damji Sat, 06 Jan 2018 16:16:56 -0800

Here are a couple of tutorials that show how to extract structured nested data:


https://databricks.com/blog/2017/06/27/4-sql-high-order-lambda-functions-examine-complex-structured-data-databricks.html

https://databricks.com/blog/2017/06/13/five-spark-sql-utility-functions-extract-explore-complex-data-types.html


> On Jan 6, 2018, at 11:42 AM, Hien Luu <[email protected]> wrote:
> 
> Hi Kant,
> 
> I am not sure whether you have come up with a solution yet, but the following
> works for me (in Scala):
> 
> val emp_info = """
>  [ 
>    {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs":[{"subject": "english", "year": 2016}]},
>    {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs":[{"subject": "math", "year": 2017}]}
>  ]"""
> 
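> import org.apache.spark.sql.types._
> // Outside spark-shell, the functions used below (from_json, struct, explode) need this import
> import org.apache.spark.sql.functions._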
> 
> val addressSchema = new StructType().add("state", StringType).add("country", StringType)
> val docsSchema = ArrayType(new StructType().add("subject", StringType).add("year", IntegerType))
> val employeeSchema = new StructType().add("name", StringType).add("address", addressSchema).add("docs", docsSchema)
> 
> val empInfoSchema = ArrayType(employeeSchema)
> 
> empInfoSchema.json
> 
> val empInfoStrDF = Seq((emp_info)).toDF("emp_info_str")
> empInfoStrDF.printSchema
> empInfoStrDF.show(false)
> 
> val empInfoDF = empInfoStrDF.select(from_json('emp_info_str, empInfoSchema).as("emp_info"))
> empInfoDF.printSchema
> 
> empInfoDF.select(struct("*")).show(false)
> 
> empInfoDF.select("emp_info.name", "emp_info.address", "emp_info.docs").show(false)
> 
> empInfoDF.select(explode('emp_info.getItem("name"))).show
>
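
The quoted snippet stops at exploding the names. To get one row per employee with specific top-level columns, which is what the subject line asks for, something along the following lines may work. This is only a minimal sketch, assuming the same sample data and schema as above and a Spark version where from_json accepts an array schema (2.2.0 or later). The names empInfo, parsed, employees, emp and flat are illustrative and not from the original message.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types._

// Local session for experimenting; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[*]").appName("emp-info-sketch").getOrCreate()
import spark.implicits._

// Same sample data and schema as in the quoted message.
val empInfo = """[
  {"name": "foo", "address": {"state": "CA", "country": "USA"}, "docs": [{"subject": "english", "year": 2016}]},
  {"name": "bar", "address": {"state": "OH", "country": "USA"}, "docs": [{"subject": "math", "year": 2017}]}
]"""

val addressSchema = new StructType().add("state", StringType).add("country", StringType)
val docsSchema = ArrayType(new StructType().add("subject", StringType).add("year", IntegerType))
val employeeSchema = new StructType().add("name", StringType).add("address", addressSchema).add("docs", docsSchema)

// Parse the JSON string into a single array<struct> column.
val parsed = Seq(empInfo).toDF("emp_info_str").select(
  from_json(col("emp_info_str"), ArrayType(employeeSchema)).as("emp_info"))

// Explode the array so that each employee struct becomes its own row.
val employees = parsed.select(explode(col("emp_info")).as("emp"))

// Promote the nested fields of interest to ordinary top-level columns.
val flat = employees.select(
  col("emp.name").as("name"),
  col("emp.address.state").as("state"),
  col("emp.address.country").as("country"),
  col("emp.docs").as("docs"))

flat.printSchema()
flat.show(false)
```

Exploding the whole struct first, rather than each field separately, keeps the fields of one employee together on the same row. The docs column is still an array at this point; a second explode on it would give one row per document if that is what is needed.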
