I am generating a set of tables in PySpark SQL from a JSON source dataset and
writing those tables to disk as CSVs using
df.write.format("com.databricks.spark.csv").save(…). I have a schema like:

root
 |-- col_1: string (nullable = true)
 |-- col_2: string (nullable = true)
 |-- col_3: timestamp (nullable = true)
...
 |-- properties: struct (nullable = true)
 |    |-- prop_1: string (nullable = true)
 |    |-- prop_2: string (nullable = true)
 |    |-- prop3: string (nullable = true)
…
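
For reference, here is roughly what my current pipeline looks like. The paths,
the app name, and the exact spark-csv package version are made up for the sake
of the example; I'm on the Spark 1.x-style API, launched with something like
--packages com.databricks:spark-csv_2.10:1.5.0 or similar:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="json_to_csv")   # hypothetical app name
sqlContext = SQLContext(sc)

# Source is JSON; Spark infers the schema shown above.
df = sqlContext.read.json("/path/to/source.json")   # hypothetical path

# Today I keep only the flat columns and drop the 'properties' struct,
# then write the result out with the spark-csv package.
flat = df.select("col_1", "col_2", "col_3")
(flat.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("/path/to/output_csv"))   # hypothetical output directory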

Currently I am dropping the properties struct when I write to CSV, but I would
like to keep it as a JSON-encoded string column. How can I go about this? The
final result would be a CSV with col_1, col_2, and col_3 as usual, while the
'properties' column would contain formatted JSON objects, roughly along the
lines of the sketch below.
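
Conceptually, what I'm after is something like the following, where the struct
gets serialized to a JSON string before the CSV write. I gather that
pyspark.sql.functions.to_json does this in newer Spark releases, but I'm not
sure it exists in my setup, so treat this purely as a sketch of the intent
rather than something I know works here:

from pyspark.sql.functions import to_json, col

# Turn the nested 'properties' struct into a single JSON string column,
# then write everything (flat columns plus that string) out as CSV.
with_json = df.withColumn("properties", to_json(col("properties")))
(with_json.write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("/path/to/output_with_properties_csv"))   # hypothetical path

Since the JSON string will itself contain commas and double quotes, I assume
the CSV writer would need to quote/escape that column for the output to stay
parseable.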

Thanks
