Querying on Deeply Nested JSON Structures

2017-07-15 Thread Patrick
Hi,

We need to query a deeply nested JSON structure. However, the query targets a single field at a nested level, such as mean, median, or mode. I am aware of the SQL explode function:

`df = df_nested.withColumn('exploded', explode(top))`

But this is too slow. Is there any other strategy that could give us t

Re: Querying on Deeply Nested JSON Structures

2017-07-15 Thread Matt Deaver
I would love to be told otherwise, but I believe your options are to either 1) use the explode function or 2) pre-process the data so you don't have to explode it.

On Jul 15, 2017 11:41 AM, "Patrick" wrote:
> Hi,
>
> We need to query deeply nested Json structure. However query is on a
> single f
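
As a rough illustration of option 2, here is a minimal sketch of a pre-processing step that flattens nested JSON objects before they are loaded for querying. The record layout (`top`, `stats`, `mean`, and so on) and the `flatten` helper are hypothetical, not taken from the thread, and lists are left untouched in this sketch.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining key names with '_'.

    Non-dict values (including lists) are copied through unchanged.
    """
    flat = {}
    for key, value in record.items():
        path = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical input record; the real schema is not shown in the thread.
raw = '{"id": 1, "top": {"stats": {"mean": 2.5, "median": 2.0, "mode": 2.0}}}'
flat = flatten(json.loads(raw))
print(flat)
```

Once the records are flat, a field such as `top_stats_mean` becomes an ordinary top-level column, so later queries on it would not need explode. Whether this is faster overall depends on how often the data is re-read compared with the one-time cost of the flattening pass.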

Re: Querying on Deeply Nested JSON Structures

2017-07-16 Thread Burak Yavuz
Have you checked out this blog post?
https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
It shows tools and tips on how to work with nested data. With JSON, you can access nested data through paths such as `field1.field2.field3`.

Best,
Burak

On Sat, Jul
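
The dotted-path access mentioned above might look like the following in PySpark. This is a sketch under assumptions: the record layout and the `nested_path` and `demo` names are hypothetical, and the nested levels are assumed to be structs. If a level is an array of structs, the same dotted path should return an array of values rather than one row per element.

```python
def nested_path(*parts):
    """Join struct field names into a dotted column path, e.g. 'a.b.c'."""
    return ".".join(parts)


def demo():
    # Imported here so the helper above stays usable without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("nested-json-demo")
        .getOrCreate()
    )

    # Hypothetical record: structs nested three levels deep.
    records = [
        '{"id": 1, "top": {"stats": {"mean": 2.5, "median": 2.0, "mode": 2.0}}}'
    ]
    df = spark.read.json(spark.sparkContext.parallelize(records))

    # Struct fields are addressed by dotted path; no explode is involved.
    df.select("id", nested_path("top", "stats", "mean")).show()

    spark.stop()
```

Calling `demo()` requires a local Spark installation. The helper only builds the path string, so `df.select("top.stats.mean")` written out directly is equivalent.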