Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Totally agree that specifying the schema manually should be the baseline. LGTM, thanks for working on it. Seems like it looks good to others too, judging by the comment on the PR that it's getting merged to master :)

Re: Spark SQL JSON Column Support

2016-09-29 Thread Michael Armbrust
> Will this be able to handle projection pushdown if a given job doesn't
> utilize all the columns in the schema? Or should people have a per-job
> schema?

As currently written, we will do a little bit of extra work to pull out fields that aren't needed. I think it would be pretty straightforward to…
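The trade-off being discussed can be sketched in plain Python: with a full schema, every field is materialized even when a job only reads one column, whereas a per-job schema pushes the projection into the parse itself. This is a toy model of the idea, not Spark's actual parser; the field names are made up.

```python
import json

FULL_SCHEMA = ["user", "age", "address", "history"]

def parse_with_schema(json_str, fields):
    """Materialize only the listed fields from the JSON string."""
    obj = json.loads(json_str)
    return {f: obj.get(f) for f in fields}

line = '{"user": "alice", "age": 30, "address": "...", "history": []}'

# full schema: extra work pulling out fields this job never uses
full = parse_with_schema(line, FULL_SCHEMA)

# per-job schema: only the needed field is pulled out
slim = parse_with_schema(line, ["user"])
# slim == {"user": "alice"}
```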

Re: Spark SQL JSON Column Support

2016-09-29 Thread Cody Koeninger
Will this be able to handle projection pushdown if a given job doesn't utilize all the columns in the schema? Or should people have a per-job schema?

Re: Spark SQL JSON Column Support

2016-09-28 Thread Michael Armbrust
Burak, you can configure what happens with corrupt records for the datasource using the parse mode. The parse will still fail, so we can't get any data out of it, but we do leave the JSON in another column for you to inspect. In the case of this function, we'll just return null if it's unparseable.
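The two behaviors Michael describes can be sketched in plain Python. The column name `_corrupt_record` matches the datasource's default corrupt-record column; the helper names are illustrative, not Spark API.

```python
import json

def parse_permissive(line):
    """Datasource-style permissive mode: on failure, keep the raw
    text in a corrupt-record column instead of dropping the row."""
    try:
        return json.loads(line)
    except ValueError:
        return {"_corrupt_record": line}

def parse_as_function(line):
    """from_json-style behavior: just return None if unparseable."""
    try:
        return json.loads(line)
    except ValueError:
        return None

bad = parse_permissive("{bad json")    # raw text kept for inspection
none = parse_as_function("{bad json")  # null, no data recovered
```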

Re: Spark SQL JSON Column Support

2016-09-28 Thread Michael Segel
Silly question? When you talk about ‘user specified schema’ do you mean for the user to supply an additional schema, or that you’re using the schema that’s described by the JSON string? (Or both? [either/or]) Thx

Re: Spark SQL JSON Column Support

2016-09-28 Thread Burak Yavuz
I would really love something like this! It would be great if it doesn't throw away corrupt_records like the Data Source.

Re: Spark SQL JSON Column Support

2016-09-28 Thread Nathan Lande
We are currently pulling out the JSON columns, passing them through read.json, and then joining them back onto the initial DF, so something like from_json would be a nice quality-of-life improvement for us.
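The extract/parse/join workaround described here, modeled with plain Python lists standing in for DataFrames (the column names are made up for illustration):

```python
import json

# initial "DataFrame": an id column plus a raw JSON string column
rows = [
    {"id": 1, "payload": '{"user": "alice"}'},
    {"id": 2, "payload": '{"user": "bob"}'},
]

# step 1: pull out the JSON column and parse it (like read.json)
parsed = [{"id": r["id"], **json.loads(r["payload"])} for r in rows]

# step 2: join the parsed fields back onto the initial rows by id
by_id = {p["id"]: p for p in parsed}
joined = [{**r, "user": by_id[r["id"]]["user"]} for r in rows]
```

A from_json-style function would collapse both steps into a single column expression, with no intermediate join.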

Spark SQL JSON Column Support

2016-09-28 Thread Michael Armbrust
Spark SQL has great support for reading text files that contain JSON data. However, in many cases the JSON data is just one column amongst others. This is particularly true when reading from sources such as Kafka. This PR adds a new function, from_json, that…
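The core idea of the proposal — parsing a JSON string column against a user-specified schema, in place — can be sketched in plain Python. This is a toy model of the concept, not the Spark API; the row layout and field names below are invented.

```python
import json

def from_json_sketch(json_str, schema_fields):
    """Toy model of from_json: parse a JSON string against a
    user-specified list of field names; None if unparseable."""
    try:
        obj = json.loads(json_str)
    except (ValueError, TypeError):
        return None  # "return null if it's unparseable"
    if not isinstance(obj, dict):
        return None
    # keep only the fields named in the schema
    return {f: obj.get(f) for f in schema_fields}

# a Kafka-like row where JSON is just one column among others
row = {"key": "k1", "value": '{"user": "alice", "age": 30, "x": 1}'}
parsed = from_json_sketch(row["value"], ["user", "age"])
# parsed == {"user": "alice", "age": 30}
```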