Re: Using Spark to analyze complex JSON

Nicholas Chammas Wed, 21 May 2014 13:34:04 -0700

Looking forward to that update!

Given a table of JSON objects like this one:

{
   "name": "Nick",
   "location": {
      "x": 241.6,
      "y": -22.5
   },
   "likes": ["ice cream", "dogs", "Vanilla Ice"]
}

It would be SUPER COOL if we could query that table in a way that is as
natural as follows:

SELECT DISTINCT name
FROM json_table;

SELECT MAX(location.x)
FROM json_table;

SELECT likes[2] -- Ice Ice Baby
FROM json_table
WHERE name = "Nick";

Of course, this is just a hand-wavy suggestion of how I’d like to be able
to query JSON (particularly that last example) using SQL. I’m interested in
seeing what y’all come up with.
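
To make the intended results concrete, here is a minimal plain-Python sketch of what those three queries would be expected to return. It is only an illustration of the semantics, not anything Spark SQL or Shark provides. The second record ("Sam") is made up so that DISTINCT and MAX have more than one row to work on, and zero-based array indexing is an assumption.

```python
import json

# The first row is the object from this message; the second is a
# hypothetical record added only for illustration.
rows = [json.loads(s) for s in (
    '{"name": "Nick", "location": {"x": 241.6, "y": -22.5},'
    ' "likes": ["ice cream", "dogs", "Vanilla Ice"]}',
    '{"name": "Sam", "location": {"x": 10.0, "y": 3.0}, "likes": ["tea"]}',
)]

# SELECT DISTINCT name FROM json_table;
distinct_names = sorted({r["name"] for r in rows})

# SELECT MAX(location.x) FROM json_table;
max_x = max(r["location"]["x"] for r in rows)

# SELECT likes[2] FROM json_table WHERE name = "Nick";
# Assumes zero-based indexing, so likes[2] is the third element.
nick_third_like = [r["likes"][2] for r in rows if r["name"] == "Nick"]
```

The point of the sketch is that nested fields (location.x) and array elements (likes[2]) are addressed directly, with no flattening step before the query.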

A large part of what my team does is make it easy for analysts to explore
and query JSON data using SQL. We have a fairly complex home-grown process
to do that and are looking to replace it with something more out of the
box. So if you’d like more input on how users might use this feature, I’d
be glad to chime in.

Nick

On Wed, May 21, 2014 at 11:21 AM, Michael Armbrust
<mich...@databricks.com> wrote:

> You can already extract fields from JSON data using Hive UDFs.  We have an
> intern working on better native support this summer.  We will be sure to
> post updates once there is a working prototype.
>
> Michael
>
>
> On Tue, May 20, 2014 at 6:46 PM, Nick Chammas
> <nicholas.cham...@gmail.com> wrote:
>
>> The Apache Drill <http://incubator.apache.org/drill/> home page has an
>> interesting heading: "Liberate Nested Data".
>>
>> Is there any current or planned functionality in Spark SQL or Shark to
>> enable SQL-like querying of complex JSON?
>>
>> Nick
>>
>>
>> ------------------------------
>> View this message in context: Using Spark to analyze complex 
>> JSON<http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-analyze-complex-JSON-tp6146.html>
>> Sent from the Apache Spark User List mailing list
>> archive <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>
>
>
