Re: Question on nested JSON behavior

Aditya Thu, 10 Mar 2016 20:13:27 -0800

This depends how do you see the "batter" field.

If you see it as an array, the query should be run as "select id,
t.batters.batter[0].id"


But you can also view "batter" as a repeated map, which would mean "id" is
a child field of this map, repeated a number of time and projecting "
batters.batter.id" should return this field for each repeated instance.

This is what MongoDB is doing.

On Thu, Mar 10, 2016 at 7:54 PM, Jiang Wu <jiang...@numerxdata.com> wrote:

> For the those who are curious, here is what Mongo returns for the same
> conceptual query:
>
> > db.t1.find({},{"batters":1, "_id":0});
> { "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id"
> : "1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" },
> { "id" : "1004", "type" : "Devil's Food" } ] } }
> { "batters" : { "batter" : [ { "id" : "2001", "type" : "Regular" } ] } }
> { "batters" : { "batter" : [ { "id" : "3001", "type" : "Regular" }, { "id"
> : "3002", "type" : "Chocolate" } ] } }
>
> > db.t1.find({},{"batters.batter":1, "_id":0});
> { "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id"
> : "1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" },
> { "id" : "1004", "type" : "Devil's Food" } ] } }
> { "batters" : { "batter" : [ { "id" : "2001", "type" : "Regular" } ] } }
> { "batters" : { "batter" : [ { "id" : "3001", "type" : "Regular" }, { "id"
> : "3002", "type" : "Chocolate" } ] } }
>
> > db.t1.find({},{"batters.batter.id":1, "_id":0});
> { "batters" : { "batter" : [ { "id" : "1001" }, { "id" : "1002" }, { "id"
> : "1003" }, { "id" : "1004" } ] } }
> { "batters" : { "batter" : [ { "id" : "2001" } ] } }
> { "batters" : { "batter" : [ { "id" : "3001" }, { "id" : "3002" } ] } }
>
> Each returned document is always a JSON document from the root.  Hence, no
> ambiguity in the last case.  It is returning the inner most array with each
> array element having a single map key.
>
> To produce the same conceptual results, all values in the inner most array
> will need to returned for each row.
>
> -- Jiang
>
> -----Original Message-----
> From: Neeraja Rentachintala [mailto:nrentachint...@maprtech.com]
> Sent: Thursday, March 10, 2016 4:43 PM
> To: user@drill.apache.org
> Subject: Re: Question on nested JSON behavior
>
> Actually I agree with Jiang. The result does seem unintuitive. If it is a
> file with just a list, it does still make sense to return the ids in that
> list as an array unless the user has configured Drill to automatically
> flatten the first level.
> Does anyone know how does the other systems behave for this use case? (for
> ex: Mongo)
>
>
>
> On Thu, Mar 10, 2016 at 4:21 PM, Nathan Griffith <ngriff...@dremio.com>
> wrote:
>
> > Hi Jiang,
> >
> > Think of it this way: If you had a file that was just the list:
> >
> > {"id":"1001","type":"Regular"}
> > {"id":"1002","type":"Chocolate"}
> > {"id":"1003","type":"Blueberry"}
> > {"id":"1004","type":"Devil's Food"}
> >
> > What would you like it to return when you query:
> >
> > select id from dfs.`/path/to/sample_file.json`;
> >
> > ?
> >
> > When you enter the query that you're asking about, you're indicating
> > exactly that structure of data. Does this explanation make sense?
> >
> > Best,
> > Nathan
> >
> > On Thu, Mar 10, 2016 at 4:07 PM, Jiang Wu <jiang...@numerxdata.com>
> wrote:
> > > Drill version: 1.4.0.  Assuming 3 JSON objects with the following
> > structure:
> > >
> > >   {  ...
> > >     "batters":
> > >       {
> > >         "batter":
> > >           [
> > >             { "id": "1001", "type": "Regular" },
> > >             { "id": "1002", "type": "Chocolate" },
> > >             { "id": "1003", "type": "Blueberry" },
> > >             { "id": "1004", "type": "Devil's Food" }
> > >           ]
> > >       },
> > >     ...
> > >   }
> > >
> > > Now running a few sample queries against the above data:
> > >
> > >
> > > A)      select "batters" returns expected results, which are the values
> > of "batters" from each row.
> > >
> > > 0: jdbc:drill:zk=local> select batters from
> > > dfs.`c:\tmp\sample.json`;
> > > +---------+
> > > | batters |
> > > +---------+
> > > |
> > {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> > te"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > Food"}]} |
> > > | {"batter":[{"id":"1001","type":"Regular"}]} |
> > > |
> > {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> > te"}]}
> > |
> > > +---------+
> > > 3 rows selected (0.243 seconds)
> > >
> > >
> > > B)      select "batters.batter" also returns the expected results,
> which
> > are the array values for "batters.batter" from each row.
> > >
> > >
> > > 0: jdbc:drill:zk=local> select t.batters.batter from
> > dfs.`c:\tmp\sample.json` t;
> > > +--------+
> > > | EXPR$0 |
> > > +--------+
> > > |
> > [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id"
> > :"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > Food"}] |
> > > | [{"id":"1001","type":"Regular"}] |
> > > | [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}]
> > > | |
> > > +--------+
> > > 3 rows selected (0.198 seconds)
> > >
> > >
> > > C)  select "batters.batter.id" returns something unexpected:
> > >
> > > 0: jdbc:drill:zk=local> select t.batters.batter.id from
> > dfs.`c:\tmp\sample.json` t;
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 1001    |
> > > | 1002    |
> > > | 1003    |
> > > +---------+
> > >
> > > The above result doesn't make sense.  The result looks like the 3
> > > values
> > from row 1. Should the result be the following instead?
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | [1001, 1002, 1003, 1004]    |
> > > | [1001]    |
> > > | [1001, 1002]    |
> > > +---------+
> > >
> > > Any hints on what is happening here?  Thanks.
> > >
> > > -- Jiang
> > >
> >
>

Re: Question on nested JSON behavior

Reply via email to