Here are the complete examples to illustrate the problem:

0: jdbc:drill:zk=local> select id, t.batters.batter from 
dfs.`c:\tmp\sample.json` t;
+----+--------+
| id | EXPR$1 |
+----+--------+
| 0001 | 
[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
 Food"}] |
| 0002 | [{"id":"2001","type":"Regular"}] |
| 0003 | [{"id":"3001","type":"Regular"},{"id":"3002","type":"Chocolate"}] |
+----+--------+
3 rows selected (0.174 seconds)


0: jdbc:drill:zk=local> select id, t.batters.batter.id from 
dfs.`c:\tmp\sample.json` t;
+-------+---------+
|  id   | EXPR$1  |
+-------+---------+
| 0001  | 1001    | <-- OK
| 0002  | 1002    | <-- Wrong, 1002 is not associated with 0002
| 0003  | 1003    | <-- Wrong, 1003 is not associated with 0003
+-------+---------+
3 rows selected (0.182 seconds)

-- Jiang



-----Original Message-----
From: Jiang Wu [mailto:jiang...@numerxdata.com] 
Sent: Thursday, March 10, 2016 6:16 PM
To: user@drill.apache.org
Subject: RE: Question on nested JSON behavior

One problem with the current behavior is that the results are not correlated to 
the JSON row object.  It looks like the code simply picks the first N values 
from the union of all values across all rows.  The N is the number of rows in 
the result.

For example, if I give this query:

0: jdbc:drill:zk=local> select id, t.batters.batter.id from 
dfs.`c:\tmp\sample.json` t;
+-------+---------+
|  id   | EXPR$1  |
+-------+---------+
| 0001  | 1001    |
| 0002  | 1002    |
| 0003  | 1003    |
+-------+---------+
3 rows selected (0.182 seconds)

The above cannot be correct because "1001", "1002", and "1003" never appears in 
the the row with id being "0001".  

Returning an error will be better than returning the wrong answer.

-- Jiang

-----Original Message-----
From: Steven Phillips [mailto:ste...@dremio.com]
Sent: Thursday, March 10, 2016 5:41 PM
To: user <user@drill.apache.org>
Subject: Re: Question on nested JSON behavior

I am surprised that you are getting that result. I would have expected the 
query to fail. Since batter is an array, you should specify the index of the 
array if yo want to access lower level elements.

A way to access all of the sub-fields of a repeated map is something we've 
discussed, but never implemented. Until it is implemented, I think the correct 
behavior is for this query to fail.

On Thu, Mar 10, 2016 at 4:42 PM, Neeraja Rentachintala < 
nrentachint...@maprtech.com> wrote:

> Actually I agree with Jiang. The result does seem unintuitive. If it 
> is a file with just a list, it does still make sense to return the ids 
> in that list as an array unless the user has configured Drill to 
> automatically flatten the first level.
> Does anyone know how does the other systems behave for this use case? 
> (for
> ex: Mongo)
>
>
>
> On Thu, Mar 10, 2016 at 4:21 PM, Nathan Griffith 
> <ngriff...@dremio.com>
> wrote:
>
> > Hi Jiang,
> >
> > Think of it this way: If you had a file that was just the list:
> >
> > {"id":"1001","type":"Regular"}
> > {"id":"1002","type":"Chocolate"}
> > {"id":"1003","type":"Blueberry"}
> > {"id":"1004","type":"Devil's Food"}
> >
> > What would you like it to return when you query:
> >
> > select id from dfs.`/path/to/sample_file.json`;
> >
> > ?
> >
> > When you enter the query that you're asking about, you're indicating 
> > exactly that structure of data. Does this explanation make sense?
> >
> > Best,
> > Nathan
> >
> > On Thu, Mar 10, 2016 at 4:07 PM, Jiang Wu <jiang...@numerxdata.com>
> wrote:
> > > Drill version: 1.4.0.  Assuming 3 JSON objects with the following
> > structure:
> > >
> > >   {  ...
> > >     "batters":
> > >       {
> > >         "batter":
> > >           [
> > >             { "id": "1001", "type": "Regular" },
> > >             { "id": "1002", "type": "Chocolate" },
> > >             { "id": "1003", "type": "Blueberry" },
> > >             { "id": "1004", "type": "Devil's Food" }
> > >           ]
> > >       },
> > >     ...
> > >   }
> > >
> > > Now running a few sample queries against the above data:
> > >
> > >
> > > A)      select "batters" returns expected results, which are the values
> > of "batters" from each row.
> > >
> > > 0: jdbc:drill:zk=local> select batters from 
> > > dfs.`c:\tmp\sample.json`;
> > > +---------+
> > > | batters |
> > > +---------+
> > > |
> >
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> te"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > Food"}]} |
> > > | {"batter":[{"id":"1001","type":"Regular"}]} |
> > > |
> >
> {"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocola
> te"}]}
> > |
> > > +---------+
> > > 3 rows selected (0.243 seconds)
> > >
> > >
> > > B)      select "batters.batter" also returns the expected results,
> which
> > are the array values for "batters.batter" from each row.
> > >
> > >
> > > 0: jdbc:drill:zk=local> select t.batters.batter from
> > dfs.`c:\tmp\sample.json` t;
> > > +--------+
> > > | EXPR$0 |
> > > +--------+
> > > |
> >
> [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id"
> :"1003","type":"Blueberry"},{"id":"1004","type":"Devil's
> > Food"}] |
> > > | [{"id":"1001","type":"Regular"}] | 
> > > | [{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"}
> > > | ] |
> > > +--------+
> > > 3 rows selected (0.198 seconds)
> > >
> > >
> > > C)  select "batters.batter.id" returns something unexpected:
> > >
> > > 0: jdbc:drill:zk=local> select t.batters.batter.id from
> > dfs.`c:\tmp\sample.json` t;
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 1001    |
> > > | 1002    |
> > > | 1003    |
> > > +---------+
> > >
> > > The above result doesn't make sense.  The result looks like the 3
> values
> > from row 1. Should the result be the following instead?
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | [1001, 1002, 1003, 1004]    |
> > > | [1001]    |
> > > | [1001, 1002]    |
> > > +---------+
> > >
> > > Any hints on what is happening here?  Thanks.
> > >
> > > -- Jiang
> > >
> >
>

Reply via email to