Actually not due to flatten, if you directly query the file, it will only
show the non-null values.

0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
+------------+
|   EXPR$0   |
+------------+
| [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
+------------+
1 row selected (0.123 seconds)

Thanks,
Hao

On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> Very interesting, flatten seems to bypass empty records. Not sure if that
> is an ideal result for all use cases, but certainly usable in this case.
>
> Thanks
> —Andries
>
>
> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote:
>
> > I can also fetch non-null values for attached json file which contains 6
> "entities", 3 of them are null, 3 of them have "text" value.
> >
> > Could you share your complete "twitter.json"?
> >
> > 0: jdbc:drill:> with tmp as
> > . . . . . . . > (
> > . . . . . . . > select flatten(t.entities.hashtags) as c from
> dfs.tmp.`z.json` t
> > . . . . . . . > )
> > . . . . . . . > select tmp.c.text from tmp;
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | GoPatriots |
> > | aaa        |
> > | bbb        |
> > +------------+
> > 3 rows selected (0.122 seconds)
> >
> > Thanks,
> > Hao
> >
> >
> >
> > On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> > The sample data I posted only has 1 element, but some records have
> multiple elements in them.
> >
> > Interestingly enough though
> > select t.entities.hashtags[0].`text` from `twitter.json` t limit 10;
> >
> > Produces
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | null       |
> > | SportsNews |
> > | null       |
> > | SportsNews |
> > | null       |
> > | null       |
> > | null       |
> > | null       |
> > | null       |
> > | CARvsSEA   |
> > +——————+
> >
> >
> > And
> >
> > select t.entities.hashtags[0] from `twitter.json` t limit 10;
> >
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | {"indices":[]} |
> > | {"text":"SportsNews","indices":[90,99]} |
> > | {"indices":[]} |
> > | {"text":"SportsNews","indices":[90,99]} |
> > | {"indices":[]} |
> > | {"indices":[]} |
> > | {"indices":[]} |
> > | {"indices":[]} |
> > | {"indices":[]} |
> > | {"text":"CARvsSEA","indices":[90,99]} |
> > +——————+
> >
> > Strange part is that there is no indices map in the hashtags array, so
> no idea why it shows up when pointing to the first lament in an empty array.
> >
> >
> > —Andries
> >
> >
> >
> >
> >
> >
> > On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> wrote:
> >
> > > I just noticed that the result of "hashtags" is just an array with
> only 1
> > > element.
> > > So take your example:
> > > [root@maprdemo tmp]# cat d.json
> > > {
> > > "entities": {
> > >   "trends": [],
> > >   "symbols": [],
> > >   "urls": [],
> > >   "hashtags": [],
> > >   "user_mentions": []
> > > },
> > > "entities": {
> > >   "trends": [1,2,3],
> > >   "symbols": [4,5,6],
> > >   "urls": [7,8,9],
> > >   "hashtags": [
> > >     {
> > >       "text": "GoPatriots",
> > >       "indices": []
> > >     }
> > >   ],
> > >   "user_mentions": []
> > > }
> > > }
> > >
> > > Now we can do this to achieve the results:
> > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | [{"text":"GoPatriots"}] |
> > > +------------+
> > > 1 row selected (0.09 seconds)
> > > 0: jdbc:drill:> select t.entities.hashtags[0].text from
> dfs.tmp.`d.json` t ;
> > > +------------+
> > > |   EXPR$0   |
> > > +------------+
> > > | GoPatriots |
> > > +------------+
> > > 1 row selected (0.108 seconds)
> > >
> > > Thanks,
> > > Hao
> > >
> > >
> > >
> > > On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> > > aengelbre...@maprtech.com> wrote:
> > >
> > >> When I run the query on a larger dataset it actually show the empty
> > >> records.
> > >>
> > >> select t.entities.hashtags from `twitter.json` t limit 10;
> > >>
> > >> +------------+
> > >> |   EXPR$0   |
> > >> +------------+
> > >> | []         |
> > >> | [{"text":"SportsNews","indices":[0,11]}] |
> > >> | []         |
> > >> | [{"text":"SportsNews","indices":[0,11]}] |
> > >> | []         |
> > >> | []         |
> > >> | []         |
> > >> | []         |
> > >> | []         |
> > >> | [{"text":"CARvsSEA","indices":[36,45]}] |
> > >> +------------+
> > >> 10 rows selected (2.899 seconds)
> > >>
> > >>
> > >>
> > >> However having the output as maps is not very useful, unless i can
> filter
> > >> out the records with empty arrays and then drill deeper into the ones
> with
> > >> data in the arrays.
> > >>
> > >> BTW: Hao I would have expiated your query to return both rows, one
> with an
> > >> empty array as above and the other with the array data.
> > >>
> > >>
> > >> —Andries
> > >>
> > >>
> > >> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> wrote:
> > >>
> > >>> I am not sure if below is expected behavior.
> > >>> If we only select "hashtags", and it will return only 1 row ignoring
> the
> > >>> null value.
> > >>> However then if we try to get "hashtags.text", it fails...which
> means it
> > >> is
> > >>> still trying to read the NULL value.
> > >>> I am thinking it may confuse the SQL developers.
> > >>>
> > >>>
> > >>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ;
> > >>> +------------+
> > >>> |   EXPR$0   |
> > >>> +------------+
> > >>> | [{"text":"GoPatriots"}] |
> > >>> +------------+
> > >>> 1 row selected (0.109 seconds)
> > >>>
> > >>>
> > >>> 0: jdbc:drill:> select t.entities.hashtags.text from
> dfs.tmp.`d.json` t ;
> > >>>
> > >>> Query failed: Query failed: Failure while running fragment.,
> > >>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be
> cast to
> > >>> org.apache.drill.exec.vector.complex.MapVector [
> > >>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>>
> > >>>
> > >>> Error: exception while executing query: Failure while executing
> query.
> > >>> (state=,code=0)
> > >>>
> > >>> Thanks,
> > >>> Hao
> > >>>
> > >>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> > >>> aengelbre...@maprtech.com> wrote:
> > >>>
> > >>>> Now try on hashtags with the following:
> > >>>>
> > >>>> drilldemo:5181> select t.entities.hashtags.`text` from
> `/twitter.json` t
> > >>>> where t.entities.hashtags is not null limit 10;
> > >>>>
> > >>>> Query failed: Query failed: Failure while running fragment.,
> > >>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be
> cast to
> > >>>> org.apache.drill.exec.vector.complex.MapVector [
> > >>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>>
> > >>>>
> > >>>> Error: exception while executing query: Failure while executing
> query.
> > >>>> (state=,code=0)
> > >>>>
> > >>>>
> > >>>> {
> > >>>> "entities": {
> > >>>>  "trends": [],
> > >>>>  "symbols": [],
> > >>>>  "urls": [],
> > >>>>  "hashtags": [],
> > >>>>  "user_mentions": []
> > >>>> },
> > >>>> "entities": {
> > >>>>  "trends": [1,2,3],
> > >>>>  "symbols": [4,5,6],
> > >>>>  "urls": [7,8,9],
> > >>>>  "hashtags": [
> > >>>>    {
> > >>>>      "text": "GoPatriots",
> > >>>>      "indices": []
> > >>>>    }
> > >>>>  ],
> > >>>>  "user_mentions": []
> > >>>> }
> > >>>> }
> > >>>>
> > >>>> The issue seems to be that if some records have arrays with maps in
> them
> > >>>> and others are empty.
> > >>>>
> > >>>> —Andries
> > >>>>
> > >>>>
> > >>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com> wrote:
> > >>>>
> > >>>>> Seems it works for below json file:
> > >>>>> {
> > >>>>> "entities": {
> > >>>>>  "trends": [],
> > >>>>>  "symbols": [],
> > >>>>>  "urls": [],
> > >>>>>  "hashtags": [
> > >>>>>    {
> > >>>>>      "text": "GoPatriots",
> > >>>>>      "indices": [
> > >>>>>        83,
> > >>>>>        94
> > >>>>>      ]
> > >>>>>    }
> > >>>>>  ],
> > >>>>>  "user_mentions": []
> > >>>>> },
> > >>>>> "entities": {
> > >>>>>  "trends": [1,2,3],
> > >>>>>  "symbols": [4,5,6],
> > >>>>>  "urls": [7,8,9],
> > >>>>>  "hashtags": [
> > >>>>>    {
> > >>>>>      "text": "GoPatriots",
> > >>>>>      "indices": []
> > >>>>>    }
> > >>>>>  ],
> > >>>>>  "user_mentions": []
> > >>>>> }
> > >>>>> }
> > >>>>>
> > >>>>>
> > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t
> where
> > >>>>> t.entities.urls is not null;
> > >>>>> +------------+
> > >>>>> |   EXPR$0   |
> > >>>>> +------------+
> > >>>>> | [7,8,9]    |
> > >>>>> +------------+
> > >>>>> 1 row selected (0.139 seconds)
> > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t
> where
> > >>>>> t.entities.urls is null;
> > >>>>> +------------+
> > >>>>> |   EXPR$0   |
> > >>>>> +------------+
> > >>>>> +------------+
> > >>>>> No rows selected (0.158 seconds)
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Hao
> > >>>>>
> > >>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <adityakish...@gmail.com>
> > >> wrote:
> > >>>>>
> > >>>>>> I believe that this works if the array contains homogeneous
> primitive
> > >>>>>> types. In your example, it appears from the error, the array field
> > >>>> 'member'
> > >>>>>> contained maps for at least one record.
> > >>>>>>
> > >>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> cma...@mapr.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Trying that locally did not work for me (drill 0.7.0):
> > >>>>>>>
> > >>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> > >>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> > >>>>>>> Query failed: Query stopped., Failure while trying to materialize
> > >>>>>> incoming schema.  Errors:
> > >>>>>>>
> > >>>>>>> Error in expression at index -1.  Error: Missing function
> > >>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full expression:
> > >>>> --UNKNOWN
> > >>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> > >>>> bullseye-3:31010 ]
> > >>>>>>>
> > >>>>>>> Error: exception while executing query: Failure while executing
> > >> query.
> > >>>>>> (state=,code=0)
> > >>>>>>>
> > >>>>>>> ​
> > >>>>>>>
> > >>>>>>> Chris Matta
> > >>>>>>> cma...@mapr.com
> > >>>>>>> 215-701-3146
> > >>>>>>>
> > >>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <adityakish...@gmail.com
> >
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> repeated_count('entities.urls') > 0
> > >>>>>>>>
> > >>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> > >>>>>>>> aengelbre...@maprtech.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> How do you filter out records with an empty array in drill?
> > >>>>>>>>> i.e some records have "url":[]  and some will have an array
> with
> > >> data
> > >>>>>> in
> > >>>>>>>>> it. When trying to read records with data in the array drill
> fails
> > >>>> due
> > >>>>>>>> to
> > >>>>>>>>> records missing any data in the array. Trying a filter with/*
> where
> > >>>>>>>>> "url":[0] is not null */ fails, also fails if applying url is
> not
> > >>>>>> null.
> > >>>>>>>>>
> > >>>>>>>>> Note some of the arrays contains maps, using twitter data as an
> > >>>>>> example
> > >>>>>>>>> below. Some records have an empty array with “hashtags”:[]  and
> > >>>> others
> > >>>>>>>> will
> > >>>>>>>>> look similar to what is listed below.
> > >>>>>>>>>
> > >>>>>>>>> "entities": {
> > >>>>>>>>>  "trends": [],
> > >>>>>>>>>  "symbols": [],
> > >>>>>>>>>  "urls": [],
> > >>>>>>>>>  "hashtags": [
> > >>>>>>>>>    {
> > >>>>>>>>>      "text": "GoPatriots",
> > >>>>>>>>>      "indices": [
> > >>>>>>>>>        83,
> > >>>>>>>>>        94
> > >>>>>>>>>      ]
> > >>>>>>>>>    }
> > >>>>>>>>>  ],
> > >>>>>>>>>  "user_mentions": []
> > >>>>>>>>> },
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>> —Andries
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
> > <z.json>
>
>

Reply via email to