Re: Filter out empty arrays in JSON

Christopher Matta Thu, 05 Feb 2015 19:48:04 -0800

You can’t use repeated_count function if the list contains maps, only
scalars. This was pointed out by Jason (and Aditya) earlier in this thread:


As Aditya commented before this will work if the lists only contain scalars
repeated_count(‘entities.urls’) > 0

If the lists contain maps unfortunately this is not available today. There
is an enhancement request open for this feature. I have marked it for a fix
in 0.9 as it is more of a feature request than a bug and we are working on
closing a large number of bugs for 0.8 before we get to issues like this.

https://issues.apache.org/jira/browse/DRILL-1650



Chris Matta
cma...@mapr.com
215-701-3146

On Thu, Feb 5, 2015 at 10:36 PM, Sudhakar Thota <sth...@maprtech.com> wrote:

> You can use repeated_count function to eliminate the null arrays.
>
> Sudhakar Thota
> Sent from my iPhone
>
> > On Feb 5, 2015, at 7:05 PM, Andries Engelbrecht <
> aengelbre...@maprtech.com> wrote:
> >
> > Chris
> >
> > Thanks, but I tried it before without success.
> >
> > I still get this on Drill 0.7.
> >
> > select t.entities.hashtags from dfs.twitter.`/nfl` t where
> t.entities.hashtags[0] is not null limit 10;
> >
> > Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> > Query failed: Query failed: Failure while running fragment., Failure
> while trying to materialize incoming schema.  Errors:
> >
> > Error in expression at index 0.  Error: Missing function implementation:
> [isnotnull(MAP-REQUIRED)].  Full expression: null.. [
> 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
> > [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
> >
> >
> > Mine seems finicky about even using the first element in the array
> >
> > select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10;
> >
> > Query failed: Query failed: Failure while running fragment., index: 4,
> length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on
> drilldemo:31010 ]
> > [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ]
> >
> >
> > However flatten seems to work. (as long as you don’t have thousands of
> small json files)
> >
> > select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10;
> >
> > +------------+
> > |   EXPR$0   |
> > +------------+
> > | {"text":"PantherPride","indices":["12","25"]} |
> > | {"text":"KeepPounding","indices":["48","61"]} |
> > | {"text":"vegas","indices":["107","113"]} |
> > | {"text":"nfl","indices":["23","27"]} |
> > | {"text":"Cowboys","indices":["7","15"]} |
> > | {"text":"Cowboys","indices":["7","15"]} |
> > | {"text":"Ebay","indices":["52","57"]} |
> > | {"text":"NFL","indices":["58","62"]} |
> > | {"text":"Christmas","indices":["63","73"]} |
> > | {"text":"Gift","indices":["74","79"]} |
> > +------------+
> >
> >
> > —Andries
> >
> >
> >
> >
> >> On Feb 5, 2015, at 6:41 PM, Christopher Matta <cma...@mapr.com> wrote:
> >>
> >> Andreas, I think I may have solved your issue:
> >>
> >>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t
> WHERE t.entities.hashtags[0].text is not null limit 10;
> >> +------------+
> >> |   EXPR$0   |
> >> +------------+
> >> |
> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}]
> >> |
> >> |
> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}]
> >> |
> >> |
> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}]
> >> |
> >> | [{"text":"NotOneDime","indices":[14,25]}] |
> >> | [{"text":"BlackFriday","indices":[43,55]}] |
> >> | [{"text":"JRStudio","indices":[93,102]}] |
> >> |
> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}]
> >> |
> >> | [{"text":"BlackFriday","indices":[0,12]}] |
> >> |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> >> |
> >> |
> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
> >> |
> >> +------------+
> >> 10 rows selected (1.697 seconds)
> >>
> >> I noticed when only extracting the first text item in the hashtag array
> (
> >> t.entities.hashtags[0].text) it was returning null for those tweets that
> >> didn’t have hashtags, so filtering out that seems to work.
> >> 
> >>
> >> Chris Matta
> >> cma...@mapr.com
> >> 215-701-3146
> >>
> >> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <
> altekruseja...@gmail.com>
> >> wrote:
> >>
> >>> As Aditya commented before this will work if the lists only contain
> scalars
> >>>
> >>> repeated_count('entities.urls') > 0
> >>>
> >>> If the lists contain maps unfortunately this is not available today.
> There
> >>> is an enhancement request open for this feature. I have marked it for
> a fix
> >>> in 0.9 as it is more of a feature request than a bug and we are
> working on
> >>> closing a large number of bugs for 0.8 before we get to issues like
> this.
> >>>
> >>> https://issues.apache.org/jira/browse/DRILL-1650
> >>>
> >>> -Jason
> >>>
> >>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht <
> >>> aengelbre...@maprtech.com> wrote:
> >>>
> >>>> Unfortunately it seems that with larger data sets the use of flatten
> >>> seems
> >>>> to produce an error.
> >>>>
> >>>> Any other options to filter out JSON records (objects) where an array
> is
> >>>> empty?
> >>>>
> >>>> Thanks
> >>>> —Andries
> >>>>
> >>>>
> >>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht <
> >>>> aengelbre...@maprtech.com> wrote:
> >>>>
> >>>>> It was interesting to see flatten bypass the records with an empty
> >>> array.
> >>>>>
> >>>>> Unfortunately some arrays are much more complex than the example
> here,
> >>>> and it is still useful to have the ability to filter out ones that are
> >>>> empty.
> >>>>>
> >>>>> —Andries
> >>>>>
> >>>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <h...@maprtech.com> wrote:
> >>>>>>
> >>>>>> I figured out the differences after getting the json file from
> >>> Andries.
> >>>>>>
> >>>>>> My json file is like:
> >>>>>> {“entities”:{xxx},“entities”:{yyy}... }
> >>>>>>
> >>>>>> Andries' json file is like:
> >>>>>> {“entities”:{xxx}}
> >>>>>> {“entities”:{yyy}}
> >>>>>> ...
> >>>>>>
> >>>>>> So basically Andries' json file is contains multiple json files.
> >>>>>>
> >>>>>> Fortunately, we can use the same SQL to get the same results:
> >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
> >>>>>> +------------+
> >>>>>> |   EXPR$0   |
> >>>>>> +------------+
> >>>>>> | []         |
> >>>>>> | [{"text":"GoPatriots"}] |
> >>>>>> | [{"text":"aaa"}] |
> >>>>>> | []         |
> >>>>>> | [{"text":"bbb"}] |
> >>>>>> +------------+
> >>>>>> 5 rows selected (0.136 seconds)
> >>>>>> 0: jdbc:drill:> with tmp as
> >>>>>> . . . . . . . > (
> >>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> >>>>>> dfs.tmp.`z2.json` t
> >>>>>> . . . . . . . > )
> >>>>>> . . . . . . . > select tmp.c.text from tmp;
> >>>>>> +------------+
> >>>>>> |   EXPR$0   |
> >>>>>> +------------+
> >>>>>> | GoPatriots |
> >>>>>> | aaa        |
> >>>>>> | bbb        |
> >>>>>> +------------+
> >>>>>> 3 rows selected (0.115 seconds)
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Hao
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
> >>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>
> >>>>>>> In my case it returns the empty records when flatten is not used.
> >>>>>>>
> >>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as
> >>> hashtags
> >>>>>>> from `twitter.json` t limit 10;
> >>>>>>> +------------+
> >>>>>>> |  hashtags  |
> >>>>>>> +------------+
> >>>>>>> | []         |
> >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>> | []         |
> >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>> | []         |
> >>>>>>> | []         |
> >>>>>>> | []         |
> >>>>>>> | []         |
> >>>>>>> | []         |
> >>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> >>>>>>> +------------+
> >>>>>>>
> >>>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <h...@maprtech.com> wrote:
> >>>>>>>>
> >>>>>>>> Actually not due to flatten, if you directly query the file, it
> will
> >>>> only
> >>>>>>>> show the non-null values.
> >>>>>>>>
> >>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json`
> t;
> >>>>>>>> +------------+
> >>>>>>>> |   EXPR$0   |
> >>>>>>>> +------------+
> >>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
> >>>>>>>> +------------+
> >>>>>>>> 1 row selected (0.123 seconds)
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Hao
> >>>>>>>>
> >>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
> >>>>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>>>
> >>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure
> >>> if
> >>>>>>> that
> >>>>>>>>> is an ideal result for all use cases, but certainly usable in
> this
> >>>> case.
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> —Andries
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I can also fetch non-null values for attached json file which
> >>>> contains
> >>>>>>> 6
> >>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value.
> >>>>>>>>>>
> >>>>>>>>>> Could you share your complete "twitter.json"?
> >>>>>>>>>>
> >>>>>>>>>> 0: jdbc:drill:> with tmp as
> >>>>>>>>>> . . . . . . . > (
> >>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> >>>>>>>>> dfs.tmp.`z.json` t
> >>>>>>>>>> . . . . . . . > )
> >>>>>>>>>> . . . . . . . > select tmp.c.text from tmp;
> >>>>>>>>>> +------------+
> >>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>> +------------+
> >>>>>>>>>> | GoPatriots |
> >>>>>>>>>> | aaa        |
> >>>>>>>>>> | bbb        |
> >>>>>>>>>> +------------+
> >>>>>>>>>> 3 rows selected (0.122 seconds)
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Hao
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> >>>>>>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>>>>> The sample data I posted only has 1 element, but some records
> have
> >>>>>>>>> multiple elements in them.
> >>>>>>>>>>
> >>>>>>>>>> Interestingly enough though
> >>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
> >>> 10;
> >>>>>>>>>>
> >>>>>>>>>> Produces
> >>>>>>>>>> +------------+
> >>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>> +------------+
> >>>>>>>>>> | null       |
> >>>>>>>>>> | SportsNews |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | SportsNews |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | null       |
> >>>>>>>>>> | CARvsSEA   |
> >>>>>>>>>> +——————+
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> And
> >>>>>>>>>>
> >>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
> >>>>>>>>>>
> >>>>>>>>>> +------------+
> >>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>> +------------+
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"indices":[]} |
> >>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} |
> >>>>>>>>>> +——————+
> >>>>>>>>>>
> >>>>>>>>>> Strange part is that there is no indices map in the hashtags
> >>> array,
> >>>> so
> >>>>>>>>> no idea why it shows up when pointing to the first lament in an
> >>> empty
> >>>>>>> array.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> —Andries
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com>
> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I just noticed that the result of "hashtags" is just an array
> >>> with
> >>>>>>>>> only 1
> >>>>>>>>>>> element.
> >>>>>>>>>>> So take your example:
> >>>>>>>>>>> [root@maprdemo tmp]# cat d.json
> >>>>>>>>>>> {
> >>>>>>>>>>> "entities": {
> >>>>>>>>>>> "trends": [],
> >>>>>>>>>>> "symbols": [],
> >>>>>>>>>>> "urls": [],
> >>>>>>>>>>> "hashtags": [],
> >>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>> },
> >>>>>>>>>>> "entities": {
> >>>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>>> "hashtags": [
> >>>>>>>>>>> {
> >>>>>>>>>>>  "text": "GoPatriots",
> >>>>>>>>>>>  "indices": []
> >>>>>>>>>>> }
> >>>>>>>>>>> ],
> >>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>> }
> >>>>>>>>>>> }
> >>>>>>>>>>>
> >>>>>>>>>>> Now we can do this to achieve the results:
> >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
> dfs.tmp.`d.json`
> >>> t
> >>>> ;
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> | [{"text":"GoPatriots"}] |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> 1 row selected (0.09 seconds)
> >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
> >>>>>>>>> dfs.tmp.`d.json` t ;
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> | GoPatriots |
> >>>>>>>>>>> +------------+
> >>>>>>>>>>> 1 row selected (0.108 seconds)
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Hao
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> >>>>>>>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> When I run the query on a larger dataset it actually show the
> >>>> empty
> >>>>>>>>>>>> records.
> >>>>>>>>>>>>
> >>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | []         |
> >>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> >>>>>>>>>>>> +------------+
> >>>>>>>>>>>> 10 rows selected (2.899 seconds)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> However having the output as maps is not very useful, unless i
> >>> can
> >>>>>>>>> filter
> >>>>>>>>>>>> out the records with empty arrays and then drill deeper into
> the
> >>>> ones
> >>>>>>>>> with
> >>>>>>>>>>>> data in the arrays.
> >>>>>>>>>>>>
> >>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows,
> >>> one
> >>>>>>>>> with an
> >>>>>>>>>>>> empty array as above and the other with the array data.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> —Andries
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com>
> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I am not sure if below is expected behavior.
> >>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row
> >>>> ignoring
> >>>>>>>>> the
> >>>>>>>>>>>>> null value.
> >>>>>>>>>>>>> However then if we try to get "hashtags.text", it
> fails...which
> >>>>>>>>> means it
> >>>>>>>>>>>> is
> >>>>>>>>>>>>> still trying to read the NULL value.
> >>>>>>>>>>>>> I am thinking it may confuse the SQL developers.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
> >>> dfs.tmp.`d.json`
> >>>> t ;
> >>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>> | [{"text":"GoPatriots"}] |
> >>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>> 1 row selected (0.109 seconds)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
> >>>>>>>>> dfs.tmp.`d.json` t ;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> >>> be
> >>>>>>>>> cast to
> >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> >>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Error: exception while executing query: Failure while
> executing
> >>>>>>>>> query.
> >>>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>> Hao
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> >>>>>>>>>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Now try on hashtags with the following:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
> >>>>>>>>> `/twitter.json` t
> >>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10;
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector
> cannot
> >>> be
> >>>>>>>>> cast to
> >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> >>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Error: exception while executing query: Failure while
> >>> executing
> >>>>>>>>> query.
> >>>>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>>> "hashtags": [],
> >>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>> },
> >>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>> {
> >>>>>>>>>>>>>> "text": "GoPatriots",
> >>>>>>>>>>>>>> "indices": []
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The issue seems to be that if some records have arrays with
> >>>> maps in
> >>>>>>>>> them
> >>>>>>>>>>>>>> and others are empty.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> —Andries
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com>
> >>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Seems it works for below json file:
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>> "text": "GoPatriots",
> >>>>>>>>>>>>>>> "indices": [
> >>>>>>>>>>>>>>>   83,
> >>>>>>>>>>>>>>>   94
> >>>>>>>>>>>>>>> ]
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>>> },
> >>>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>>> "trends": [1,2,3],
> >>>>>>>>>>>>>>> "symbols": [4,5,6],
> >>>>>>>>>>>>>>> "urls": [7,8,9],
> >>>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>> "text": "GoPatriots",
> >>>>>>>>>>>>>>> "indices": []
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from
> dfs.tmp.`a.json`
> >>>> as t
> >>>>>>>>> where
> >>>>>>>>>>>>>>> t.entities.urls is not null;
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> | [7,8,9]    |
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> 1 row selected (0.139 seconds)
> >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from
> dfs.tmp.`a.json`
> >>>> as t
> >>>>>>>>> where
> >>>>>>>>>>>>>>> t.entities.urls is null;
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> |   EXPR$0   |
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> +------------+
> >>>>>>>>>>>>>>> No rows selected (0.158 seconds)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>> Hao
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
> >>>> adityakish...@gmail.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I believe that this works if the array contains
> homogeneous
> >>>>>>>>> primitive
> >>>>>>>>>>>>>>>> types. In your example, it appears from the error, the
> array
> >>>>>>> field
> >>>>>>>>>>>>>> 'member'
> >>>>>>>>>>>>>>>> contained maps for at least one record.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> >>>>>>>>> cma...@mapr.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members`
> from
> >>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> >>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to
> >>>>>>> materialize
> >>>>>>>>>>>>>>>> incoming schema.  Errors:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Error in expression at index -1.  Error: Missing function
> >>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
> >>>> expression:
> >>>>>>>>>>>>>> --UNKNOWN
> >>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> >>>>>>>>>>>>>> bullseye-3:31010 ]
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while
> >>>> executing
> >>>>>>>>>>>> query.
> >>>>>>>>>>>>>>>> (state=,code=0)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Chris Matta
> >>>>>>>>>>>>>>>>> cma...@mapr.com
> >>>>>>>>>>>>>>>>> 215-701-3146
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
> >>>>>>> adityakish...@gmail.com
> >>>>>>>>>>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> >>>>>>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in
> >>> drill?
> >>>>>>>>>>>>>>>>>>> i.e some records have "url":[]  and some will have an
> >>> array
> >>>>>>>>> with
> >>>>>>>>>>>> data
> >>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array
> >>>> drill
> >>>>>>>>> fails
> >>>>>>>>>>>>>> due
> >>>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter
> >>>> with/*
> >>>>>>>>> where
> >>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying
> >>> url
> >>>> is
> >>>>>>>>> not
> >>>>>>>>>>>>>>>> null.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter
> data
> >>>> as
> >>>>>>> an
> >>>>>>>>>>>>>>>> example
> >>>>>>>>>>>>>>>>>>> below. Some records have an empty array with
> >>> “hashtags”:[]
> >>>>>>> and
> >>>>>>>>>>>>>> others
> >>>>>>>>>>>>>>>>>> will
> >>>>>>>>>>>>>>>>>>> look similar to what is listed below.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> "entities": {
> >>>>>>>>>>>>>>>>>>> "trends": [],
> >>>>>>>>>>>>>>>>>>> "symbols": [],
> >>>>>>>>>>>>>>>>>>> "urls": [],
> >>>>>>>>>>>>>>>>>>> "hashtags": [
> >>>>>>>>>>>>>>>>>>> {
> >>>>>>>>>>>>>>>>>>> "text": "GoPatriots",
> >>>>>>>>>>>>>>>>>>> "indices": [
> >>>>>>>>>>>>>>>>>>>   83,
> >>>>>>>>>>>>>>>>>>>   94
> >>>>>>>>>>>>>>>>>>> ]
> >>>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>> ],
> >>>>>>>>>>>>>>>>>>> "user_mentions": []
> >>>>>>>>>>>>>>>>>>> },
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>>>>>>>> —Andries
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> <z.json>
> >
>

Re: Filter out empty arrays in JSON

Reply via email to