Re: Filter out empty arrays in JSON

Christopher Matta Thu, 05 Feb 2015 18:43:37 -0800

Andreas, I think I may have solved your issue:

> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE 
> t.entities.hashtags[0].text is not null limit 10;
+------------+
|   EXPR$0   |
+------------+
| 
[{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}]
|
| 
[{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}]
|
| 
[{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}]
|
| [{"text":"NotOneDime","indices":[14,25]}] |
| [{"text":"BlackFriday","indices":[43,55]}] |
| [{"text":"JRStudio","indices":[93,102]}] |
| 
[{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}]
|
| [{"text":"BlackFriday","indices":[0,12]}] |
| 
[{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
|
| 
[{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
|
+------------+
10 rows selected (1.697 seconds)


I noticed when only extracting the first text item in the hashtag array (
t.entities.hashtags[0].text) it was returning null for those tweets that
didn’t have hashtags, so filtering out that seems to work.


Chris Matta
[email protected]
215-701-3146

On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <[email protected]>
wrote:

> As Aditya commented before this will work if the lists only contain scalars
>
> repeated_count('entities.urls') > 0
>
> If the lists contain maps unfortunately this is not available today. There
> is an enhancement request open for this feature. I have marked it for a fix
> in 0.9 as it is more of a feature request than a bug and we are working on
> closing a large number of bugs for 0.8 before we get to issues like this.
>
> https://issues.apache.org/jira/browse/DRILL-1650
>
> -Jason
>
> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht <
> [email protected]> wrote:
>
> > Unfortunately it seems that with larger data sets the use of flatten
> seems
> > to produce an error.
> >
> > Any other options to filter out JSON records (objects) where an array is
> > empty?
> >
> > Thanks
> > —Andries
> >
> >
> > On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht <
> > [email protected]> wrote:
> >
> > > It was interesting to see flatten bypass the records with an empty
> array.
> > >
> > > Unfortunately some arrays are much more complex than the example here,
> > and it is still useful to have the ability to filter out ones that are
> > empty.
> > >
> > > —Andries
> > >
> > > On Jan 21, 2015, at 5:04 PM, Hao Zhu <[email protected]> wrote:
> > >
> > >> I figured out the differences after getting the json file from
> Andries.
> > >>
> > >> My json file is like:
> > >> {“entities”:{xxx},“entities”:{yyy}... }
> > >>
> > >> Andries' json file is like:
> > >> {“entities”:{xxx}}
> > >> {“entities”:{yyy}}
> > >> ...
> > >>
> > >> So basically Andries' json file is contains multiple json files.
> > >>
> > >> Fortunately, we can use the same SQL to get the same results:
> > >> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
> > >> +------------+
> > >> |   EXPR$0   |
> > >> +------------+
> > >> | []         |
> > >> | [{"text":"GoPatriots"}] |
> > >> | [{"text":"aaa"}] |
> > >> | []         |
> > >> | [{"text":"bbb"}] |
> > >> +------------+
> > >> 5 rows selected (0.136 seconds)
> > >> 0: jdbc:drill:> with tmp as
> > >> . . . . . . . > (
> > >> . . . . . . . > select flatten(t.entities.hashtags) as c from
> > >> dfs.tmp.`z2.json` t
> > >> . . . . . . . > )
> > >> . . . . . . . > select tmp.c.text from tmp;
> > >> +------------+
> > >> |   EXPR$0   |
> > >> +------------+
> > >> | GoPatriots |
> > >> | aaa        |
> > >> | bbb        |
> > >> +------------+
> > >> 3 rows selected (0.115 seconds)
> > >>
> > >> Thanks,
> > >> Hao
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
> > >> [email protected]> wrote:
> > >>
> > >>> In my case it returns the empty records when flatten is not used.
> > >>>
> > >>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as
> hashtags
> > >>> from `twitter.json` t limit 10;
> > >>> +------------+
> > >>> |  hashtags  |
> > >>> +------------+
> > >>> | []         |
> > >>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>> | []         |
> > >>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>> | []         |
> > >>> | []         |
> > >>> | []         |
> > >>> | []         |
> > >>> | []         |
> > >>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> > >>> +------------+
> > >>>
> > >>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <[email protected]> wrote:
> > >>>
> > >>>> Actually not due to flatten, if you directly query the file, it will
> > only
> > >>>> show the non-null values.
> > >>>>
> > >>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
> > >>>> +------------+
> > >>>> |   EXPR$0   |
> > >>>> +------------+
> > >>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
> > >>>> +------------+
> > >>>> 1 row selected (0.123 seconds)
> > >>>>
> > >>>> Thanks,
> > >>>> Hao
> > >>>>
> > >>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
> > >>>> [email protected]> wrote:
> > >>>>
> > >>>>> Very interesting, flatten seems to bypass empty records. Not sure
> if
> > >>> that
> > >>>>> is an ideal result for all use cases, but certainly usable in this
> > case.
> > >>>>>
> > >>>>> Thanks
> > >>>>> —Andries
> > >>>>>
> > >>>>>
> > >>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <[email protected]> wrote:
> > >>>>>
> > >>>>>> I can also fetch non-null values for attached json file which
> > contains
> > >>> 6
> > >>>>> "entities", 3 of them are null, 3 of them have "text" value.
> > >>>>>>
> > >>>>>> Could you share your complete "twitter.json"?
> > >>>>>>
> > >>>>>> 0: jdbc:drill:> with tmp as
> > >>>>>> . . . . . . . > (
> > >>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
> > >>>>> dfs.tmp.`z.json` t
> > >>>>>> . . . . . . . > )
> > >>>>>> . . . . . . . > select tmp.c.text from tmp;
> > >>>>>> +------------+
> > >>>>>> |   EXPR$0   |
> > >>>>>> +------------+
> > >>>>>> | GoPatriots |
> > >>>>>> | aaa        |
> > >>>>>> | bbb        |
> > >>>>>> +------------+
> > >>>>>> 3 rows selected (0.122 seconds)
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Hao
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
> > >>>>> [email protected]> wrote:
> > >>>>>> The sample data I posted only has 1 element, but some records have
> > >>>>> multiple elements in them.
> > >>>>>>
> > >>>>>> Interestingly enough though
> > >>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
> 10;
> > >>>>>>
> > >>>>>> Produces
> > >>>>>> +------------+
> > >>>>>> |   EXPR$0   |
> > >>>>>> +------------+
> > >>>>>> | null       |
> > >>>>>> | SportsNews |
> > >>>>>> | null       |
> > >>>>>> | SportsNews |
> > >>>>>> | null       |
> > >>>>>> | null       |
> > >>>>>> | null       |
> > >>>>>> | null       |
> > >>>>>> | null       |
> > >>>>>> | CARvsSEA   |
> > >>>>>> +——————+
> > >>>>>>
> > >>>>>>
> > >>>>>> And
> > >>>>>>
> > >>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
> > >>>>>>
> > >>>>>> +------------+
> > >>>>>> |   EXPR$0   |
> > >>>>>> +------------+
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"text":"SportsNews","indices":[90,99]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"text":"SportsNews","indices":[90,99]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"indices":[]} |
> > >>>>>> | {"text":"CARvsSEA","indices":[90,99]} |
> > >>>>>> +——————+
> > >>>>>>
> > >>>>>> Strange part is that there is no indices map in the hashtags
> array,
> > so
> > >>>>> no idea why it shows up when pointing to the first lament in an
> empty
> > >>> array.
> > >>>>>>
> > >>>>>>
> > >>>>>> —Andries
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> I just noticed that the result of "hashtags" is just an array
> with
> > >>>>> only 1
> > >>>>>>> element.
> > >>>>>>> So take your example:
> > >>>>>>> [root@maprdemo tmp]# cat d.json
> > >>>>>>> {
> > >>>>>>> "entities": {
> > >>>>>>> "trends": [],
> > >>>>>>> "symbols": [],
> > >>>>>>> "urls": [],
> > >>>>>>> "hashtags": [],
> > >>>>>>> "user_mentions": []
> > >>>>>>> },
> > >>>>>>> "entities": {
> > >>>>>>> "trends": [1,2,3],
> > >>>>>>> "symbols": [4,5,6],
> > >>>>>>> "urls": [7,8,9],
> > >>>>>>> "hashtags": [
> > >>>>>>>  {
> > >>>>>>>    "text": "GoPatriots",
> > >>>>>>>    "indices": []
> > >>>>>>>  }
> > >>>>>>> ],
> > >>>>>>> "user_mentions": []
> > >>>>>>> }
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> Now we can do this to achieve the results:
> > >>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
> t
> > ;
> > >>>>>>> +------------+
> > >>>>>>> |   EXPR$0   |
> > >>>>>>> +------------+
> > >>>>>>> | [{"text":"GoPatriots"}] |
> > >>>>>>> +------------+
> > >>>>>>> 1 row selected (0.09 seconds)
> > >>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
> > >>>>> dfs.tmp.`d.json` t ;
> > >>>>>>> +------------+
> > >>>>>>> |   EXPR$0   |
> > >>>>>>> +------------+
> > >>>>>>> | GoPatriots |
> > >>>>>>> +------------+
> > >>>>>>> 1 row selected (0.108 seconds)
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Hao
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
> > >>>>>>> [email protected]> wrote:
> > >>>>>>>
> > >>>>>>>> When I run the query on a larger dataset it actually show the
> > empty
> > >>>>>>>> records.
> > >>>>>>>>
> > >>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10;
> > >>>>>>>>
> > >>>>>>>> +------------+
> > >>>>>>>> |   EXPR$0   |
> > >>>>>>>> +------------+
> > >>>>>>>> | []         |
> > >>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>>>>>>> | []         |
> > >>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
> > >>>>>>>> | []         |
> > >>>>>>>> | []         |
> > >>>>>>>> | []         |
> > >>>>>>>> | []         |
> > >>>>>>>> | []         |
> > >>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
> > >>>>>>>> +------------+
> > >>>>>>>> 10 rows selected (2.899 seconds)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> However having the output as maps is not very useful, unless i
> can
> > >>>>> filter
> > >>>>>>>> out the records with empty arrays and then drill deeper into the
> > ones
> > >>>>> with
> > >>>>>>>> data in the arrays.
> > >>>>>>>>
> > >>>>>>>> BTW: Hao I would have expiated your query to return both rows,
> one
> > >>>>> with an
> > >>>>>>>> empty array as above and the other with the array data.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> —Andries
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> I am not sure if below is expected behavior.
> > >>>>>>>>> If we only select "hashtags", and it will return only 1 row
> > ignoring
> > >>>>> the
> > >>>>>>>>> null value.
> > >>>>>>>>> However then if we try to get "hashtags.text", it fails...which
> > >>>>> means it
> > >>>>>>>> is
> > >>>>>>>>> still trying to read the NULL value.
> > >>>>>>>>> I am thinking it may confuse the SQL developers.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
> dfs.tmp.`d.json`
> > t ;
> > >>>>>>>>> +------------+
> > >>>>>>>>> |   EXPR$0   |
> > >>>>>>>>> +------------+
> > >>>>>>>>> | [{"text":"GoPatriots"}] |
> > >>>>>>>>> +------------+
> > >>>>>>>>> 1 row selected (0.109 seconds)
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
> > >>>>> dfs.tmp.`d.json` t ;
> > >>>>>>>>>
> > >>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> > >>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> be
> > >>>>> cast to
> > >>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > >>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Error: exception while executing query: Failure while executing
> > >>>>> query.
> > >>>>>>>>> (state=,code=0)
> > >>>>>>>>>
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Hao
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
> > >>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Now try on hashtags with the following:
> > >>>>>>>>>>
> > >>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
> > >>>>> `/twitter.json` t
> > >>>>>>>>>> where t.entities.hashtags is not null limit 10;
> > >>>>>>>>>>
> > >>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
> > >>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
> be
> > >>>>> cast to
> > >>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
> > >>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Error: exception while executing query: Failure while
> executing
> > >>>>> query.
> > >>>>>>>>>> (state=,code=0)
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> {
> > >>>>>>>>>> "entities": {
> > >>>>>>>>>> "trends": [],
> > >>>>>>>>>> "symbols": [],
> > >>>>>>>>>> "urls": [],
> > >>>>>>>>>> "hashtags": [],
> > >>>>>>>>>> "user_mentions": []
> > >>>>>>>>>> },
> > >>>>>>>>>> "entities": {
> > >>>>>>>>>> "trends": [1,2,3],
> > >>>>>>>>>> "symbols": [4,5,6],
> > >>>>>>>>>> "urls": [7,8,9],
> > >>>>>>>>>> "hashtags": [
> > >>>>>>>>>> {
> > >>>>>>>>>>   "text": "GoPatriots",
> > >>>>>>>>>>   "indices": []
> > >>>>>>>>>> }
> > >>>>>>>>>> ],
> > >>>>>>>>>> "user_mentions": []
> > >>>>>>>>>> }
> > >>>>>>>>>> }
> > >>>>>>>>>>
> > >>>>>>>>>> The issue seems to be that if some records have arrays with
> > maps in
> > >>>>> them
> > >>>>>>>>>> and others are empty.
> > >>>>>>>>>>
> > >>>>>>>>>> —Andries
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <[email protected]>
> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Seems it works for below json file:
> > >>>>>>>>>>> {
> > >>>>>>>>>>> "entities": {
> > >>>>>>>>>>> "trends": [],
> > >>>>>>>>>>> "symbols": [],
> > >>>>>>>>>>> "urls": [],
> > >>>>>>>>>>> "hashtags": [
> > >>>>>>>>>>> {
> > >>>>>>>>>>>   "text": "GoPatriots",
> > >>>>>>>>>>>   "indices": [
> > >>>>>>>>>>>     83,
> > >>>>>>>>>>>     94
> > >>>>>>>>>>>   ]
> > >>>>>>>>>>> }
> > >>>>>>>>>>> ],
> > >>>>>>>>>>> "user_mentions": []
> > >>>>>>>>>>> },
> > >>>>>>>>>>> "entities": {
> > >>>>>>>>>>> "trends": [1,2,3],
> > >>>>>>>>>>> "symbols": [4,5,6],
> > >>>>>>>>>>> "urls": [7,8,9],
> > >>>>>>>>>>> "hashtags": [
> > >>>>>>>>>>> {
> > >>>>>>>>>>>   "text": "GoPatriots",
> > >>>>>>>>>>>   "indices": []
> > >>>>>>>>>>> }
> > >>>>>>>>>>> ],
> > >>>>>>>>>>> "user_mentions": []
> > >>>>>>>>>>> }
> > >>>>>>>>>>> }
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> > as t
> > >>>>> where
> > >>>>>>>>>>> t.entities.urls is not null;
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> |   EXPR$0   |
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> | [7,8,9]    |
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> 1 row selected (0.139 seconds)
> > >>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
> > as t
> > >>>>> where
> > >>>>>>>>>>> t.entities.urls is null;
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> |   EXPR$0   |
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> +------------+
> > >>>>>>>>>>> No rows selected (0.158 seconds)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks,
> > >>>>>>>>>>> Hao
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
> > [email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> I believe that this works if the array contains homogeneous
> > >>>>> primitive
> > >>>>>>>>>>>> types. In your example, it appears from the error, the array
> > >>> field
> > >>>>>>>>>> 'member'
> > >>>>>>>>>>>> contained maps for at least one record.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
> > >>>>> [email protected]>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
> > >>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
> > >>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to
> > >>> materialize
> > >>>>>>>>>>>> incoming schema.  Errors:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Error in expression at index -1.  Error: Missing function
> > >>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
> > expression:
> > >>>>>>>>>> --UNKNOWN
> > >>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
> > >>>>>>>>>> bullseye-3:31010 ]
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Error: exception while executing query: Failure while
> > executing
> > >>>>>>>> query.
> > >>>>>>>>>>>> (state=,code=0)
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Chris Matta
> > >>>>>>>>>>>>> [email protected]
> > >>>>>>>>>>>>> 215-701-3146
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
> > >>> [email protected]
> > >>>>>>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> repeated_count('entities.urls') > 0
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
> > >>>>>>>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> How do you filter out records with an empty array in
> drill?
> > >>>>>>>>>>>>>>> i.e some records have "url":[]  and some will have an
> array
> > >>>>> with
> > >>>>>>>> data
> > >>>>>>>>>>>> in
> > >>>>>>>>>>>>>>> it. When trying to read records with data in the array
> > drill
> > >>>>> fails
> > >>>>>>>>>> due
> > >>>>>>>>>>>>>> to
> > >>>>>>>>>>>>>>> records missing any data in the array. Trying a filter
> > with/*
> > >>>>> where
> > >>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying
> url
> > is
> > >>>>> not
> > >>>>>>>>>>>> null.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter data
> > as
> > >>> an
> > >>>>>>>>>>>> example
> > >>>>>>>>>>>>>>> below. Some records have an empty array with
> “hashtags”:[]
> > >>> and
> > >>>>>>>>>> others
> > >>>>>>>>>>>>>> will
> > >>>>>>>>>>>>>>> look similar to what is listed below.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> "entities": {
> > >>>>>>>>>>>>>>> "trends": [],
> > >>>>>>>>>>>>>>> "symbols": [],
> > >>>>>>>>>>>>>>> "urls": [],
> > >>>>>>>>>>>>>>> "hashtags": [
> > >>>>>>>>>>>>>>> {
> > >>>>>>>>>>>>>>>   "text": "GoPatriots",
> > >>>>>>>>>>>>>>>   "indices": [
> > >>>>>>>>>>>>>>>     83,
> > >>>>>>>>>>>>>>>     94
> > >>>>>>>>>>>>>>>   ]
> > >>>>>>>>>>>>>>> }
> > >>>>>>>>>>>>>>> ],
> > >>>>>>>>>>>>>>> "user_mentions": []
> > >>>>>>>>>>>>>>> },
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks
> > >>>>>>>>>>>>>>> —Andries
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> <z.json>
> > >>>>>
> > >>>>>
> > >>>
> > >>>
> > >
> >
> >
>

Re: Filter out empty arrays in JSON

Reply via email to