Re: Filter out empty arrays in JSON

Sudhakar Thota Thu, 05 Feb 2015 19:36:50 -0800

You can use repeated_count function to eliminate the null arrays.

Sudhakar Thota
Sent from my iPhone


> On Feb 5, 2015, at 7:05 PM, Andries Engelbrecht <aengelbre...@maprtech.com> 
> wrote:
> 
> Chris 
> 
> Thanks, but I tried it before without success.
> 
> I still get this on Drill 0.7. 
> 
> select t.entities.hashtags from dfs.twitter.`/nfl` t where 
> t.entities.hashtags[0] is not null limit 10;
> 
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> Query failed: Query failed: Failure while running fragment., Failure while 
> trying to materialize incoming schema.  Errors:
> 
> Error in expression at index 0.  Error: Missing function implementation: 
> [isnotnull(MAP-REQUIRED)].  Full expression: null.. [ 
> 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
> [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ]
> 
> 
> Mine seems finicky about even using the first element in the array
> 
> select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10;
> 
> Query failed: Query failed: Failure while running fragment., index: 4, 
> length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on 
> drilldemo:31010 ]
> [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ]
> 
> 
> However flatten seems to work. (as long as you don’t have thousands of small 
> json files)
> 
> select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10;
> 
> +------------+
> |   EXPR$0   |
> +------------+
> | {"text":"PantherPride","indices":["12","25"]} |
> | {"text":"KeepPounding","indices":["48","61"]} |
> | {"text":"vegas","indices":["107","113"]} |
> | {"text":"nfl","indices":["23","27"]} |
> | {"text":"Cowboys","indices":["7","15"]} |
> | {"text":"Cowboys","indices":["7","15"]} |
> | {"text":"Ebay","indices":["52","57"]} |
> | {"text":"NFL","indices":["58","62"]} |
> | {"text":"Christmas","indices":["63","73"]} |
> | {"text":"Gift","indices":["74","79"]} |
> +------------+
> 
> 
> —Andries
> 
> 
> 
> 
>> On Feb 5, 2015, at 6:41 PM, Christopher Matta <cma...@mapr.com> wrote:
>> 
>> Andreas, I think I may have solved your issue:
>> 
>>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE 
>>> t.entities.hashtags[0].text is not null limit 10;
>> +------------+
>> |   EXPR$0   |
>> +------------+
>> | 
>> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}]
>> |
>> | 
>> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}]
>> |
>> | 
>> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}]
>> |
>> | [{"text":"NotOneDime","indices":[14,25]}] |
>> | [{"text":"BlackFriday","indices":[43,55]}] |
>> | [{"text":"JRStudio","indices":[93,102]}] |
>> | 
>> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}]
>> |
>> | [{"text":"BlackFriday","indices":[0,12]}] |
>> | 
>> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
>> |
>> | 
>> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}]
>> |
>> +------------+
>> 10 rows selected (1.697 seconds)
>> 
>> I noticed when only extracting the first text item in the hashtag array (
>> t.entities.hashtags[0].text) it was returning null for those tweets that
>> didn’t have hashtags, so filtering out that seems to work.
>> 
>> 
>> Chris Matta
>> cma...@mapr.com
>> 215-701-3146
>> 
>> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <altekruseja...@gmail.com>
>> wrote:
>> 
>>> As Aditya commented before this will work if the lists only contain scalars
>>> 
>>> repeated_count('entities.urls') > 0
>>> 
>>> If the lists contain maps unfortunately this is not available today. There
>>> is an enhancement request open for this feature. I have marked it for a fix
>>> in 0.9 as it is more of a feature request than a bug and we are working on
>>> closing a large number of bugs for 0.8 before we get to issues like this.
>>> 
>>> https://issues.apache.org/jira/browse/DRILL-1650
>>> 
>>> -Jason
>>> 
>>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht <
>>> aengelbre...@maprtech.com> wrote:
>>> 
>>>> Unfortunately it seems that with larger data sets the use of flatten
>>> seems
>>>> to produce an error.
>>>> 
>>>> Any other options to filter out JSON records (objects) where an array is
>>>> empty?
>>>> 
>>>> Thanks
>>>> —Andries
>>>> 
>>>> 
>>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht <
>>>> aengelbre...@maprtech.com> wrote:
>>>> 
>>>>> It was interesting to see flatten bypass the records with an empty
>>> array.
>>>>> 
>>>>> Unfortunately some arrays are much more complex than the example here,
>>>> and it is still useful to have the ability to filter out ones that are
>>>> empty.
>>>>> 
>>>>> —Andries
>>>>> 
>>>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <h...@maprtech.com> wrote:
>>>>>> 
>>>>>> I figured out the differences after getting the json file from
>>> Andries.
>>>>>> 
>>>>>> My json file is like:
>>>>>> {“entities”:{xxx},“entities”:{yyy}... }
>>>>>> 
>>>>>> Andries' json file is like:
>>>>>> {“entities”:{xxx}}
>>>>>> {“entities”:{yyy}}
>>>>>> ...
>>>>>> 
>>>>>> So basically Andries' json file is contains multiple json files.
>>>>>> 
>>>>>> Fortunately, we can use the same SQL to get the same results:
>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t;
>>>>>> +------------+
>>>>>> |   EXPR$0   |
>>>>>> +------------+
>>>>>> | []         |
>>>>>> | [{"text":"GoPatriots"}] |
>>>>>> | [{"text":"aaa"}] |
>>>>>> | []         |
>>>>>> | [{"text":"bbb"}] |
>>>>>> +------------+
>>>>>> 5 rows selected (0.136 seconds)
>>>>>> 0: jdbc:drill:> with tmp as
>>>>>> . . . . . . . > (
>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
>>>>>> dfs.tmp.`z2.json` t
>>>>>> . . . . . . . > )
>>>>>> . . . . . . . > select tmp.c.text from tmp;
>>>>>> +------------+
>>>>>> |   EXPR$0   |
>>>>>> +------------+
>>>>>> | GoPatriots |
>>>>>> | aaa        |
>>>>>> | bbb        |
>>>>>> +------------+
>>>>>> 3 rows selected (0.115 seconds)
>>>>>> 
>>>>>> Thanks,
>>>>>> Hao
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht <
>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>> 
>>>>>>> In my case it returns the empty records when flatten is not used.
>>>>>>> 
>>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as
>>> hashtags
>>>>>>> from `twitter.json` t limit 10;
>>>>>>> +------------+
>>>>>>> |  hashtags  |
>>>>>>> +------------+
>>>>>>> | []         |
>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>> | []         |
>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>> | []         |
>>>>>>> | []         |
>>>>>>> | []         |
>>>>>>> | []         |
>>>>>>> | []         |
>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
>>>>>>> +------------+
>>>>>>> 
>>>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <h...@maprtech.com> wrote:
>>>>>>>> 
>>>>>>>> Actually not due to flatten, if you directly query the file, it will
>>>> only
>>>>>>>> show the non-null values.
>>>>>>>> 
>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t;
>>>>>>>> +------------+
>>>>>>>> |   EXPR$0   |
>>>>>>>> +------------+
>>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] |
>>>>>>>> +------------+
>>>>>>>> 1 row selected (0.123 seconds)
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Hao
>>>>>>>> 
>>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht <
>>>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>>>> 
>>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure
>>> if
>>>>>>> that
>>>>>>>>> is an ideal result for all use cases, but certainly usable in this
>>>> case.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> —Andries
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> I can also fetch non-null values for attached json file which
>>>> contains
>>>>>>> 6
>>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value.
>>>>>>>>>> 
>>>>>>>>>> Could you share your complete "twitter.json"?
>>>>>>>>>> 
>>>>>>>>>> 0: jdbc:drill:> with tmp as
>>>>>>>>>> . . . . . . . > (
>>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from
>>>>>>>>> dfs.tmp.`z.json` t
>>>>>>>>>> . . . . . . . > )
>>>>>>>>>> . . . . . . . > select tmp.c.text from tmp;
>>>>>>>>>> +------------+
>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>> +------------+
>>>>>>>>>> | GoPatriots |
>>>>>>>>>> | aaa        |
>>>>>>>>>> | bbb        |
>>>>>>>>>> +------------+
>>>>>>>>>> 3 rows selected (0.122 seconds)
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Hao
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht <
>>>>>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>>>>>> The sample data I posted only has 1 element, but some records have
>>>>>>>>> multiple elements in them.
>>>>>>>>>> 
>>>>>>>>>> Interestingly enough though
>>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit
>>> 10;
>>>>>>>>>> 
>>>>>>>>>> Produces
>>>>>>>>>> +------------+
>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>> +------------+
>>>>>>>>>> | null       |
>>>>>>>>>> | SportsNews |
>>>>>>>>>> | null       |
>>>>>>>>>> | SportsNews |
>>>>>>>>>> | null       |
>>>>>>>>>> | null       |
>>>>>>>>>> | null       |
>>>>>>>>>> | null       |
>>>>>>>>>> | null       |
>>>>>>>>>> | CARvsSEA   |
>>>>>>>>>> +——————+
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> And
>>>>>>>>>> 
>>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10;
>>>>>>>>>> 
>>>>>>>>>> +------------+
>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>> +------------+
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"indices":[]} |
>>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} |
>>>>>>>>>> +——————+
>>>>>>>>>> 
>>>>>>>>>> Strange part is that there is no indices map in the hashtags
>>> array,
>>>> so
>>>>>>>>> no idea why it shows up when pointing to the first lament in an
>>> empty
>>>>>>> array.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> —Andries
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I just noticed that the result of "hashtags" is just an array
>>> with
>>>>>>>>> only 1
>>>>>>>>>>> element.
>>>>>>>>>>> So take your example:
>>>>>>>>>>> [root@maprdemo tmp]# cat d.json
>>>>>>>>>>> {
>>>>>>>>>>> "entities": {
>>>>>>>>>>> "trends": [],
>>>>>>>>>>> "symbols": [],
>>>>>>>>>>> "urls": [],
>>>>>>>>>>> "hashtags": [],
>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>> },
>>>>>>>>>>> "entities": {
>>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>>> "hashtags": [
>>>>>>>>>>> {
>>>>>>>>>>>  "text": "GoPatriots",
>>>>>>>>>>>  "indices": []
>>>>>>>>>>> }
>>>>>>>>>>> ],
>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>> 
>>>>>>>>>>> Now we can do this to achieve the results:
>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json`
>>> t
>>>> ;
>>>>>>>>>>> +------------+
>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>> +------------+
>>>>>>>>>>> | [{"text":"GoPatriots"}] |
>>>>>>>>>>> +------------+
>>>>>>>>>>> 1 row selected (0.09 seconds)
>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from
>>>>>>>>> dfs.tmp.`d.json` t ;
>>>>>>>>>>> +------------+
>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>> +------------+
>>>>>>>>>>> | GoPatriots |
>>>>>>>>>>> +------------+
>>>>>>>>>>> 1 row selected (0.108 seconds)
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Hao
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht <
>>>>>>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> When I run the query on a larger dataset it actually show the
>>>> empty
>>>>>>>>>>>> records.
>>>>>>>>>>>> 
>>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10;
>>>>>>>>>>>> 
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | []         |
>>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] |
>>>>>>>>>>>> +------------+
>>>>>>>>>>>> 10 rows selected (2.899 seconds)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> However having the output as maps is not very useful, unless i
>>> can
>>>>>>>>> filter
>>>>>>>>>>>> out the records with empty arrays and then drill deeper into the
>>>> ones
>>>>>>>>> with
>>>>>>>>>>>> data in the arrays.
>>>>>>>>>>>> 
>>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows,
>>> one
>>>>>>>>> with an
>>>>>>>>>>>> empty array as above and the other with the array data.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> —Andries
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I am not sure if below is expected behavior.
>>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row
>>>> ignoring
>>>>>>>>> the
>>>>>>>>>>>>> null value.
>>>>>>>>>>>>> However then if we try to get "hashtags.text", it fails...which
>>>>>>>>> means it
>>>>>>>>>>>> is
>>>>>>>>>>>>> still trying to read the NULL value.
>>>>>>>>>>>>> I am thinking it may confuse the SQL developers.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from
>>> dfs.tmp.`d.json`
>>>> t ;
>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>> | [{"text":"GoPatriots"}] |
>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>> 1 row selected (0.109 seconds)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from
>>>>>>>>> dfs.tmp.`d.json` t ;
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
>>> be
>>>>>>>>> cast to
>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
>>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Error: exception while executing query: Failure while executing
>>>>>>>>> query.
>>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Hao
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht <
>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Now try on hashtags with the following:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from
>>>>>>>>> `/twitter.json` t
>>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10;
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment.,
>>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot
>>> be
>>>>>>>>> cast to
>>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [
>>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ]
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Error: exception while executing query: Failure while
>>> executing
>>>>>>>>> query.
>>>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>>> "hashtags": [],
>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>> },
>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>> "indices": []
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The issue seems to be that if some records have arrays with
>>>> maps in
>>>>>>>>> them
>>>>>>>>>>>>>> and others are empty.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> —Andries
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com>
>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Seems it works for below json file:
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>>> "indices": [
>>>>>>>>>>>>>>>   83,
>>>>>>>>>>>>>>>   94
>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>>> },
>>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>>> "trends": [1,2,3],
>>>>>>>>>>>>>>> "symbols": [4,5,6],
>>>>>>>>>>>>>>> "urls": [7,8,9],
>>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>>> "indices": []
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
>>>> as t
>>>>>>>>> where
>>>>>>>>>>>>>>> t.entities.urls is not null;
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> | [7,8,9]    |
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> 1 row selected (0.139 seconds)
>>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json`
>>>> as t
>>>>>>>>> where
>>>>>>>>>>>>>>> t.entities.urls is null;
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> |   EXPR$0   |
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> +------------+
>>>>>>>>>>>>>>> No rows selected (0.158 seconds)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Hao
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <
>>>> adityakish...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous
>>>>>>>>> primitive
>>>>>>>>>>>>>>>> types. In your example, it appears from the error, the array
>>>>>>> field
>>>>>>>>>>>>>> 'member'
>>>>>>>>>>>>>>>> contained maps for at least one record.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta <
>>>>>>>>> cma...@mapr.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from
>>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0;
>>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to
>>>>>>> materialize
>>>>>>>>>>>>>>>> incoming schema.  Errors:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Error in expression at index -1.  Error: Missing function
>>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)].  Full
>>>> expression:
>>>>>>>>>>>>>> --UNKNOWN
>>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on
>>>>>>>>>>>>>> bullseye-3:31010 ]
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while
>>>> executing
>>>>>>>>>>>> query.
>>>>>>>>>>>>>>>> (state=,code=0)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Chris Matta
>>>>>>>>>>>>>>>>> cma...@mapr.com
>>>>>>>>>>>>>>>>> 215-701-3146
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <
>>>>>>> adityakish...@gmail.com
>>>>>>>>>> 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht <
>>>>>>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in
>>> drill?
>>>>>>>>>>>>>>>>>>> i.e some records have "url":[]  and some will have an
>>> array
>>>>>>>>> with
>>>>>>>>>>>> data
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array
>>>> drill
>>>>>>>>> fails
>>>>>>>>>>>>>> due
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter
>>>> with/*
>>>>>>>>> where
>>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying
>>> url
>>>> is
>>>>>>>>> not
>>>>>>>>>>>>>>>> null.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter data
>>>> as
>>>>>>> an
>>>>>>>>>>>>>>>> example
>>>>>>>>>>>>>>>>>>> below. Some records have an empty array with
>>> “hashtags”:[]
>>>>>>> and
>>>>>>>>>>>>>> others
>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>> look similar to what is listed below.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> "entities": {
>>>>>>>>>>>>>>>>>>> "trends": [],
>>>>>>>>>>>>>>>>>>> "symbols": [],
>>>>>>>>>>>>>>>>>>> "urls": [],
>>>>>>>>>>>>>>>>>>> "hashtags": [
>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>> "text": "GoPatriots",
>>>>>>>>>>>>>>>>>>> "indices": [
>>>>>>>>>>>>>>>>>>>   83,
>>>>>>>>>>>>>>>>>>>   94
>>>>>>>>>>>>>>>>>>> ]
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>>>>> "user_mentions": []
>>>>>>>>>>>>>>>>>>> },
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>> —Andries
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> <z.json>
>

Re: Filter out empty arrays in JSON

Reply via email to