You can use repeated_count function to eliminate the null arrays. Sudhakar Thota Sent from my iPhone
> On Feb 5, 2015, at 7:05 PM, Andries Engelbrecht <aengelbre...@maprtech.com> > wrote: > > Chris > > Thanks, but I tried it before without success. > > I still get this on Drill 0.7. > > select t.entities.hashtags from dfs.twitter.`/nfl` t where > t.entities.hashtags[0] is not null limit 10; > > Error: exception while executing query: Failure while executing query. > (state=,code=0) > Query failed: Query failed: Failure while running fragment., Failure while > trying to materialize incoming schema. Errors: > > Error in expression at index 0. Error: Missing function implementation: > [isnotnull(MAP-REQUIRED)]. Full expression: null.. [ > 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > > > Mine seems finicky about even using the first element in the array > > select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10; > > Query failed: Query failed: Failure while running fragment., index: 4, > length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on > drilldemo:31010 ] > [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ] > > > However flatten seems to work. (as long as you don’t have thousands of small > json files) > > select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10; > > +------------+ > | EXPR$0 | > +------------+ > | {"text":"PantherPride","indices":["12","25"]} | > | {"text":"KeepPounding","indices":["48","61"]} | > | {"text":"vegas","indices":["107","113"]} | > | {"text":"nfl","indices":["23","27"]} | > | {"text":"Cowboys","indices":["7","15"]} | > | {"text":"Cowboys","indices":["7","15"]} | > | {"text":"Ebay","indices":["52","57"]} | > | {"text":"NFL","indices":["58","62"]} | > | {"text":"Christmas","indices":["63","73"]} | > | {"text":"Gift","indices":["74","79"]} | > +------------+ > > > —Andries > > > > >> On Feb 5, 2015, at 6:41 PM, Christopher Matta <cma...@mapr.com> wrote: >> >> Andreas, I think I may have solved your issue: >> >>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE >>> t.entities.hashtags[0].text is not null limit 10; >> +------------+ >> | EXPR$0 | >> +------------+ >> | >> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}] >> | >> | >> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}] >> | >> | >> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}] >> | >> | [{"text":"NotOneDime","indices":[14,25]}] | >> | [{"text":"BlackFriday","indices":[43,55]}] | >> | [{"text":"JRStudio","indices":[93,102]}] | >> | >> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}] >> | >> | [{"text":"BlackFriday","indices":[0,12]}] | >> | >> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] >> | >> | >> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] >> | >> +------------+ >> 10 rows selected (1.697 seconds) >> >> I noticed when only extracting the first text item in the hashtag array ( >> t.entities.hashtags[0].text) it was returning null for those tweets that >> didn’t have hashtags, so filtering out that seems to work. >> >> >> Chris Matta >> cma...@mapr.com >> 215-701-3146 >> >> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse <altekruseja...@gmail.com> >> wrote: >> >>> As Aditya commented before this will work if the lists only contain scalars >>> >>> repeated_count('entities.urls') > 0 >>> >>> If the lists contain maps unfortunately this is not available today. There >>> is an enhancement request open for this feature. I have marked it for a fix >>> in 0.9 as it is more of a feature request than a bug and we are working on >>> closing a large number of bugs for 0.8 before we get to issues like this. >>> >>> https://issues.apache.org/jira/browse/DRILL-1650 >>> >>> -Jason >>> >>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht < >>> aengelbre...@maprtech.com> wrote: >>> >>>> Unfortunately it seems that with larger data sets the use of flatten >>> seems >>>> to produce an error. >>>> >>>> Any other options to filter out JSON records (objects) where an array is >>>> empty? >>>> >>>> Thanks >>>> —Andries >>>> >>>> >>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht < >>>> aengelbre...@maprtech.com> wrote: >>>> >>>>> It was interesting to see flatten bypass the records with an empty >>> array. >>>>> >>>>> Unfortunately some arrays are much more complex than the example here, >>>> and it is still useful to have the ability to filter out ones that are >>>> empty. >>>>> >>>>> —Andries >>>>> >>>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>> >>>>>> I figured out the differences after getting the json file from >>> Andries. >>>>>> >>>>>> My json file is like: >>>>>> {“entities”:{xxx},“entities”:{yyy}... } >>>>>> >>>>>> Andries' json file is like: >>>>>> {“entities”:{xxx}} >>>>>> {“entities”:{yyy}} >>>>>> ... >>>>>> >>>>>> So basically Andries' json file is contains multiple json files. >>>>>> >>>>>> Fortunately, we can use the same SQL to get the same results: >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t; >>>>>> +------------+ >>>>>> | EXPR$0 | >>>>>> +------------+ >>>>>> | [] | >>>>>> | [{"text":"GoPatriots"}] | >>>>>> | [{"text":"aaa"}] | >>>>>> | [] | >>>>>> | [{"text":"bbb"}] | >>>>>> +------------+ >>>>>> 5 rows selected (0.136 seconds) >>>>>> 0: jdbc:drill:> with tmp as >>>>>> . . . . . . . > ( >>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from >>>>>> dfs.tmp.`z2.json` t >>>>>> . . . . . . . > ) >>>>>> . . . . . . . > select tmp.c.text from tmp; >>>>>> +------------+ >>>>>> | EXPR$0 | >>>>>> +------------+ >>>>>> | GoPatriots | >>>>>> | aaa | >>>>>> | bbb | >>>>>> +------------+ >>>>>> 3 rows selected (0.115 seconds) >>>>>> >>>>>> Thanks, >>>>>> Hao >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht < >>>>>> aengelbre...@maprtech.com> wrote: >>>>>> >>>>>>> In my case it returns the empty records when flatten is not used. >>>>>>> >>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as >>> hashtags >>>>>>> from `twitter.json` t limit 10; >>>>>>> +------------+ >>>>>>> | hashtags | >>>>>>> +------------+ >>>>>>> | [] | >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>> | [] | >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>> | [] | >>>>>>> | [] | >>>>>>> | [] | >>>>>>> | [] | >>>>>>> | [] | >>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | >>>>>>> +------------+ >>>>>>> >>>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>> >>>>>>>> Actually not due to flatten, if you directly query the file, it will >>>> only >>>>>>>> show the non-null values. >>>>>>>> >>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t; >>>>>>>> +------------+ >>>>>>>> | EXPR$0 | >>>>>>>> +------------+ >>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | >>>>>>>> +------------+ >>>>>>>> 1 row selected (0.123 seconds) >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Hao >>>>>>>> >>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < >>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>> >>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure >>> if >>>>>>> that >>>>>>>>> is an ideal result for all use cases, but certainly usable in this >>>> case. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> —Andries >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>>> >>>>>>>>>> I can also fetch non-null values for attached json file which >>>> contains >>>>>>> 6 >>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value. >>>>>>>>>> >>>>>>>>>> Could you share your complete "twitter.json"? >>>>>>>>>> >>>>>>>>>> 0: jdbc:drill:> with tmp as >>>>>>>>>> . . . . . . . > ( >>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from >>>>>>>>> dfs.tmp.`z.json` t >>>>>>>>>> . . . . . . . > ) >>>>>>>>>> . . . . . . . > select tmp.c.text from tmp; >>>>>>>>>> +------------+ >>>>>>>>>> | EXPR$0 | >>>>>>>>>> +------------+ >>>>>>>>>> | GoPatriots | >>>>>>>>>> | aaa | >>>>>>>>>> | bbb | >>>>>>>>>> +------------+ >>>>>>>>>> 3 rows selected (0.122 seconds) >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Hao >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < >>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>> The sample data I posted only has 1 element, but some records have >>>>>>>>> multiple elements in them. >>>>>>>>>> >>>>>>>>>> Interestingly enough though >>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit >>> 10; >>>>>>>>>> >>>>>>>>>> Produces >>>>>>>>>> +------------+ >>>>>>>>>> | EXPR$0 | >>>>>>>>>> +------------+ >>>>>>>>>> | null | >>>>>>>>>> | SportsNews | >>>>>>>>>> | null | >>>>>>>>>> | SportsNews | >>>>>>>>>> | null | >>>>>>>>>> | null | >>>>>>>>>> | null | >>>>>>>>>> | null | >>>>>>>>>> | null | >>>>>>>>>> | CARvsSEA | >>>>>>>>>> +——————+ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> And >>>>>>>>>> >>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10; >>>>>>>>>> >>>>>>>>>> +------------+ >>>>>>>>>> | EXPR$0 | >>>>>>>>>> +------------+ >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"indices":[]} | >>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} | >>>>>>>>>> +——————+ >>>>>>>>>> >>>>>>>>>> Strange part is that there is no indices map in the hashtags >>> array, >>>> so >>>>>>>>> no idea why it shows up when pointing to the first lament in an >>> empty >>>>>>> array. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> —Andries >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>>>> >>>>>>>>>>> I just noticed that the result of "hashtags" is just an array >>> with >>>>>>>>> only 1 >>>>>>>>>>> element. >>>>>>>>>>> So take your example: >>>>>>>>>>> [root@maprdemo tmp]# cat d.json >>>>>>>>>>> { >>>>>>>>>>> "entities": { >>>>>>>>>>> "trends": [], >>>>>>>>>>> "symbols": [], >>>>>>>>>>> "urls": [], >>>>>>>>>>> "hashtags": [], >>>>>>>>>>> "user_mentions": [] >>>>>>>>>>> }, >>>>>>>>>>> "entities": { >>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>> "hashtags": [ >>>>>>>>>>> { >>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>> "indices": [] >>>>>>>>>>> } >>>>>>>>>>> ], >>>>>>>>>>> "user_mentions": [] >>>>>>>>>>> } >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> Now we can do this to achieve the results: >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` >>> t >>>> ; >>>>>>>>>>> +------------+ >>>>>>>>>>> | EXPR$0 | >>>>>>>>>>> +------------+ >>>>>>>>>>> | [{"text":"GoPatriots"}] | >>>>>>>>>>> +------------+ >>>>>>>>>>> 1 row selected (0.09 seconds) >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from >>>>>>>>> dfs.tmp.`d.json` t ; >>>>>>>>>>> +------------+ >>>>>>>>>>> | EXPR$0 | >>>>>>>>>>> +------------+ >>>>>>>>>>> | GoPatriots | >>>>>>>>>>> +------------+ >>>>>>>>>>> 1 row selected (0.108 seconds) >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Hao >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < >>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> When I run the query on a larger dataset it actually show the >>>> empty >>>>>>>>>>>> records. >>>>>>>>>>>> >>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10; >>>>>>>>>>>> >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [] | >>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> 10 rows selected (2.899 seconds) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> However having the output as maps is not very useful, unless i >>> can >>>>>>>>> filter >>>>>>>>>>>> out the records with empty arrays and then drill deeper into the >>>> ones >>>>>>>>> with >>>>>>>>>>>> data in the arrays. >>>>>>>>>>>> >>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows, >>> one >>>>>>>>> with an >>>>>>>>>>>> empty array as above and the other with the array data. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> —Andries >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I am not sure if below is expected behavior. >>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row >>>> ignoring >>>>>>>>> the >>>>>>>>>>>>> null value. >>>>>>>>>>>>> However then if we try to get "hashtags.text", it fails...which >>>>>>>>> means it >>>>>>>>>>>> is >>>>>>>>>>>>> still trying to read the NULL value. >>>>>>>>>>>>> I am thinking it may confuse the SQL developers. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from >>> dfs.tmp.`d.json` >>>> t ; >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> | [{"text":"GoPatriots"}] | >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> 1 row selected (0.109 seconds) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from >>>>>>>>> dfs.tmp.`d.json` t ; >>>>>>>>>>>>> >>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot >>> be >>>>>>>>> cast to >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ >>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] >>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Error: exception while executing query: Failure while executing >>>>>>>>> query. >>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Hao >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < >>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Now try on hashtags with the following: >>>>>>>>>>>>>> >>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from >>>>>>>>> `/twitter.json` t >>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10; >>>>>>>>>>>>>> >>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot >>> be >>>>>>>>> cast to >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ >>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] >>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Error: exception while executing query: Failure while >>> executing >>>>>>>>> query. >>>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> { >>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>> "hashtags": [], >>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>> }, >>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>> { >>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>> "indices": [] >>>>>>>>>>>>>> } >>>>>>>>>>>>>> ], >>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>> } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> The issue seems to be that if some records have arrays with >>>> maps in >>>>>>>>> them >>>>>>>>>>>>>> and others are empty. >>>>>>>>>>>>>> >>>>>>>>>>>>>> —Andries >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com> >>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Seems it works for below json file: >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>> "indices": [ >>>>>>>>>>>>>>> 83, >>>>>>>>>>>>>>> 94 >>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>> "indices": [] >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` >>>> as t >>>>>>>>> where >>>>>>>>>>>>>>> t.entities.urls is not null; >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> | [7,8,9] | >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> 1 row selected (0.139 seconds) >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` >>>> as t >>>>>>>>> where >>>>>>>>>>>>>>> t.entities.urls is null; >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>> No rows selected (0.158 seconds) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Hao >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < >>>> adityakish...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous >>>>>>>>> primitive >>>>>>>>>>>>>>>> types. In your example, it appears from the error, the array >>>>>>> field >>>>>>>>>>>>>> 'member' >>>>>>>>>>>>>>>> contained maps for at least one record. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < >>>>>>>>> cma...@mapr.com> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from >>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; >>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to >>>>>>> materialize >>>>>>>>>>>>>>>> incoming schema. Errors: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Error in expression at index -1. Error: Missing function >>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full >>>> expression: >>>>>>>>>>>>>> --UNKNOWN >>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on >>>>>>>>>>>>>> bullseye-3:31010 ] >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while >>>> executing >>>>>>>>>>>> query. >>>>>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Chris Matta >>>>>>>>>>>>>>>>> cma...@mapr.com >>>>>>>>>>>>>>>>> 215-701-3146 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < >>>>>>> adityakish...@gmail.com >>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < >>>>>>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in >>> drill? >>>>>>>>>>>>>>>>>>> i.e some records have "url":[] and some will have an >>> array >>>>>>>>> with >>>>>>>>>>>> data >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array >>>> drill >>>>>>>>> fails >>>>>>>>>>>>>> due >>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter >>>> with/* >>>>>>>>> where >>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying >>> url >>>> is >>>>>>>>> not >>>>>>>>>>>>>>>> null. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter data >>>> as >>>>>>> an >>>>>>>>>>>>>>>> example >>>>>>>>>>>>>>>>>>> below. Some records have an empty array with >>> “hashtags”:[] >>>>>>> and >>>>>>>>>>>>>> others >>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>> look similar to what is listed below. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>>>>>> "indices": [ >>>>>>>>>>>>>>>>>>> 83, >>>>>>>>>>>>>>>>>>> 94 >>>>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>> —Andries >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> <z.json> >