Good catch, seem to work when pointing to a specific map in the array. Strange thing is that I have actually been using that to filter out retweets, by using using a specific map to retweeted_status.id is not null.
Good find though. Also flatten on hashtags works to filter out the empty arrays, but you may not always want to use flatten. When pointing to a specific map Drill seems to consistently produce a null if the map does not exist,which is very handy. --Andries > On Feb 5, 2015, at 7:36 PM, Christopher Matta <cma...@mapr.com> wrote: > > You missed something: > > select t.entities.hashtags from dfs.twitter./nflt where > t.entities.hashtags[0].*text* is not null limit 10; > > Add the .text to the end of the where. > > > Chris Matta > cma...@mapr.com > 215-701-3146 > > On Thu, Feb 5, 2015 at 10:05 PM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > >> Chris >> >> Thanks, but I tried it before without success. >> >> I still get this on Drill 0.7. >> >> select t.entities.hashtags from dfs.twitter.`/nfl` t where >> t.entities.hashtags[0] is not null limit 10; >> >> Error: exception while executing query: Failure while executing query. >> (state=,code=0) >> Query failed: Query failed: Failure while running fragment., Failure while >> trying to materialize incoming schema. Errors: >> >> Error in expression at index 0. Error: Missing function implementation: >> [isnotnull(MAP-REQUIRED)]. Full expression: null.. [ >> 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] >> [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] >> >> >> Mine seems finicky about even using the first element in the array >> >> select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10; >> >> Query failed: Query failed: Failure while running fragment., index: 4, >> length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on >> drilldemo:31010 ] >> [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ] >> >> >> However flatten seems to work. (as long as you don’t have thousands of >> small json files) >> >> select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10; >> >> +------------+ >> | EXPR$0 | >> +------------+ >> | {"text":"PantherPride","indices":["12","25"]} | >> | {"text":"KeepPounding","indices":["48","61"]} | >> | {"text":"vegas","indices":["107","113"]} | >> | {"text":"nfl","indices":["23","27"]} | >> | {"text":"Cowboys","indices":["7","15"]} | >> | {"text":"Cowboys","indices":["7","15"]} | >> | {"text":"Ebay","indices":["52","57"]} | >> | {"text":"NFL","indices":["58","62"]} | >> | {"text":"Christmas","indices":["63","73"]} | >> | {"text":"Gift","indices":["74","79"]} | >> +------------+ >> >> >> —Andries >> >> >> >> >>> On Feb 5, 2015, at 6:41 PM, Christopher Matta <cma...@mapr.com> wrote: >>> >>> Andreas, I think I may have solved your issue: >>> >>>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t WHERE >> t.entities.hashtags[0].text is not null limit 10; >>> +------------+ >>> | EXPR$0 | >>> +------------+ >>> | >> [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}] >>> | >>> | >> [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}] >>> | >>> | >> [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}] >>> | >>> | [{"text":"NotOneDime","indices":[14,25]}] | >>> | [{"text":"BlackFriday","indices":[43,55]}] | >>> | [{"text":"JRStudio","indices":[93,102]}] | >>> | >> [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}] >>> | >>> | [{"text":"BlackFriday","indices":[0,12]}] | >>> | >> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] >>> | >>> | >> [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] >>> | >>> +------------+ >>> 10 rows selected (1.697 seconds) >>> >>> I noticed when only extracting the first text item in the hashtag array ( >>> t.entities.hashtags[0].text) it was returning null for those tweets that >>> didn’t have hashtags, so filtering out that seems to work. >>> >>> >>> Chris Matta >>> cma...@mapr.com >>> 215-701-3146 >>> >>> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse < >> altekruseja...@gmail.com> >>> wrote: >>> >>>> As Aditya commented before this will work if the lists only contain >> scalars >>>> >>>> repeated_count('entities.urls') > 0 >>>> >>>> If the lists contain maps unfortunately this is not available today. >> There >>>> is an enhancement request open for this feature. I have marked it for a >> fix >>>> in 0.9 as it is more of a feature request than a bug and we are working >> on >>>> closing a large number of bugs for 0.8 before we get to issues like >> this. >>>> >>>> https://issues.apache.org/jira/browse/DRILL-1650 >>>> >>>> -Jason >>>> >>>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht < >>>> aengelbre...@maprtech.com> wrote: >>>> >>>>> Unfortunately it seems that with larger data sets the use of flatten >>>> seems >>>>> to produce an error. >>>>> >>>>> Any other options to filter out JSON records (objects) where an array >> is >>>>> empty? >>>>> >>>>> Thanks >>>>> —Andries >>>>> >>>>> >>>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht < >>>>> aengelbre...@maprtech.com> wrote: >>>>> >>>>>> It was interesting to see flatten bypass the records with an empty >>>> array. >>>>>> >>>>>> Unfortunately some arrays are much more complex than the example here, >>>>> and it is still useful to have the ability to filter out ones that are >>>>> empty. >>>>>> >>>>>> —Andries >>>>>> >>>>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>> >>>>>>> I figured out the differences after getting the json file from >>>> Andries. >>>>>>> >>>>>>> My json file is like: >>>>>>> {“entities”:{xxx},“entities”:{yyy}... } >>>>>>> >>>>>>> Andries' json file is like: >>>>>>> {“entities”:{xxx}} >>>>>>> {“entities”:{yyy}} >>>>>>> ... >>>>>>> >>>>>>> So basically Andries' json file is contains multiple json files. >>>>>>> >>>>>>> Fortunately, we can use the same SQL to get the same results: >>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t; >>>>>>> +------------+ >>>>>>> | EXPR$0 | >>>>>>> +------------+ >>>>>>> | [] | >>>>>>> | [{"text":"GoPatriots"}] | >>>>>>> | [{"text":"aaa"}] | >>>>>>> | [] | >>>>>>> | [{"text":"bbb"}] | >>>>>>> +------------+ >>>>>>> 5 rows selected (0.136 seconds) >>>>>>> 0: jdbc:drill:> with tmp as >>>>>>> . . . . . . . > ( >>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from >>>>>>> dfs.tmp.`z2.json` t >>>>>>> . . . . . . . > ) >>>>>>> . . . . . . . > select tmp.c.text from tmp; >>>>>>> +------------+ >>>>>>> | EXPR$0 | >>>>>>> +------------+ >>>>>>> | GoPatriots | >>>>>>> | aaa | >>>>>>> | bbb | >>>>>>> +------------+ >>>>>>> 3 rows selected (0.115 seconds) >>>>>>> >>>>>>> Thanks, >>>>>>> Hao >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht < >>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>> >>>>>>>> In my case it returns the empty records when flatten is not used. >>>>>>>> >>>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as >>>> hashtags >>>>>>>> from `twitter.json` t limit 10; >>>>>>>> +------------+ >>>>>>>> | hashtags | >>>>>>>> +------------+ >>>>>>>> | [] | >>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>> | [] | >>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>> | [] | >>>>>>>> | [] | >>>>>>>> | [] | >>>>>>>> | [] | >>>>>>>> | [] | >>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | >>>>>>>> +------------+ >>>>>>>> >>>>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>> >>>>>>>>> Actually not due to flatten, if you directly query the file, it >> will >>>>> only >>>>>>>>> show the non-null values. >>>>>>>>> >>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t; >>>>>>>>> +------------+ >>>>>>>>> | EXPR$0 | >>>>>>>>> +------------+ >>>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | >>>>>>>>> +------------+ >>>>>>>>> 1 row selected (0.123 seconds) >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Hao >>>>>>>>> >>>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < >>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>> >>>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure >>>> if >>>>>>>> that >>>>>>>>>> is an ideal result for all use cases, but certainly usable in this >>>>> case. >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> —Andries >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>>>> >>>>>>>>>>> I can also fetch non-null values for attached json file which >>>>> contains >>>>>>>> 6 >>>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value. >>>>>>>>>>> >>>>>>>>>>> Could you share your complete "twitter.json"? >>>>>>>>>>> >>>>>>>>>>> 0: jdbc:drill:> with tmp as >>>>>>>>>>> . . . . . . . > ( >>>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from >>>>>>>>>> dfs.tmp.`z.json` t >>>>>>>>>>> . . . . . . . > ) >>>>>>>>>>> . . . . . . . > select tmp.c.text from tmp; >>>>>>>>>>> +------------+ >>>>>>>>>>> | EXPR$0 | >>>>>>>>>>> +------------+ >>>>>>>>>>> | GoPatriots | >>>>>>>>>>> | aaa | >>>>>>>>>>> | bbb | >>>>>>>>>>> +------------+ >>>>>>>>>>> 3 rows selected (0.122 seconds) >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Hao >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < >>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>> The sample data I posted only has 1 element, but some records >> have >>>>>>>>>> multiple elements in them. >>>>>>>>>>> >>>>>>>>>>> Interestingly enough though >>>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit >>>> 10; >>>>>>>>>>> >>>>>>>>>>> Produces >>>>>>>>>>> +------------+ >>>>>>>>>>> | EXPR$0 | >>>>>>>>>>> +------------+ >>>>>>>>>>> | null | >>>>>>>>>>> | SportsNews | >>>>>>>>>>> | null | >>>>>>>>>>> | SportsNews | >>>>>>>>>>> | null | >>>>>>>>>>> | null | >>>>>>>>>>> | null | >>>>>>>>>>> | null | >>>>>>>>>>> | null | >>>>>>>>>>> | CARvsSEA | >>>>>>>>>>> +——————+ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And >>>>>>>>>>> >>>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10; >>>>>>>>>>> >>>>>>>>>>> +------------+ >>>>>>>>>>> | EXPR$0 | >>>>>>>>>>> +------------+ >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"indices":[]} | >>>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} | >>>>>>>>>>> +——————+ >>>>>>>>>>> >>>>>>>>>>> Strange part is that there is no indices map in the hashtags >>>> array, >>>>> so >>>>>>>>>> no idea why it shows up when pointing to the first lament in an >>>> empty >>>>>>>> array. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> —Andries >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> I just noticed that the result of "hashtags" is just an array >>>> with >>>>>>>>>> only 1 >>>>>>>>>>>> element. >>>>>>>>>>>> So take your example: >>>>>>>>>>>> [root@maprdemo tmp]# cat d.json >>>>>>>>>>>> { >>>>>>>>>>>> "entities": { >>>>>>>>>>>> "trends": [], >>>>>>>>>>>> "symbols": [], >>>>>>>>>>>> "urls": [], >>>>>>>>>>>> "hashtags": [], >>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>> }, >>>>>>>>>>>> "entities": { >>>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>> { >>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>> "indices": [] >>>>>>>>>>>> } >>>>>>>>>>>> ], >>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> Now we can do this to achieve the results: >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` >>>> t >>>>> ; >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | [{"text":"GoPatriots"}] | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> 1 row selected (0.09 seconds) >>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from >>>>>>>>>> dfs.tmp.`d.json` t ; >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> | GoPatriots | >>>>>>>>>>>> +------------+ >>>>>>>>>>>> 1 row selected (0.108 seconds) >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Hao >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < >>>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> When I run the query on a larger dataset it actually show the >>>>> empty >>>>>>>>>>>>> records. >>>>>>>>>>>>> >>>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10; >>>>>>>>>>>>> >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [] | >>>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | >>>>>>>>>>>>> +------------+ >>>>>>>>>>>>> 10 rows selected (2.899 seconds) >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> However having the output as maps is not very useful, unless i >>>> can >>>>>>>>>> filter >>>>>>>>>>>>> out the records with empty arrays and then drill deeper into >> the >>>>> ones >>>>>>>>>> with >>>>>>>>>>>>> data in the arrays. >>>>>>>>>>>>> >>>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows, >>>> one >>>>>>>>>> with an >>>>>>>>>>>>> empty array as above and the other with the array data. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> —Andries >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> >> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I am not sure if below is expected behavior. >>>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row >>>>> ignoring >>>>>>>>>> the >>>>>>>>>>>>>> null value. >>>>>>>>>>>>>> However then if we try to get "hashtags.text", it >> fails...which >>>>>>>>>> means it >>>>>>>>>>>>> is >>>>>>>>>>>>>> still trying to read the NULL value. >>>>>>>>>>>>>> I am thinking it may confuse the SQL developers. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from >>>> dfs.tmp.`d.json` >>>>> t ; >>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>> | [{"text":"GoPatriots"}] | >>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>> 1 row selected (0.109 seconds) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from >>>>>>>>>> dfs.tmp.`d.json` t ; >>>>>>>>>>>>>> >>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot >>>> be >>>>>>>>>> cast to >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ >>>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] >>>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Error: exception while executing query: Failure while >> executing >>>>>>>>>> query. >>>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Hao >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < >>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now try on hashtags with the following: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from >>>>>>>>>> `/twitter.json` t >>>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., >>>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot >>>> be >>>>>>>>>> cast to >>>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ >>>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] >>>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Error: exception while executing query: Failure while >>>> executing >>>>>>>>>> query. >>>>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>>> "hashtags": [], >>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>> "indices": [] >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The issue seems to be that if some records have arrays with >>>>> maps in >>>>>>>>>> them >>>>>>>>>>>>>>> and others are empty. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> —Andries >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com> >>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Seems it works for below json file: >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>>> "indices": [ >>>>>>>>>>>>>>>> 83, >>>>>>>>>>>>>>>> 94 >>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>>> "trends": [1,2,3], >>>>>>>>>>>>>>>> "symbols": [4,5,6], >>>>>>>>>>>>>>>> "urls": [7,8,9], >>>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>>> "indices": [] >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` >>>>> as t >>>>>>>>>> where >>>>>>>>>>>>>>>> t.entities.urls is not null; >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> | [7,8,9] | >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> 1 row selected (0.139 seconds) >>>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` >>>>> as t >>>>>>>>>> where >>>>>>>>>>>>>>>> t.entities.urls is null; >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> | EXPR$0 | >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> +------------+ >>>>>>>>>>>>>>>> No rows selected (0.158 seconds) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Hao >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < >>>>> adityakish...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I believe that this works if the array contains homogeneous >>>>>>>>>> primitive >>>>>>>>>>>>>>>>> types. In your example, it appears from the error, the >> array >>>>>>>> field >>>>>>>>>>>>>>> 'member' >>>>>>>>>>>>>>>>> contained maps for at least one record. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < >>>>>>>>>> cma...@mapr.com> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` >> from >>>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; >>>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to >>>>>>>> materialize >>>>>>>>>>>>>>>>> incoming schema. Errors: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Error in expression at index -1. Error: Missing function >>>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full >>>>> expression: >>>>>>>>>>>>>>> --UNKNOWN >>>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on >>>>>>>>>>>>>>> bullseye-3:31010 ] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while >>>>> executing >>>>>>>>>>>>> query. >>>>>>>>>>>>>>>>> (state=,code=0) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Chris Matta >>>>>>>>>>>>>>>>>> cma...@mapr.com >>>>>>>>>>>>>>>>>> 215-701-3146 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < >>>>>>>> adityakish...@gmail.com >>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < >>>>>>>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in >>>> drill? >>>>>>>>>>>>>>>>>>>> i.e some records have "url":[] and some will have an >>>> array >>>>>>>>>> with >>>>>>>>>>>>> data >>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array >>>>> drill >>>>>>>>>> fails >>>>>>>>>>>>>>> due >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter >>>>> with/* >>>>>>>>>> where >>>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying >>>> url >>>>> is >>>>>>>>>> not >>>>>>>>>>>>>>>>> null. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter >> data >>>>> as >>>>>>>> an >>>>>>>>>>>>>>>>> example >>>>>>>>>>>>>>>>>>>> below. Some records have an empty array with >>>> “hashtags”:[] >>>>>>>> and >>>>>>>>>>>>>>> others >>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>> look similar to what is listed below. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> "entities": { >>>>>>>>>>>>>>>>>>>> "trends": [], >>>>>>>>>>>>>>>>>>>> "symbols": [], >>>>>>>>>>>>>>>>>>>> "urls": [], >>>>>>>>>>>>>>>>>>>> "hashtags": [ >>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>> "text": "GoPatriots", >>>>>>>>>>>>>>>>>>>> "indices": [ >>>>>>>>>>>>>>>>>>>> 83, >>>>>>>>>>>>>>>>>>>> 94 >>>>>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>>>> "user_mentions": [] >>>>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>> —Andries >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> <z.json> >> >>