You can’t use repeated_count function if the list contains maps, only scalars. This was pointed out by Jason (and Aditya) earlier in this thread:
As Aditya commented before this will work if the lists only contain scalars repeated_count(‘entities.urls’) > 0 If the lists contain maps unfortunately this is not available today. There is an enhancement request open for this feature. I have marked it for a fix in 0.9 as it is more of a feature request than a bug and we are working on closing a large number of bugs for 0.8 before we get to issues like this. https://issues.apache.org/jira/browse/DRILL-1650 Chris Matta cma...@mapr.com 215-701-3146 On Thu, Feb 5, 2015 at 10:36 PM, Sudhakar Thota <sth...@maprtech.com> wrote: > You can use repeated_count function to eliminate the null arrays. > > Sudhakar Thota > Sent from my iPhone > > > On Feb 5, 2015, at 7:05 PM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > > > > Chris > > > > Thanks, but I tried it before without success. > > > > I still get this on Drill 0.7. > > > > select t.entities.hashtags from dfs.twitter.`/nfl` t where > t.entities.hashtags[0] is not null limit 10; > > > > Error: exception while executing query: Failure while executing query. > (state=,code=0) > > Query failed: Query failed: Failure while running fragment., Failure > while trying to materialize incoming schema. Errors: > > > > Error in expression at index 0. Error: Missing function implementation: > [isnotnull(MAP-REQUIRED)]. Full expression: null.. [ > 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > > [ 0dab71b6-1860-4739-b752-1458ed29b671 on drilldemo:31010 ] > > > > > > Mine seems finicky about even using the first element in the array > > > > select t.entities.hashtags[0] from dfs.twitter.`/nfl` t limit 10; > > > > Query failed: Query failed: Failure while running fragment., index: 4, > length: 4 (expected: range(0, 4)) [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on > drilldemo:31010 ] > > [ 62cd2d3a-546e-41c8-a0c5-4da2b81c2ef8 on drilldemo:31010 ] > > > > > > However flatten seems to work. (as long as you don’t have thousands of > small json files) > > > > select flatten(t.entities.hashtags) from dfs.twitter.`/nfl` t limit 10; > > > > +------------+ > > | EXPR$0 | > > +------------+ > > | {"text":"PantherPride","indices":["12","25"]} | > > | {"text":"KeepPounding","indices":["48","61"]} | > > | {"text":"vegas","indices":["107","113"]} | > > | {"text":"nfl","indices":["23","27"]} | > > | {"text":"Cowboys","indices":["7","15"]} | > > | {"text":"Cowboys","indices":["7","15"]} | > > | {"text":"Ebay","indices":["52","57"]} | > > | {"text":"NFL","indices":["58","62"]} | > > | {"text":"Christmas","indices":["63","73"]} | > > | {"text":"Gift","indices":["74","79"]} | > > +------------+ > > > > > > —Andries > > > > > > > > > >> On Feb 5, 2015, at 6:41 PM, Christopher Matta <cma...@mapr.com> wrote: > >> > >> Andreas, I think I may have solved your issue: > >> > >>> select t.entities.hashtags FROM mfs.cmatta.`tweets/blackfriday` t > WHERE t.entities.hashtags[0].text is not null limit 10; > >> +------------+ > >> | EXPR$0 | > >> +------------+ > >> | > [{"text":"Blackfriday","indices":[11,23]},{"text":"facebook","indices":[56,65]},{"text":"christmas","indices":[66,76]}] > >> | > >> | > [{"text":"BlackFriday","indices":[65,77]},{"text":"ViernesNegro","indices":[105,118]},{"text":"ofertas","indices":[122,130]}] > >> | > >> | > [{"text":"StarWars","indices":[21,30]},{"text":"BlackFriday","indices":[71,83]}] > >> | > >> | [{"text":"NotOneDime","indices":[14,25]}] | > >> | [{"text":"BlackFriday","indices":[43,55]}] | > >> | [{"text":"JRStudio","indices":[93,102]}] | > >> | > [{"text":"luv","indices":[38,42]},{"text":"luvmanicures","indices":[43,56]},{"text":"luvpedicures","indices":[57,70]},{"text":"luvroyaloak","indices":[71,83]},{"text":"woodwardave","indices":[84,96]}] > >> | > >> | [{"text":"BlackFriday","indices":[0,12]}] | > >> | > [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] > >> | > >> | > [{"text":"BlackFriday","indices":[18,30]},{"text":"SOE","indices":[65,69]},{"text":"DoubleSC","indices":[82,91]}] > >> | > >> +------------+ > >> 10 rows selected (1.697 seconds) > >> > >> I noticed when only extracting the first text item in the hashtag array > ( > >> t.entities.hashtags[0].text) it was returning null for those tweets that > >> didn’t have hashtags, so filtering out that seems to work. > >> > >> > >> Chris Matta > >> cma...@mapr.com > >> 215-701-3146 > >> > >> On Mon, Jan 26, 2015 at 6:43 PM, Jason Altekruse < > altekruseja...@gmail.com> > >> wrote: > >> > >>> As Aditya commented before this will work if the lists only contain > scalars > >>> > >>> repeated_count('entities.urls') > 0 > >>> > >>> If the lists contain maps unfortunately this is not available today. > There > >>> is an enhancement request open for this feature. I have marked it for > a fix > >>> in 0.9 as it is more of a feature request than a bug and we are > working on > >>> closing a large number of bugs for 0.8 before we get to issues like > this. > >>> > >>> https://issues.apache.org/jira/browse/DRILL-1650 > >>> > >>> -Jason > >>> > >>> On Mon, Jan 26, 2015 at 3:26 PM, Andries Engelbrecht < > >>> aengelbre...@maprtech.com> wrote: > >>> > >>>> Unfortunately it seems that with larger data sets the use of flatten > >>> seems > >>>> to produce an error. > >>>> > >>>> Any other options to filter out JSON records (objects) where an array > is > >>>> empty? > >>>> > >>>> Thanks > >>>> —Andries > >>>> > >>>> > >>>> On Jan 21, 2015, at 5:18 PM, Andries Engelbrecht < > >>>> aengelbre...@maprtech.com> wrote: > >>>> > >>>>> It was interesting to see flatten bypass the records with an empty > >>> array. > >>>>> > >>>>> Unfortunately some arrays are much more complex than the example > here, > >>>> and it is still useful to have the ability to filter out ones that are > >>>> empty. > >>>>> > >>>>> —Andries > >>>>> > >>>>>> On Jan 21, 2015, at 5:04 PM, Hao Zhu <h...@maprtech.com> wrote: > >>>>>> > >>>>>> I figured out the differences after getting the json file from > >>> Andries. > >>>>>> > >>>>>> My json file is like: > >>>>>> {“entities”:{xxx},“entities”:{yyy}... } > >>>>>> > >>>>>> Andries' json file is like: > >>>>>> {“entities”:{xxx}} > >>>>>> {“entities”:{yyy}} > >>>>>> ... > >>>>>> > >>>>>> So basically Andries' json file is contains multiple json files. > >>>>>> > >>>>>> Fortunately, we can use the same SQL to get the same results: > >>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z2.json` t; > >>>>>> +------------+ > >>>>>> | EXPR$0 | > >>>>>> +------------+ > >>>>>> | [] | > >>>>>> | [{"text":"GoPatriots"}] | > >>>>>> | [{"text":"aaa"}] | > >>>>>> | [] | > >>>>>> | [{"text":"bbb"}] | > >>>>>> +------------+ > >>>>>> 5 rows selected (0.136 seconds) > >>>>>> 0: jdbc:drill:> with tmp as > >>>>>> . . . . . . . > ( > >>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from > >>>>>> dfs.tmp.`z2.json` t > >>>>>> . . . . . . . > ) > >>>>>> . . . . . . . > select tmp.c.text from tmp; > >>>>>> +------------+ > >>>>>> | EXPR$0 | > >>>>>> +------------+ > >>>>>> | GoPatriots | > >>>>>> | aaa | > >>>>>> | bbb | > >>>>>> +------------+ > >>>>>> 3 rows selected (0.115 seconds) > >>>>>> > >>>>>> Thanks, > >>>>>> Hao > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Wed, Jan 21, 2015 at 4:34 PM, Andries Engelbrecht < > >>>>>> aengelbre...@maprtech.com> wrote: > >>>>>> > >>>>>>> In my case it returns the empty records when flatten is not used. > >>>>>>> > >>>>>>> 0: jdbc:drill:zk=drilldemo:5181> select t.entities.hashtags as > >>> hashtags > >>>>>>> from `twitter.json` t limit 10; > >>>>>>> +------------+ > >>>>>>> | hashtags | > >>>>>>> +------------+ > >>>>>>> | [] | > >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>> | [] | > >>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>> | [] | > >>>>>>> | [] | > >>>>>>> | [] | > >>>>>>> | [] | > >>>>>>> | [] | > >>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | > >>>>>>> +------------+ > >>>>>>> > >>>>>>>> On Jan 21, 2015, at 4:26 PM, Hao Zhu <h...@maprtech.com> wrote: > >>>>>>>> > >>>>>>>> Actually not due to flatten, if you directly query the file, it > will > >>>> only > >>>>>>>> show the non-null values. > >>>>>>>> > >>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` > t; > >>>>>>>> +------------+ > >>>>>>>> | EXPR$0 | > >>>>>>>> +------------+ > >>>>>>>> | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | > >>>>>>>> +------------+ > >>>>>>>> 1 row selected (0.123 seconds) > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Hao > >>>>>>>> > >>>>>>>> On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < > >>>>>>>> aengelbre...@maprtech.com> wrote: > >>>>>>>> > >>>>>>>>> Very interesting, flatten seems to bypass empty records. Not sure > >>> if > >>>>>>> that > >>>>>>>>> is an ideal result for all use cases, but certainly usable in > this > >>>> case. > >>>>>>>>> > >>>>>>>>> Thanks > >>>>>>>>> —Andries > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote: > >>>>>>>>>> > >>>>>>>>>> I can also fetch non-null values for attached json file which > >>>> contains > >>>>>>> 6 > >>>>>>>>> "entities", 3 of them are null, 3 of them have "text" value. > >>>>>>>>>> > >>>>>>>>>> Could you share your complete "twitter.json"? > >>>>>>>>>> > >>>>>>>>>> 0: jdbc:drill:> with tmp as > >>>>>>>>>> . . . . . . . > ( > >>>>>>>>>> . . . . . . . > select flatten(t.entities.hashtags) as c from > >>>>>>>>> dfs.tmp.`z.json` t > >>>>>>>>>> . . . . . . . > ) > >>>>>>>>>> . . . . . . . > select tmp.c.text from tmp; > >>>>>>>>>> +------------+ > >>>>>>>>>> | EXPR$0 | > >>>>>>>>>> +------------+ > >>>>>>>>>> | GoPatriots | > >>>>>>>>>> | aaa | > >>>>>>>>>> | bbb | > >>>>>>>>>> +------------+ > >>>>>>>>>> 3 rows selected (0.122 seconds) > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Hao > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < > >>>>>>>>>> aengelbre...@maprtech.com> wrote: > >>>>>>>>>> The sample data I posted only has 1 element, but some records > have > >>>>>>>>> multiple elements in them. > >>>>>>>>>> > >>>>>>>>>> Interestingly enough though > >>>>>>>>>> select t.entities.hashtags[0].`text` from `twitter.json` t limit > >>> 10; > >>>>>>>>>> > >>>>>>>>>> Produces > >>>>>>>>>> +------------+ > >>>>>>>>>> | EXPR$0 | > >>>>>>>>>> +------------+ > >>>>>>>>>> | null | > >>>>>>>>>> | SportsNews | > >>>>>>>>>> | null | > >>>>>>>>>> | SportsNews | > >>>>>>>>>> | null | > >>>>>>>>>> | null | > >>>>>>>>>> | null | > >>>>>>>>>> | null | > >>>>>>>>>> | null | > >>>>>>>>>> | CARvsSEA | > >>>>>>>>>> +——————+ > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> And > >>>>>>>>>> > >>>>>>>>>> select t.entities.hashtags[0] from `twitter.json` t limit 10; > >>>>>>>>>> > >>>>>>>>>> +------------+ > >>>>>>>>>> | EXPR$0 | > >>>>>>>>>> +------------+ > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"text":"SportsNews","indices":[90,99]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"indices":[]} | > >>>>>>>>>> | {"text":"CARvsSEA","indices":[90,99]} | > >>>>>>>>>> +——————+ > >>>>>>>>>> > >>>>>>>>>> Strange part is that there is no indices map in the hashtags > >>> array, > >>>> so > >>>>>>>>> no idea why it shows up when pointing to the first lament in an > >>> empty > >>>>>>> array. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> —Andries > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> > wrote: > >>>>>>>>>>> > >>>>>>>>>>> I just noticed that the result of "hashtags" is just an array > >>> with > >>>>>>>>> only 1 > >>>>>>>>>>> element. > >>>>>>>>>>> So take your example: > >>>>>>>>>>> [root@maprdemo tmp]# cat d.json > >>>>>>>>>>> { > >>>>>>>>>>> "entities": { > >>>>>>>>>>> "trends": [], > >>>>>>>>>>> "symbols": [], > >>>>>>>>>>> "urls": [], > >>>>>>>>>>> "hashtags": [], > >>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>> }, > >>>>>>>>>>> "entities": { > >>>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>>> "hashtags": [ > >>>>>>>>>>> { > >>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>> "indices": [] > >>>>>>>>>>> } > >>>>>>>>>>> ], > >>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>> } > >>>>>>>>>>> } > >>>>>>>>>>> > >>>>>>>>>>> Now we can do this to achieve the results: > >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from > dfs.tmp.`d.json` > >>> t > >>>> ; > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | [{"text":"GoPatriots"}] | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> 1 row selected (0.09 seconds) > >>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags[0].text from > >>>>>>>>> dfs.tmp.`d.json` t ; > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> | GoPatriots | > >>>>>>>>>>> +------------+ > >>>>>>>>>>> 1 row selected (0.108 seconds) > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Hao > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < > >>>>>>>>>>> aengelbre...@maprtech.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> When I run the query on a larger dataset it actually show the > >>>> empty > >>>>>>>>>>>> records. > >>>>>>>>>>>> > >>>>>>>>>>>> select t.entities.hashtags from `twitter.json` t limit 10; > >>>>>>>>>>>> > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [{"text":"SportsNews","indices":[0,11]}] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [] | > >>>>>>>>>>>> | [{"text":"CARvsSEA","indices":[36,45]}] | > >>>>>>>>>>>> +------------+ > >>>>>>>>>>>> 10 rows selected (2.899 seconds) > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> However having the output as maps is not very useful, unless i > >>> can > >>>>>>>>> filter > >>>>>>>>>>>> out the records with empty arrays and then drill deeper into > the > >>>> ones > >>>>>>>>> with > >>>>>>>>>>>> data in the arrays. > >>>>>>>>>>>> > >>>>>>>>>>>> BTW: Hao I would have expiated your query to return both rows, > >>> one > >>>>>>>>> with an > >>>>>>>>>>>> empty array as above and the other with the array data. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> —Andries > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> > wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> I am not sure if below is expected behavior. > >>>>>>>>>>>>> If we only select "hashtags", and it will return only 1 row > >>>> ignoring > >>>>>>>>> the > >>>>>>>>>>>>> null value. > >>>>>>>>>>>>> However then if we try to get "hashtags.text", it > fails...which > >>>>>>>>> means it > >>>>>>>>>>>> is > >>>>>>>>>>>>> still trying to read the NULL value. > >>>>>>>>>>>>> I am thinking it may confuse the SQL developers. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags from > >>> dfs.tmp.`d.json` > >>>> t ; > >>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>> | [{"text":"GoPatriots"}] | > >>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>> 1 row selected (0.109 seconds) > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.hashtags.text from > >>>>>>>>> dfs.tmp.`d.json` t ; > >>>>>>>>>>>>> > >>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., > >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot > >>> be > >>>>>>>>> cast to > >>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ > >>>>>>>>>>>>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>>>>>>>>>>>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Error: exception while executing query: Failure while > executing > >>>>>>>>> query. > >>>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>> Hao > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < > >>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Now try on hashtags with the following: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> drilldemo:5181> select t.entities.hashtags.`text` from > >>>>>>>>> `/twitter.json` t > >>>>>>>>>>>>>> where t.entities.hashtags is not null limit 10; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Query failed: Query failed: Failure while running fragment., > >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.RepeatedMapVector > cannot > >>> be > >>>>>>>>> cast to > >>>>>>>>>>>>>> org.apache.drill.exec.vector.complex.MapVector [ > >>>>>>>>>>>>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>>>>>>>>>>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Error: exception while executing query: Failure while > >>> executing > >>>>>>>>> query. > >>>>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>>> "hashtags": [], > >>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>> }, > >>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>> { > >>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>> "indices": [] > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> ], > >>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The issue seems to be that if some records have arrays with > >>>> maps in > >>>>>>>>> them > >>>>>>>>>>>>>> and others are empty. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> —Andries > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com> > >>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Seems it works for below json file: > >>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>>> "indices": [ > >>>>>>>>>>>>>>> 83, > >>>>>>>>>>>>>>> 94 > >>>>>>>>>>>>>>> ] > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> ], > >>>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>>> }, > >>>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>>> "trends": [1,2,3], > >>>>>>>>>>>>>>> "symbols": [4,5,6], > >>>>>>>>>>>>>>> "urls": [7,8,9], > >>>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>>> "indices": [] > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> ], > >>>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from > dfs.tmp.`a.json` > >>>> as t > >>>>>>>>> where > >>>>>>>>>>>>>>> t.entities.urls is not null; > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> | [7,8,9] | > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> 1 row selected (0.139 seconds) > >>>>>>>>>>>>>>> 0: jdbc:drill:> select t.entities.urls from > dfs.tmp.`a.json` > >>>> as t > >>>>>>>>> where > >>>>>>>>>>>>>>> t.entities.urls is null; > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> | EXPR$0 | > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> +------------+ > >>>>>>>>>>>>>>> No rows selected (0.158 seconds) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>> Hao > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya < > >>>> adityakish...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I believe that this works if the array contains > homogeneous > >>>>>>>>> primitive > >>>>>>>>>>>>>>>> types. In your example, it appears from the error, the > array > >>>>>>> field > >>>>>>>>>>>>>> 'member' > >>>>>>>>>>>>>>>> contained maps for at least one record. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < > >>>>>>>>> cma...@mapr.com> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Trying that locally did not work for me (drill 0.7.0): > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` > from > >>>>>>>>>>>>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > >>>>>>>>>>>>>>>>> Query failed: Query stopped., Failure while trying to > >>>>>>> materialize > >>>>>>>>>>>>>>>> incoming schema. Errors: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Error in expression at index -1. Error: Missing function > >>>>>>>>>>>>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full > >>>> expression: > >>>>>>>>>>>>>> --UNKNOWN > >>>>>>>>>>>>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > >>>>>>>>>>>>>> bullseye-3:31010 ] > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Error: exception while executing query: Failure while > >>>> executing > >>>>>>>>>>>> query. > >>>>>>>>>>>>>>>> (state=,code=0) > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Chris Matta > >>>>>>>>>>>>>>>>> cma...@mapr.com > >>>>>>>>>>>>>>>>> 215-701-3146 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya < > >>>>>>> adityakish...@gmail.com > >>>>>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> repeated_count('entities.urls') > 0 > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > >>>>>>>>>>>>>>>>>> aengelbre...@maprtech.com> wrote: > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> How do you filter out records with an empty array in > >>> drill? > >>>>>>>>>>>>>>>>>>> i.e some records have "url":[] and some will have an > >>> array > >>>>>>>>> with > >>>>>>>>>>>> data > >>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>> it. When trying to read records with data in the array > >>>> drill > >>>>>>>>> fails > >>>>>>>>>>>>>> due > >>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>> records missing any data in the array. Trying a filter > >>>> with/* > >>>>>>>>> where > >>>>>>>>>>>>>>>>>>> "url":[0] is not null */ fails, also fails if applying > >>> url > >>>> is > >>>>>>>>> not > >>>>>>>>>>>>>>>> null. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Note some of the arrays contains maps, using twitter > data > >>>> as > >>>>>>> an > >>>>>>>>>>>>>>>> example > >>>>>>>>>>>>>>>>>>> below. Some records have an empty array with > >>> “hashtags”:[] > >>>>>>> and > >>>>>>>>>>>>>> others > >>>>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>>>> look similar to what is listed below. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> "entities": { > >>>>>>>>>>>>>>>>>>> "trends": [], > >>>>>>>>>>>>>>>>>>> "symbols": [], > >>>>>>>>>>>>>>>>>>> "urls": [], > >>>>>>>>>>>>>>>>>>> "hashtags": [ > >>>>>>>>>>>>>>>>>>> { > >>>>>>>>>>>>>>>>>>> "text": "GoPatriots", > >>>>>>>>>>>>>>>>>>> "indices": [ > >>>>>>>>>>>>>>>>>>> 83, > >>>>>>>>>>>>>>>>>>> 94 > >>>>>>>>>>>>>>>>>>> ] > >>>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>>> ], > >>>>>>>>>>>>>>>>>>> "user_mentions": [] > >>>>>>>>>>>>>>>>>>> }, > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Thanks > >>>>>>>>>>>>>>>>>>> —Andries > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> <z.json> > > >