Actually not due to flatten, if you directly query the file, it will only show the non-null values.
0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`z.json` t; +------------+ | EXPR$0 | +------------+ | [{"text":"GoPatriots"},{"text":"aaa"},{"text":"bbb"}] | +------------+ 1 row selected (0.123 seconds) Thanks, Hao On Wed, Jan 21, 2015 at 4:18 PM, Andries Engelbrecht < aengelbre...@maprtech.com> wrote: > Very interesting, flatten seems to bypass empty records. Not sure if that > is an ideal result for all use cases, but certainly usable in this case. > > Thanks > —Andries > > > On Jan 21, 2015, at 3:45 PM, Hao Zhu <h...@maprtech.com> wrote: > > > I can also fetch non-null values for attached json file which contains 6 > "entities", 3 of them are null, 3 of them have "text" value. > > > > Could you share your complete "twitter.json"? > > > > 0: jdbc:drill:> with tmp as > > . . . . . . . > ( > > . . . . . . . > select flatten(t.entities.hashtags) as c from > dfs.tmp.`z.json` t > > . . . . . . . > ) > > . . . . . . . > select tmp.c.text from tmp; > > +------------+ > > | EXPR$0 | > > +------------+ > > | GoPatriots | > > | aaa | > > | bbb | > > +------------+ > > 3 rows selected (0.122 seconds) > > > > Thanks, > > Hao > > > > > > > > On Wed, Jan 21, 2015 at 3:35 PM, Andries Engelbrecht < > aengelbre...@maprtech.com> wrote: > > The sample data I posted only has 1 element, but some records have > multiple elements in them. > > > > Interestingly enough though > > select t.entities.hashtags[0].`text` from `twitter.json` t limit 10; > > > > Produces > > +------------+ > > | EXPR$0 | > > +------------+ > > | null | > > | SportsNews | > > | null | > > | SportsNews | > > | null | > > | null | > > | null | > > | null | > > | null | > > | CARvsSEA | > > +——————+ > > > > > > And > > > > select t.entities.hashtags[0] from `twitter.json` t limit 10; > > > > +------------+ > > | EXPR$0 | > > +------------+ > > | {"indices":[]} | > > | {"text":"SportsNews","indices":[90,99]} | > > | {"indices":[]} | > > | {"text":"SportsNews","indices":[90,99]} | > > | {"indices":[]} | > > | {"indices":[]} | > > | {"indices":[]} | > > | {"indices":[]} | > > | {"indices":[]} | > > | {"text":"CARvsSEA","indices":[90,99]} | > > +——————+ > > > > Strange part is that there is no indices map in the hashtags array, so > no idea why it shows up when pointing to the first lament in an empty array. > > > > > > —Andries > > > > > > > > > > > > > > On Jan 21, 2015, at 3:24 PM, Hao Zhu <h...@maprtech.com> wrote: > > > > > I just noticed that the result of "hashtags" is just an array with > only 1 > > > element. > > > So take your example: > > > [root@maprdemo tmp]# cat d.json > > > { > > > "entities": { > > > "trends": [], > > > "symbols": [], > > > "urls": [], > > > "hashtags": [], > > > "user_mentions": [] > > > }, > > > "entities": { > > > "trends": [1,2,3], > > > "symbols": [4,5,6], > > > "urls": [7,8,9], > > > "hashtags": [ > > > { > > > "text": "GoPatriots", > > > "indices": [] > > > } > > > ], > > > "user_mentions": [] > > > } > > > } > > > > > > Now we can do this to achieve the results: > > > 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ; > > > +------------+ > > > | EXPR$0 | > > > +------------+ > > > | [{"text":"GoPatriots"}] | > > > +------------+ > > > 1 row selected (0.09 seconds) > > > 0: jdbc:drill:> select t.entities.hashtags[0].text from > dfs.tmp.`d.json` t ; > > > +------------+ > > > | EXPR$0 | > > > +------------+ > > > | GoPatriots | > > > +------------+ > > > 1 row selected (0.108 seconds) > > > > > > Thanks, > > > Hao > > > > > > > > > > > > On Wed, Jan 21, 2015 at 3:01 PM, Andries Engelbrecht < > > > aengelbre...@maprtech.com> wrote: > > > > > >> When I run the query on a larger dataset it actually show the empty > > >> records. > > >> > > >> select t.entities.hashtags from `twitter.json` t limit 10; > > >> > > >> +------------+ > > >> | EXPR$0 | > > >> +------------+ > > >> | [] | > > >> | [{"text":"SportsNews","indices":[0,11]}] | > > >> | [] | > > >> | [{"text":"SportsNews","indices":[0,11]}] | > > >> | [] | > > >> | [] | > > >> | [] | > > >> | [] | > > >> | [] | > > >> | [{"text":"CARvsSEA","indices":[36,45]}] | > > >> +------------+ > > >> 10 rows selected (2.899 seconds) > > >> > > >> > > >> > > >> However having the output as maps is not very useful, unless i can > filter > > >> out the records with empty arrays and then drill deeper into the ones > with > > >> data in the arrays. > > >> > > >> BTW: Hao I would have expiated your query to return both rows, one > with an > > >> empty array as above and the other with the array data. > > >> > > >> > > >> —Andries > > >> > > >> > > >> On Jan 21, 2015, at 2:56 PM, Hao Zhu <h...@maprtech.com> wrote: > > >> > > >>> I am not sure if below is expected behavior. > > >>> If we only select "hashtags", and it will return only 1 row ignoring > the > > >>> null value. > > >>> However then if we try to get "hashtags.text", it fails...which > means it > > >> is > > >>> still trying to read the NULL value. > > >>> I am thinking it may confuse the SQL developers. > > >>> > > >>> > > >>> 0: jdbc:drill:> select t.entities.hashtags from dfs.tmp.`d.json` t ; > > >>> +------------+ > > >>> | EXPR$0 | > > >>> +------------+ > > >>> | [{"text":"GoPatriots"}] | > > >>> +------------+ > > >>> 1 row selected (0.109 seconds) > > >>> > > >>> > > >>> 0: jdbc:drill:> select t.entities.hashtags.text from > dfs.tmp.`d.json` t ; > > >>> > > >>> Query failed: Query failed: Failure while running fragment., > > >>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be > cast to > > >>> org.apache.drill.exec.vector.complex.MapVector [ > > >>> 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > > >>> [ 7ab63d4e-8a1d-4e23-8853-a879db7e8a5f on maprdemo:31010 ] > > >>> > > >>> > > >>> Error: exception while executing query: Failure while executing > query. > > >>> (state=,code=0) > > >>> > > >>> Thanks, > > >>> Hao > > >>> > > >>> On Wed, Jan 21, 2015 at 2:43 PM, Andries Engelbrecht < > > >>> aengelbre...@maprtech.com> wrote: > > >>> > > >>>> Now try on hashtags with the following: > > >>>> > > >>>> drilldemo:5181> select t.entities.hashtags.`text` from > `/twitter.json` t > > >>>> where t.entities.hashtags is not null limit 10; > > >>>> > > >>>> Query failed: Query failed: Failure while running fragment., > > >>>> org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be > cast to > > >>>> org.apache.drill.exec.vector.complex.MapVector [ > > >>>> 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > > >>>> [ 6fe7f918-d1a7-4fc6-b24d-44ff9186f59e on drilldemo:31010 ] > > >>>> > > >>>> > > >>>> Error: exception while executing query: Failure while executing > query. > > >>>> (state=,code=0) > > >>>> > > >>>> > > >>>> { > > >>>> "entities": { > > >>>> "trends": [], > > >>>> "symbols": [], > > >>>> "urls": [], > > >>>> "hashtags": [], > > >>>> "user_mentions": [] > > >>>> }, > > >>>> "entities": { > > >>>> "trends": [1,2,3], > > >>>> "symbols": [4,5,6], > > >>>> "urls": [7,8,9], > > >>>> "hashtags": [ > > >>>> { > > >>>> "text": "GoPatriots", > > >>>> "indices": [] > > >>>> } > > >>>> ], > > >>>> "user_mentions": [] > > >>>> } > > >>>> } > > >>>> > > >>>> The issue seems to be that if some records have arrays with maps in > them > > >>>> and others are empty. > > >>>> > > >>>> —Andries > > >>>> > > >>>> > > >>>> On Jan 21, 2015, at 2:34 PM, Hao Zhu <h...@maprtech.com> wrote: > > >>>> > > >>>>> Seems it works for below json file: > > >>>>> { > > >>>>> "entities": { > > >>>>> "trends": [], > > >>>>> "symbols": [], > > >>>>> "urls": [], > > >>>>> "hashtags": [ > > >>>>> { > > >>>>> "text": "GoPatriots", > > >>>>> "indices": [ > > >>>>> 83, > > >>>>> 94 > > >>>>> ] > > >>>>> } > > >>>>> ], > > >>>>> "user_mentions": [] > > >>>>> }, > > >>>>> "entities": { > > >>>>> "trends": [1,2,3], > > >>>>> "symbols": [4,5,6], > > >>>>> "urls": [7,8,9], > > >>>>> "hashtags": [ > > >>>>> { > > >>>>> "text": "GoPatriots", > > >>>>> "indices": [] > > >>>>> } > > >>>>> ], > > >>>>> "user_mentions": [] > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> > > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t > where > > >>>>> t.entities.urls is not null; > > >>>>> +------------+ > > >>>>> | EXPR$0 | > > >>>>> +------------+ > > >>>>> | [7,8,9] | > > >>>>> +------------+ > > >>>>> 1 row selected (0.139 seconds) > > >>>>> 0: jdbc:drill:> select t.entities.urls from dfs.tmp.`a.json` as t > where > > >>>>> t.entities.urls is null; > > >>>>> +------------+ > > >>>>> | EXPR$0 | > > >>>>> +------------+ > > >>>>> +------------+ > > >>>>> No rows selected (0.158 seconds) > > >>>>> > > >>>>> Thanks, > > >>>>> Hao > > >>>>> > > >>>>> On Wed, Jan 21, 2015 at 2:01 PM, Aditya <adityakish...@gmail.com> > > >> wrote: > > >>>>> > > >>>>>> I believe that this works if the array contains homogeneous > primitive > > >>>>>> types. In your example, it appears from the error, the array field > > >>>> 'member' > > >>>>>> contained maps for at least one record. > > >>>>>> > > >>>>>> On Wed, Jan 21, 2015 at 1:57 PM, Christopher Matta < > cma...@mapr.com> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Trying that locally did not work for me (drill 0.7.0): > > >>>>>>> > > >>>>>>> 0: jdbc:drill:zk=local> select `id`, `name`, `members` from > > >>>>>> `Downloads/test.json` where repeated_count(`members`) > 0; > > >>>>>>> Query failed: Query stopped., Failure while trying to materialize > > >>>>>> incoming schema. Errors: > > >>>>>>> > > >>>>>>> Error in expression at index -1. Error: Missing function > > >>>>>> implementation: [repeated_count(MAP-REPEATED)]. Full expression: > > >>>> --UNKNOWN > > >>>>>> EXPRESSION--.. [ 47142fa4-7e6a-48cb-be6a-676e885ede11 on > > >>>> bullseye-3:31010 ] > > >>>>>>> > > >>>>>>> Error: exception while executing query: Failure while executing > > >> query. > > >>>>>> (state=,code=0) > > >>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> Chris Matta > > >>>>>>> cma...@mapr.com > > >>>>>>> 215-701-3146 > > >>>>>>> > > >>>>>>> On Wed, Jan 21, 2015 at 4:50 PM, Aditya <adityakish...@gmail.com > > > > >>>> wrote: > > >>>>>>> > > >>>>>>>> repeated_count('entities.urls') > 0 > > >>>>>>>> > > >>>>>>>> On Wed, Jan 21, 2015 at 1:46 PM, Andries Engelbrecht < > > >>>>>>>> aengelbre...@maprtech.com> wrote: > > >>>>>>>> > > >>>>>>>>> How do you filter out records with an empty array in drill? > > >>>>>>>>> i.e some records have "url":[] and some will have an array > with > > >> data > > >>>>>> in > > >>>>>>>>> it. When trying to read records with data in the array drill > fails > > >>>> due > > >>>>>>>> to > > >>>>>>>>> records missing any data in the array. Trying a filter with/* > where > > >>>>>>>>> "url":[0] is not null */ fails, also fails if applying url is > not > > >>>>>> null. > > >>>>>>>>> > > >>>>>>>>> Note some of the arrays contains maps, using twitter data as an > > >>>>>> example > > >>>>>>>>> below. Some records have an empty array with “hashtags”:[] and > > >>>> others > > >>>>>>>> will > > >>>>>>>>> look similar to what is listed below. > > >>>>>>>>> > > >>>>>>>>> "entities": { > > >>>>>>>>> "trends": [], > > >>>>>>>>> "symbols": [], > > >>>>>>>>> "urls": [], > > >>>>>>>>> "hashtags": [ > > >>>>>>>>> { > > >>>>>>>>> "text": "GoPatriots", > > >>>>>>>>> "indices": [ > > >>>>>>>>> 83, > > >>>>>>>>> 94 > > >>>>>>>>> ] > > >>>>>>>>> } > > >>>>>>>>> ], > > >>>>>>>>> "user_mentions": [] > > >>>>>>>>> }, > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Thanks > > >>>>>>>>> —Andries > > >>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>> > > >>>> > > >>>> > > >> > > >> > > > > > > <z.json> > >