Thanks for flagging the problem in the docs, guys. In some cases, removing an empty array can be used as a workaround to query a JSON file that otherwise fails, as shown in this example:
http://drill.apache.org/docs/json-data-model/#example:-access-a-map-field-in-an-array

Maybe this example, assuming it still works as shown in the docs, should remain because empty arrays sometimes cause problems. In the workaround section, http://drill.apache.org/docs/json-data-model/#empty-array, I'll remove the ambiguous example {"a":[]} and reword the section to say that an empty array sometimes causes problems, and to try removing it if you have a problem.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn skype:krishahn

On Mon, Oct 19, 2015 at 1:03 PM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:

> Word of caution that Flatten may be better as only the first may be null.
>
> --Andries
>
>
> > On Oct 19, 2015, at 12:59 PM, John Omernik <j...@omernik.com> wrote:
> >
> > Awesome that worked.
> >
> > *The documentation should probably be updated on the array stuff, it's not
> > accurate as it pertains to empty arrays.
> >
> >
> >
> > On Mon, Oct 19, 2015 at 2:52 PM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:
> >
> >> Use where a[0] is not null
> >>
> >> 0: jdbc:drill:> select * from `./array.json`;
> >> +----+--------+
> >> | b  | a      |
> >> +----+--------+
> >> | 1  | []     |
> >> | 3  | [1,2]  |
> >> +----+--------+
> >> 2 rows selected (0.13 seconds)
> >> 0: jdbc:drill:> select * from `./array.json` where a[0] is not null;
> >> +----+--------+
> >> | b  | a      |
> >> +----+--------+
> >> | 3  | [1,2]  |
> >> +----+--------+
> >> 1 row selected (0.151 seconds)
> >>
> >> --Andries
> >>
> >>
> >>> On Oct 19, 2015, at 12:32 PM, John Omernik <j...@omernik.com> wrote:
> >>>
> >>> Well you are in a sense confirming my suspicions that an empty array, as
> >>> specified in the Docs as "error causing" doesn't actually cause an error,
> >>> and that is expected. That is, empty arrays are not the big meanies that
> >>> the docs make them out to be (my results are the same as yours, that is, no
> >>> errors).
> >>>
> >>> I like the flatten approach, but is there a simple way to say select * from
> >>> dfs.tdunning.`x.json` where
> >>>
> >>> a is not empty
> >>>
> >>> or
> >>>
> >>> size(a) == 0
> >>>
> >>> or
> >>>
> >>> a != []
> >>>
> >>>
> >>> I guess some functions for working with arrays would be handy. I'll play
> >>> with flatten to see if it gives me what I am looking for, but are there
> >>> other ways to play with arrays (now that I confirm that empty arrays aren't
> >>> evil)
> >>>
> >>> John
> >>>
> >>>
> >>> On Mon, Oct 19, 2015 at 1:40 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>>
> >>>> John,
> >>>>
> >>>> I don't understand what you are seeing. Here is what I am seeing (and
> >>>> hopefully you can tell what I am missing).
> >>>>
> >>>> First the input is:
> >>>>
> >>>> $ cat x.json
> >>>> {"b":1, "a":[] }
> >>>> {"a":[1,2], "b":3}
> >>>>
> >>>> And then with this input, I get this:
> >>>>
> >>>> 0: jdbc:drill:> select * from dfs.tdunning.`x.json`;
> >>>> +----+------------+
> >>>> | b  | a          |
> >>>> +----+------------+
> >>>> | 1  | []         |
> >>>> | 3  | ["1","2"]  |
> >>>> +----+------------+
> >>>> 2 rows selected (0.443 seconds)
> >>>> 0: jdbc:drill:> select a,b from dfs.tdunning.`x.json`;
> >>>> +------------+----+
> >>>> | a          | b  |
> >>>> +------------+----+
> >>>> | []         | 1  |
> >>>> | ["1","2"]  | 3  |
> >>>> +------------+----+
> >>>> 2 rows selected (0.473 seconds)
> >>>> 0: jdbc:drill:> select flatten(a),b from dfs.tdunning.`x.json`;
> >>>> +---------+----+
> >>>> | EXPR$0  | b  |
> >>>> +---------+----+
> >>>> | 1       | 3  |
> >>>> | 2       | 3  |
> >>>> +---------+----+
> >>>> 2 rows selected (0.499 seconds)
> >>>>
> >>>>
> >>>> On Mon, Oct 19, 2015 at 7:03 AM, John Omernik <j...@omernik.com> wrote:
> >>>>
> >>>>> In https://drill.apache.org/docs/json-data-model/ there is a section that
> >>>>> goes as laid out below. This is actually not occurring for me.
> >>>>> I have a json dump from Mongo that has a field called tags where many
> >>>>> records have "tags":[] and it's outputting that without error. (It just
> >>>>> shows [] as the output).
> >>>>>
> >>>>> So, my question is this... based on the documentation, what I am seeing is
> >>>>> NOT expected, is it a miss on the docs, or something that is fixed in the
> >>>>> 1.2 release that I have?
> >>>>>
> >>>>> If it is fixed so we can have empty arrays in a field like tags, is it
> >>>>> possible there are some functions I can use to determine if that field is
> >>>>> empty? i.e. if isemptyarray(tags) returns true if empty or perhaps get me
> >>>>> the length of said array? These functions would be very valuable in queries
> >>>>> (if the empty arrays thing is not a weird quirk I am seeing).
> >>>>>
> >>>>> Empty array
> >>>>>
> >>>>> Drill cannot read an empty array, shown in the following example, and
> >>>>> attempting to do so causes an error.
> >>>>>
> >>>>> { "a":[] }
> >>>>>
> >>>>> Workaround: Remove empty arrays.
> >>>>>
> >>>>
> >>
> >>
> >
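
For anyone who does hit a failure and wants to try the "remove empty arrays" workaround quoted above, here is a minimal Python sketch of one way to preprocess the file before querying it. It is not part of Drill. The function name strip_empty_arrays and the sample records are hypothetical, and it assumes one JSON object per line and only looks at top-level fields (nested empty arrays are left alone).

```python
import json


def strip_empty_arrays(lines):
    """Yield each JSON record as text, minus top-level fields that are empty arrays.

    Assumption: the input is newline-delimited JSON, one object per line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Keep every field except those whose value is exactly an empty list.
        cleaned = {key: value for key, value in record.items() if value != []}
        yield json.dumps(cleaned)


# Hypothetical sample mirroring the x.json records from the thread.
sample = ['{"b":1, "a":[] }', '{"a":[1,2], "b":3}']
cleaned = list(strip_empty_arrays(sample))
for row in cleaned:
    print(row)
```

The first record loses its empty "a" field and the second is passed through unchanged. Whether this is needed at all depends on the Drill version and the query, since the thread shows empty arrays reading without error on 1.2.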