Cool. thanks for the nice comment.
On Mon, Oct 19, 2015 at 4:07 PM, Kristine Hahn <kh...@maprtech.com> wrote: > Thanks for explaining what's going on, Ted! I'll use the info you've > provided to clarify things in the docs. > > Kristine Hahn > Sr. Technical Writer > 415-497-8107 @krishahn skype:krishahn > > > On Mon, Oct 19, 2015 at 4:03 PM, Ted Dunning <ted.dunn...@gmail.com> > wrote: > > > Kristine, > > > > I just tried working with that data file > (sf-city-logs-json/citylogs.json). > > > > First, I tried munging the file slightly. I deleted the outer wrapping > of > > the file and then deleted the commas between records. Drill handles the > > resulting file wonderfully: > > > > 0: jdbc:drill:> select lots.geometry.coordinates[0][0][0] longitude, > > lots.geometry.coordinates[0][0][1] latitude, > > lots.geometry.coordinates[0][0][2] altitude from > > dfs.tdunning.`sf-city-lots-json/citylots.json` lots limit 1; > > +-----------------------+---------------------+-----------+ > > | longitude | latitude | altitude | > > +-----------------------+---------------------+-----------+ > > | -122.422003528252475 | 37.808480096967251 | 0.0 | > > +-----------------------+---------------------+-----------+ > > 1 row selected (0.618 seconds) > > > > The data is not uniform, however, since there are Polygon shapes and > > MultiPolygon shapes. If I avoid the multi-polygon lines (120 of them), > > Drill works fine on the the entire remainder: > > > > 0: jdbc:drill:> with tbl as (select > cast(lots.geometry.coordinates[0][0][0] > > as float) longitude, cast(lots.geometry.coordinates[0][0][1] as float) > > latitude, cast(lots.geometry.coordinates[0][0][2] as float) altitude from > > dfs.tdunning.`uniform.json` lots) > > . . . . . . . > select avg(longitude), avg(latitude), max(altitude) from > > tbl; > > +---------------------+--------------------+---------+ > > | EXPR$0 | EXPR$1 | EXPR$2 | > > +---------------------+--------------------+---------+ > > | -122.4379846573301 | 37.75844260679518 | 0.0 | > > +---------------------+--------------------+---------+ > > 1 row selected (6.64 seconds) > > > > Where Drill breaks down is if I try to do anything with a list that has > > varying types. For shapes of type Polygon, I have a list of lists of > > coordinates. For shapes of MultiPolygon, I have a list of lists of lists > > of coordinates. That means that even a query that tries to filter away > the > > MultiPolygons will fail: > > > > > select count(*) from ( > > > select x.geometry.coordinates[0][0] as geo > > > from dfs.tdunning.`sf-city-lots-json/citylots.json` x) lots > > > where lots.geo[0][0] is null; > > Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR, > encountered > > a value of type LIST. Drill does not support lists of different types. > > > > File /mapr/se1/user/tdunning/sf-city-lots-json/citylots.json > > Record 48570 > > Fragment 0:0 > > > > As others have noted, this is a major issue with many kinds of JSON > input, > > but the docs pages don't really make clear what is happening. > > > > > > > > > > > > > > Whe > > > > On Mon, Oct 19, 2015 at 1:17 PM, Kristine Hahn <kh...@maprtech.com> > wrote: > > > > > Thanks for flagging the problem in the docs, guys. In some cases, > > removing > > > an empty array can be used as a workaround to query a JSON file that > > > otherwise fails, as shown in this example: > > > > > > > > > > > > http://drill.apache.org/docs/json-data-model/#example:-access-a-map-field-in-an-array > > > > > > Maybe this example, assuming it still works shown in the docs, should > > > remain because sometimes empty arrays cause problems. > > > > > > In the workaround section > > > http://drill.apache.org/docs/json-data-model/#empty-array, I'll remove > > the > > > ambiguous example {"a":[]} and reword the section to say an empty array > > > causes problems sometimes. Try removing it if you have a problem. > > > > > > Kristine Hahn > > > Sr. Technical Writer > > > 415-497-8107 @krishahn skype:krishahn > > > > > > > > > On Mon, Oct 19, 2015 at 1:03 PM, Andries Engelbrecht < > > > aengelbre...@maprtech.com> wrote: > > > > > > > Word of caution that Flatten may be better as only the first may be > > null. > > > > > > > > —Andries > > > > > > > > > > > > > On Oct 19, 2015, at 12:59 PM, John Omernik <j...@omernik.com> > wrote: > > > > > > > > > > Awesome that worked. > > > > > > > > > > *The documentation should probably be updated on the array stuff, > > it's > > > > not > > > > > accurate as it pertains to empty arrays. > > > > > > > > > > > > > > > > > > > > On Mon, Oct 19, 2015 at 2:52 PM, Andries Engelbrecht < > > > > > aengelbre...@maprtech.com> wrote: > > > > > > > > > >> Use where a[0] is not null > > > > >> > > > > >> 0: jdbc:drill:> select * from `./array.json`; > > > > >> +----+--------+ > > > > >> | b | a | > > > > >> +----+--------+ > > > > >> | 1 | [] | > > > > >> | 3 | [1,2] | > > > > >> +----+--------+ > > > > >> 2 rows selected (0.13 seconds) > > > > >> 0: jdbc:drill:> select * from `./array.json` where a[0] is not > null; > > > > >> +----+--------+ > > > > >> | b | a | > > > > >> +----+--------+ > > > > >> | 3 | [1,2] | > > > > >> +----+--------+ > > > > >> 1 row selected (0.151 seconds) > > > > >> > > > > >> —Andries > > > > >> > > > > >> > > > > >>> On Oct 19, 2015, at 12:32 PM, John Omernik <j...@omernik.com> > > wrote: > > > > >>> > > > > >>> Well you are in a sense confirming my suspicions that an empty > > array, > > > > as > > > > >>> specified in the Docs as "error causing" doesn't actually cause > an > > > > error, > > > > >>> and that is expected. That is, empty arrays are not the big > meanies > > > > that > > > > >>> the docs make them out to be (my results are the same as your, > that > > > is, > > > > >> no > > > > >>> errors). > > > > >>> > > > > >>> I like the flatten approach, but is there a simple way to say > > select > > > * > > > > >> from > > > > >>> dfs.tdunning.`x.json` where > > > > >>> > > > > >>> a is not empty > > > > >>> > > > > >>> or > > > > >>> > > > > >>> size(a) == 0 > > > > >>> > > > > >>> or > > > > >>> > > > > >>> a != [] > > > > >>> > > > > >>> > > > > >>> I guess some functions for working with arrays would be handy. > I'll > > > > play > > > > >>> with flatten to see if it gives me what I am looking for, but are > > > there > > > > >>> other ways to play with arrays (now that I confirm that empty > > arrays > > > > >> aren't > > > > >>> evil) > > > > >>> > > > > >>> John > > > > >>> > > > > >>> > > > > >>> On Mon, Oct 19, 2015 at 1:40 PM, Ted Dunning < > > ted.dunn...@gmail.com> > > > > >> wrote: > > > > >>> > > > > >>>> John, > > > > >>>> > > > > >>>> I don't understand what you are seeing. Here is what I am > seeing > > > (and > > > > >>>> hopefully you can tell what I am missing). > > > > >>>> > > > > >>>> First the input is: > > > > >>>> > > > > >>>> $ cat x.json > > > > >>>> {"b":1, "a":[] } > > > > >>>> {"a":[1,2], "b":3} > > > > >>>> > > > > >>>> And then with this input, I get this: > > > > >>>> > > > > >>>> 0: jdbc:drill:> select * from dfs.tdunning.`x.json`; > > > > >>>> +----+------------+ > > > > >>>> | b | a | > > > > >>>> +----+------------+ > > > > >>>> | 1 | [] | > > > > >>>> | 3 | ["1","2"] | > > > > >>>> +----+------------+ > > > > >>>> 2 rows selected (0.443 seconds) > > > > >>>> 0: jdbc:drill:> select a,b from dfs.tdunning.`x.json`; > > > > >>>> +------------+----+ > > > > >>>> | a | b | > > > > >>>> +------------+----+ > > > > >>>> | [] | 1 | > > > > >>>> | ["1","2"] | 3 | > > > > >>>> +------------+----+ > > > > >>>> 2 rows selected (0.473 seconds) > > > > >>>> 0: jdbc:drill:> select flatten(a),b from dfs.tdunning.`x.json`; > > > > >>>> +---------+----+ > > > > >>>> | EXPR$0 | b | > > > > >>>> +---------+----+ > > > > >>>> | 1 | 3 | > > > > >>>> | 2 | 3 | > > > > >>>> +---------+----+ > > > > >>>> 2 rows selected (0.499 seconds) > > > > >>>> > > > > >>>> > > > > >>>> On Mon, Oct 19, 2015 at 7:03 AM, John Omernik <j...@omernik.com > > > > > > wrote: > > > > >>>> > > > > >>>>> In https://drill.apache.org/docs/json-data-model/ there is a > > > section > > > > >>>> that > > > > >>>>> goes as laid out below. This is actually not occurring for > me. > > I > > > > >> have a > > > > >>>>> json dump from Mongo that has a field called tags where many > > > records > > > > >> have > > > > >>>>> "tags":[] and it's outputting that without error. (It just > shows > > > [] > > > > as > > > > >>>> the > > > > >>>>> output). > > > > >>>>> > > > > >>>>> So, my question is this... based on the documentation, what I > am > > > > seeing > > > > >>>> is > > > > >>>>> NOT expected, is it a miss on the docs, or something that is > > fixed > > > in > > > > >> the > > > > >>>>> 1.2 release that I have? > > > > >>>>> > > > > >>>>> If it is fixed so we can have empty arrays in a field like > tags, > > is > > > > it > > > > >>>>> possible there are some functions I can use to determine if > that > > > > field > > > > >> is > > > > >>>>> empty? i.e. if isemptyarray(tags) returns true if empty or > > perhaps > > > > get > > > > >> me > > > > >>>>> the length said array? These functions would be very valuable > > in > > > > >>>> queries > > > > >>>>> (if the empty arrays thing is not a weird quirk I am seeing). > > > > >>>>> > > > > >>>>> Empty array > > > > >>>>> > > > > >>>>> Drill cannot read an empty array, shown in the following > example, > > > and > > > > >>>>> attempting to do so causes an error. > > > > >>>>> > > > > >>>>> { "a":[] } > > > > >>>>> > > > > >>>>> Workaround: Remove empty arrays. > > > > >>>>> > > > > >>>> > > > > >> > > > > >> > > > > > > > > > > > > > >