Re: Empty Array in JSON

Ted Dunning Mon, 19 Oct 2015 16:10:50 -0700

Cool.

thanks for the nice comment.


On Mon, Oct 19, 2015 at 4:07 PM, Kristine Hahn <[email protected]> wrote:

> Thanks for explaining what's going on, Ted! I'll use the info you've
> provided to clarify things in the docs.
>
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn skype:krishahn
>
>
> On Mon, Oct 19, 2015 at 4:03 PM, Ted Dunning <[email protected]>
> wrote:
>
> > Kristine,
> >
> > I just tried working with that data file
> (sf-city-logs-json/citylogs.json).
> >
> > First, I tried munging the file slightly.  I deleted the outer wrapping
> of
> > the file and then deleted the commas between records.  Drill handles the
> > resulting file wonderfully:
> >
> > 0: jdbc:drill:> select lots.geometry.coordinates[0][0][0] longitude,
> > lots.geometry.coordinates[0][0][1] latitude,
> > lots.geometry.coordinates[0][0][2] altitude from
> > dfs.tdunning.`sf-city-lots-json/citylots.json` lots limit 1;
> > +-----------------------+---------------------+-----------+
> > |       longitude       |      latitude       | altitude  |
> > +-----------------------+---------------------+-----------+
> > | -122.422003528252475  | 37.808480096967251  | 0.0       |
> > +-----------------------+---------------------+-----------+
> > 1 row selected (0.618 seconds)
> >
> > The data is not uniform, however, since there are Polygon shapes and
> > MultiPolygon shapes.  If I avoid the multi-polygon lines (120 of them),
> > Drill works fine on the the entire remainder:
> >
> > 0: jdbc:drill:> with tbl as (select
> cast(lots.geometry.coordinates[0][0][0]
> > as float) longitude, cast(lots.geometry.coordinates[0][0][1] as float)
> > latitude, cast(lots.geometry.coordinates[0][0][2] as float) altitude from
> > dfs.tdunning.`uniform.json` lots)
> > . . . . . . . > select avg(longitude), avg(latitude), max(altitude) from
> > tbl;
> > +---------------------+--------------------+---------+
> > |       EXPR$0        |       EXPR$1       | EXPR$2  |
> > +---------------------+--------------------+---------+
> > | -122.4379846573301  | 37.75844260679518  | 0.0     |
> > +---------------------+--------------------+---------+
> > 1 row selected (6.64 seconds)
> >
> > Where Drill breaks down is if I try to do anything with a list that has
> > varying types. For shapes of type Polygon, I have a list of lists of
> > coordinates.  For shapes of MultiPolygon, I have a list of lists of lists
> > of coordinates.  That means that even a query that tries to filter away
> the
> > MultiPolygons will fail:
> >
> > > select count(*) from (
> > >    select x.geometry.coordinates[0][0] as geo
> > >    from dfs.tdunning.`sf-city-lots-json/citylots.json` x) lots
> > > where lots.geo[0][0] is null;
> > Error: UNSUPPORTED_OPERATION ERROR: In a list of type VARCHAR,
> encountered
> > a value of type LIST. Drill does not support lists of different types.
> >
> > File  /mapr/se1/user/tdunning/sf-city-lots-json/citylots.json
> > Record  48570
> > Fragment 0:0
> >
> > As others have noted, this is a major issue with many kinds of JSON
> input,
> > but the docs pages don't really make clear what is happening.
> >
> >
> >
> >
> >
> >
> > Whe
> >
> > On Mon, Oct 19, 2015 at 1:17 PM, Kristine Hahn <[email protected]>
> wrote:
> >
> > > Thanks for flagging the problem in the docs, guys. In some cases,
> > removing
> > > an empty array can be used as a workaround to query a JSON file that
> > > otherwise fails, as shown in this example:
> > >
> > >
> > >
> >
> http://drill.apache.org/docs/json-data-model/#example:-access-a-map-field-in-an-array
> > >
> > > Maybe this example, assuming it still works shown in the docs, should
> > > remain because sometimes empty arrays cause problems.
> > >
> > > In the workaround section
> > > http://drill.apache.org/docs/json-data-model/#empty-array, I'll remove
> > the
> > > ambiguous example {"a":[]} and reword the section to say an empty array
> > > causes problems sometimes. Try removing it if you have a problem.
> > >
> > > Kristine Hahn
> > > Sr. Technical Writer
> > > 415-497-8107 @krishahn skype:krishahn
> > >
> > >
> > > On Mon, Oct 19, 2015 at 1:03 PM, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > > > Word of caution that Flatten may be better as only the first may be
> > null.
> > > >
> > > > —Andries
> > > >
> > > >
> > > > > On Oct 19, 2015, at 12:59 PM, John Omernik <[email protected]>
> wrote:
> > > > >
> > > > > Awesome that worked.
> > > > >
> > > > > *The documentation should probably be updated on the array stuff,
> > it's
> > > > not
> > > > > accurate as it pertains to empty arrays.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Oct 19, 2015 at 2:52 PM, Andries Engelbrecht <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> Use where a[0] is not null
> > > > >>
> > > > >> 0: jdbc:drill:> select * from `./array.json`;
> > > > >> +----+--------+
> > > > >> | b  |   a    |
> > > > >> +----+--------+
> > > > >> | 1  | []     |
> > > > >> | 3  | [1,2]  |
> > > > >> +----+--------+
> > > > >> 2 rows selected (0.13 seconds)
> > > > >> 0: jdbc:drill:> select * from `./array.json` where a[0] is not
> null;
> > > > >> +----+--------+
> > > > >> | b  |   a    |
> > > > >> +----+--------+
> > > > >> | 3  | [1,2]  |
> > > > >> +----+--------+
> > > > >> 1 row selected (0.151 seconds)
> > > > >>
> > > > >> —Andries
> > > > >>
> > > > >>
> > > > >>> On Oct 19, 2015, at 12:32 PM, John Omernik <[email protected]>
> > wrote:
> > > > >>>
> > > > >>> Well you are in a sense confirming my suspicions that an empty
> > array,
> > > > as
> > > > >>> specified in the Docs as "error causing" doesn't actually cause
> an
> > > > error,
> > > > >>> and that is expected. That is, empty arrays are not the big
> meanies
> > > > that
> > > > >>> the docs make them out to be (my results are the same as your,
> that
> > > is,
> > > > >> no
> > > > >>> errors).
> > > > >>>
> > > > >>> I like the flatten approach, but is there a simple way to say
> > select
> > > *
> > > > >> from
> > > > >>> dfs.tdunning.`x.json` where
> > > > >>>
> > > > >>> a is not empty
> > > > >>>
> > > > >>> or
> > > > >>>
> > > > >>> size(a) == 0
> > > > >>>
> > > > >>> or
> > > > >>>
> > > > >>> a != []
> > > > >>>
> > > > >>>
> > > > >>> I guess some functions for working with arrays would be handy.
> I'll
> > > > play
> > > > >>> with flatten to see if it gives me what I am looking for, but are
> > > there
> > > > >>> other ways to play with arrays (now that I confirm that empty
> > arrays
> > > > >> aren't
> > > > >>> evil)
> > > > >>>
> > > > >>> John
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Oct 19, 2015 at 1:40 PM, Ted Dunning <
> > [email protected]>
> > > > >> wrote:
> > > > >>>
> > > > >>>> John,
> > > > >>>>
> > > > >>>> I don't understand what you are seeing.  Here is what I am
> seeing
> > > (and
> > > > >>>> hopefully you can tell what I am missing).
> > > > >>>>
> > > > >>>> First the input is:
> > > > >>>>
> > > > >>>> $ cat x.json
> > > > >>>> {"b":1, "a":[] }
> > > > >>>> {"a":[1,2], "b":3}
> > > > >>>>
> > > > >>>> And then with this input, I get this:
> > > > >>>>
> > > > >>>> 0: jdbc:drill:> select * from dfs.tdunning.`x.json`;
> > > > >>>> +----+------------+
> > > > >>>> | b  |     a      |
> > > > >>>> +----+------------+
> > > > >>>> | 1  | []         |
> > > > >>>> | 3  | ["1","2"]  |
> > > > >>>> +----+------------+
> > > > >>>> 2 rows selected (0.443 seconds)
> > > > >>>> 0: jdbc:drill:> select a,b from dfs.tdunning.`x.json`;
> > > > >>>> +------------+----+
> > > > >>>> |     a      | b  |
> > > > >>>> +------------+----+
> > > > >>>> | []         | 1  |
> > > > >>>> | ["1","2"]  | 3  |
> > > > >>>> +------------+----+
> > > > >>>> 2 rows selected (0.473 seconds)
> > > > >>>> 0: jdbc:drill:> select flatten(a),b from dfs.tdunning.`x.json`;
> > > > >>>> +---------+----+
> > > > >>>> | EXPR$0  | b  |
> > > > >>>> +---------+----+
> > > > >>>> | 1       | 3  |
> > > > >>>> | 2       | 3  |
> > > > >>>> +---------+----+
> > > > >>>> 2 rows selected (0.499 seconds)
> > > > >>>>
> > > > >>>>
> > > > >>>> On Mon, Oct 19, 2015 at 7:03 AM, John Omernik <[email protected]
> >
> > > > wrote:
> > > > >>>>
> > > > >>>>> In https://drill.apache.org/docs/json-data-model/ there is a
> > > section
> > > > >>>> that
> > > > >>>>> goes as laid out below.   This is actually not occurring for
> me.
> > I
> > > > >> have a
> > > > >>>>> json dump from Mongo that has a field called tags where many
> > > records
> > > > >> have
> > > > >>>>> "tags":[] and it's outputting that without error.  (It just
> shows
> > > []
> > > > as
> > > > >>>> the
> > > > >>>>> output).
> > > > >>>>>
> > > > >>>>> So, my question is this... based on the documentation, what I
> am
> > > > seeing
> > > > >>>> is
> > > > >>>>> NOT expected, is it a miss on the docs, or something that is
> > fixed
> > > in
> > > > >> the
> > > > >>>>> 1.2 release that I have?
> > > > >>>>>
> > > > >>>>> If it is fixed so we can have empty arrays in a field like
> tags,
> > is
> > > > it
> > > > >>>>> possible there are some functions I can use to determine if
> that
> > > > field
> > > > >> is
> > > > >>>>> empty? i.e. if isemptyarray(tags) returns true if empty or
> > perhaps
> > > > get
> > > > >> me
> > > > >>>>> the length said array?  These functions would  be very valuable
> > in
> > > > >>>> queries
> > > > >>>>> (if the empty arrays thing is not a weird quirk I am seeing).
> > > > >>>>>
> > > > >>>>> Empty array
> > > > >>>>>
> > > > >>>>> Drill cannot read an empty array, shown in the following
> example,
> > > and
> > > > >>>>> attempting to do so causes an error.
> > > > >>>>>
> > > > >>>>>   { "a":[] }
> > > > >>>>>
> > > > >>>>> Workaround: Remove empty arrays.
> > > > >>>>>
> > > > >>>>
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: Empty Array in JSON

Reply via email to