Re: The mysterious life of mr. Trim (aka. the value displacement trick)

Jacques Nadeau Tue, 21 Jul 2015 11:18:17 -0700

Whoa, something strange is going on.

Expected
select * from dfs.tmp.test2;
+-----------------+
|   dimensions    |
+-----------------+
| {"adults":"A"}  |
+-----------------+


select p.dimensions as b from dfs.tmp.test2 p;
+-----------------+
|        b        |
+-----------------+
| {"adults":"A"}  |
+-----------------+

select p.dimensions as b from dfs.tmp.test2 p;
+-----------------+
|        b        |
+-----------------+
| {"adults":"A"}  |
+-----------------+

select p.dimensions.nonexist as a from dfs.tmp.test2 p;
+-------+
|   b   |
+-------+
| null  |
+-------+

Unexpected

select p.dimensions.nonexist as a, p.dimensions as b from dfs.tmp.test2 p;
+----+-------+
| a  |   b   |
+----+-------+
| A  | null  |
+----+-------+

<<both wrong>>

select p.dimensions as b, p.dimensions.nonexist as a from dfs.tmp.test2 p;
+-------+-------+
|   b   |   a   |
+-------+-------+
| null  | null  |
+-------+-------+

<< b is wrong, a is correct>>

Stefan, not sure what is going on here.  It looks like an issue with
resolution of a nonexistent field in a multilevel structure.  It exists
whether it is Parquet or JSON.  Can you file a JIRA and mark it Critical?


On Tue, Jul 21, 2015 at 10:46 AM, Stefán Baxter <[email protected]>
wrote:

> Hi,
>
> all kidding aside then this renders Drill+Parquet unusable for us at the
> moment.
>
> If there is a quick fix then please let me know. :)
>
> Regards,
>  -Stefan
>
> On Tue, Jul 21, 2015 at 5:37 PM, Stefán Baxter <[email protected]>
> wrote:
>
> > Well,
> >
> > After some more testing it appears that this has nothing to do with trim.
> > (any non existing nested-value will be pushed aside)
> >
> > select p.dimensions.budgetLevel as `field1`, lower(p.dimensions.adults)
> as
> > `field2` from dfs.tmp.`/test/0_0_0.parquet` as p;
> >
> > also returns:
> > +---------+---------+
> > | field1  | field2  |
> > +---------+---------+
> > | a       | null    |
> > +---------+---------+
> >
> > I just as puzzled though :)
> >
> > Regards,
> >  -Stefan
> >
> > On Tue, Jul 21, 2015 at 5:32 PM, Stefán Baxter <
> [email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >> Here is small trick I think you will like :).
> >>
> >>    - With this minimal dataset as /tmp/test.json:
> >>    {"dimensions":{"adults":"A"}}
> >>
> >>    - Running this:
> >>    select lower(p.dimensions.budgetLevel) as `field1`,
> >>    lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test.json` as
> p;
> >>
> >>    - To no surprise returns this:
> >>    +---------+---------+
> >>    | field1  | field2  |
> >>    +---------+---------+
> >>    | null    | a       |
> >>    +---------+---------+
> >>
> >> Here comes the trick(y) part (hold your breath):
> >>
> >>    - With the same data as a Parquet file
> >>    CREATE TABLE dfs.tmp.`/test` AS SELECT * FROM dfs.tmp.`/test.json`;
> >>
> >>    - The same query:
> >>    select lower(p.dimensions.budgetLevel) as `field1`,
> >>    lower(p.dimensions.adults) as `field2` from
> dfs.tmp.`/test/0_0_0.parquet`
> >>    as p;
> >>
> >>    - Return this:
> >>    +---------+---------+
> >>    | field1  | field2  |
> >>    +---------+---------+
> >>    | a       | null    |
> >>    +---------+---------+
> >>
> >> ta ta !
> >>
> >> Best regards,
> >>  -Stefan
> >>
> >>
> >>
> >>
> >
>

Re: The mysterious life of mr. Trim (aka. the value displacement trick)

Reply via email to