Whoa, something strange is going on.
Expected
select * from dfs.tmp.test2;
+-----------------+
| dimensions |
+-----------------+
| {"adults":"A"} |
+-----------------+
select p.dimensions as b from dfs.tmp.test2 p;
+-----------------+
| b |
+-----------------+
| {"adults":"A"} |
+-----------------+
select p.dimensions as b from dfs.tmp.test2 p;
+-----------------+
| b |
+-----------------+
| {"adults":"A"} |
+-----------------+
select p.dimensions.nonexist as a from dfs.tmp.test2 p;
+-------+
| b |
+-------+
| null |
+-------+
Unexpected
select p.dimensions.nonexist as a, p.dimensions as b from dfs.tmp.test2 p;
+----+-------+
| a | b |
+----+-------+
| A | null |
+----+-------+
<<both wrong>>
select p.dimensions as b, p.dimensions.nonexist as a from dfs.tmp.test2 p;
+-------+-------+
| b | a |
+-------+-------+
| null | null |
+-------+-------+
<< b is wrong, a is correct>>
Stefan, not sure what is going on here. It looks like an issue with
resolution of a nonexistent field in a multilevel structure. It exists
whether it is Parquet or JSON. Can you file a JIRA and mark it Critical?
On Tue, Jul 21, 2015 at 10:46 AM, Stefán Baxter <[email protected]>
wrote:
> Hi,
>
> all kidding aside then this renders Drill+Parquet unusable for us at the
> moment.
>
> If there is a quick fix then please let me know. :)
>
> Regards,
> -Stefan
>
> On Tue, Jul 21, 2015 at 5:37 PM, Stefán Baxter <[email protected]>
> wrote:
>
> > Well,
> >
> > After some more testing it appears that this has nothing to do with trim.
> > (any non existing nested-value will be pushed aside)
> >
> > select p.dimensions.budgetLevel as `field1`, lower(p.dimensions.adults)
> as
> > `field2` from dfs.tmp.`/test/0_0_0.parquet` as p;
> >
> > also returns:
> > +---------+---------+
> > | field1 | field2 |
> > +---------+---------+
> > | a | null |
> > +---------+---------+
> >
> > I just as puzzled though :)
> >
> > Regards,
> > -Stefan
> >
> > On Tue, Jul 21, 2015 at 5:32 PM, Stefán Baxter <
> [email protected]>
> > wrote:
> >
> >> Hi,
> >>
> >> Here is small trick I think you will like :).
> >>
> >> - With this minimal dataset as /tmp/test.json:
> >> {"dimensions":{"adults":"A"}}
> >>
> >> - Running this:
> >> select lower(p.dimensions.budgetLevel) as `field1`,
> >> lower(p.dimensions.adults) as `field2` from dfs.tmp.`/test.json` as
> p;
> >>
> >> - To no surprise returns this:
> >> +---------+---------+
> >> | field1 | field2 |
> >> +---------+---------+
> >> | null | a |
> >> +---------+---------+
> >>
> >> Here comes the trick(y) part (hold your breath):
> >>
> >> - With the same data as a Parquet file
> >> CREATE TABLE dfs.tmp.`/test` AS SELECT * FROM dfs.tmp.`/test.json`;
> >>
> >> - The same query:
> >> select lower(p.dimensions.budgetLevel) as `field1`,
> >> lower(p.dimensions.adults) as `field2` from
> dfs.tmp.`/test/0_0_0.parquet`
> >> as p;
> >>
> >> - Return this:
> >> +---------+---------+
> >> | field1 | field2 |
> >> +---------+---------+
> >> | a | null |
> >> +---------+---------+
> >>
> >> ta ta !
> >>
> >> Best regards,
> >> -Stefan
> >>
> >>
> >>
> >>
> >
>