Looks like the COALESCE function is the source of the problem. Passing a
double (0.0) instead of an int (0) as the second expression solved the
problem for me:
CAST(COALESCE(t_total, 0.0) AS double)
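To illustrate what seems to be happening (a toy Python model of the typing rule, not Drill's actual internals — the function name here is made up): the type of the default argument drives the implicit cast applied to the varchar input, so an int default forces an integer parse that fails on decimal strings:

```python
def coalesce_as(value, default):
    """Toy model: the default's type decides the implicit cast inserted."""
    cast = int if isinstance(default, int) else float
    return cast(value) if value is not None else default

print(coalesce_as("2.5", 0.0))  # 2.5 -- double default, cast succeeds
# coalesce_as("2.5", 0) raises ValueError: the int default forces an
# integer parse of the varchar value, mirroring the failure seen in Drill
```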
On Fri, Mar 11, 2016 at 12:45 AM, Matt wrote:
> ~~~
> 00-01
This depends on how you see the "batter" field.
If you see it as an array, the query should be run as "select id,
t.batters.batter[0].id"
But you can also view "batter" as a repeated map, which would mean "id" is
a child field of this map, repeated a number of times, and projecting "
For those who are curious, here is what Mongo returns for the same
conceptual query:
> db.t1.find({},{"batters":1, "_id":0});
{ "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id" : "1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" }, { "id" :
In the meantime, if you want to be able to query this data, there are
basically two ways to do it in Drill. One way is to give an index into
the array:
select id, t.batters.batter[0].id
This will only look at the first entry in each array.
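In Python terms, using hypothetical rows shaped like the sample.json in this thread, the indexed query corresponds to picking the first array element in each row:

```python
# Hypothetical rows mirroring the sample.json structure from this thread
rows = [
    {"id": "0001", "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
    ]}},
    {"id": "0002", "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
    ]}},
]

# select id, t.batters.batter[0].id -- first array entry per row
result = [(r["id"], r["batters"]["batter"][0]["id"]) for r in rows]
print(result)  # [('0001', '1001'), ('0002', '1001')]
```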
The other way is to flatten the data:
select
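Flattening produces one output row per array element instead; a rough Python equivalent of that behavior (the sample rows are my assumption, not actual output):

```python
rows = [
    {"id": "0001", "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
    ]}},
]

# flatten(t.batters.batter): emit one (id, batter) row per array element
flattened = [(r["id"], b) for r in rows for b in r["batters"]["batter"]]
for row in flattened:
    print(row)
```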
Yeah, it's definitely a bug. Could you please file a jira?
On Thu, Mar 10, 2016 at 6:19 PM, Jiang Wu wrote:
> Here are the complete examples to illustrate the problem:
>
> 0: jdbc:drill:zk=local> select id, t.batters.batter from
> dfs.`c:\tmp\sample.json` t;
>
Here are the complete examples to illustrate the problem:
0: jdbc:drill:zk=local> select id, t.batters.batter from
dfs.`c:\tmp\sample.json` t;
+-------+---------+
|  id   | EXPR$1  |
+-------+---------+
| 0001  |
One problem with the current behavior is that the results are not correlated with
the JSON row objects. It looks like the code simply picks the first N values
from the union of all values across all rows, where N is the number of rows in
the result.
For example, if I give this query:
0:
I will play around with MongoDB and report back the findings. Thanks.
-- Jiang
-Original Message-
From: Steven Phillips [mailto:ste...@dremio.com]
Sent: Thursday, March 10, 2016 5:41 PM
To: user
Subject: Re: Question on nested JSON behavior
I am surprised that you are getting that result. I would have expected the
query to fail. Since batter is an array, you should specify the index of
the array if you want to access lower-level elements.
A way to access all of the sub-fields of a repeated map is something we've
discussed, but never
Actually I agree with Jiang. The result does seem unintuitive. If it is a
file with just a list, it does still make sense to return the ids in that
list as an array unless the user has configured Drill to automatically
flatten the first level.
Does anyone know how other systems behave for
Hi Jiang,
Think of it this way: If you had a file that was just the list:
{"id":"1001","type":"Regular"}
{"id":"1002","type":"Chocolate"}
{"id":"1003","type":"Blueberry"}
{"id":"1004","type":"Devil's Food"}
What would you like it to return when you query:
select id from
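For that file, I'd expect `select id` to simply return one value per line, i.e. in Python terms:

```python
import json

# The four-line file from above, one JSON object per line
lines = [
    '{"id":"1001","type":"Regular"}',
    '{"id":"1002","type":"Chocolate"}',
    '{"id":"1003","type":"Blueberry"}',
    '{"id":"1004","type":"Devil\'s Food"}',
]

ids = [json.loads(line)["id"] for line in lines]
print(ids)  # ['1001', '1002', '1003', '1004']
```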
Drill version: 1.4.0. Assuming 3 JSON objects with the following structure:
{ ...
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
~~~
00-01 Project(date_tm=[CAST($23):TIMESTAMP(0)],
id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE
"ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET
"ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"],
id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1"
From the log, I see this one:
StringFunctionHelpers.varCharToInt():
It seems like an implicit cast (to integer) might be inserted
into your expression.
Can you do EXPLAIN PLAN FOR "your query" (no double quotes needed)
and share what you see?
On Thu, Mar 10, 2016 at 3:11 PM,
TRIM() did not help, and I visually verified there is no whitespace
around the number. The entire test data file:
~~~
Could you try throwing a trim() in before the cast? Might just be
whitespace.
There shouldn't be a need to add the COALESCE statement, as we only ever
read defined values (which may be an empty string) out of a CSV file. You
should instead write out a full CASE statement that checks for an empty string
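Something along these lines, sketched in Python (the column name and the exact CASE expression are my assumptions):

```python
def total_as_double(t_total):
    # Mirrors: CASE WHEN t_total = '' THEN 0.0
    #          ELSE CAST(t_total AS DOUBLE) END
    return 0.0 if t_total == "" else float(t_total)

print(total_as_double(""))     # 0.0 -- empty CSV field treated as zero
print(total_as_double("2.5"))  # 2.5
```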
Thank you for your suggestions.
UDF sounds pretty cool.
Thanks,
Bo
On Wed, Mar 9, 2016 at 5:30 AM, Abdel Hakim Deneche
wrote:
> A User Defined Function doesn't exclude the possibility of including the
> function in Drill's code base, if the user chooses to contribute the
>
Hi All,
We are considering using Drill to access data for large-scale analytics on top
of Parquet files stored on HDFS.
We would like to add data to this data set in real time, as it arrives into our
system. One proposed solution was to use Drill to perform both the inserts and
the selects on
I've been checking the logs, and I think that the problem is that it's
walking through the "directories" in S3 recursively, doing lots of small
HTTP requests.
My files are organized like this which amplifies the issue:
/category/random-hash/year/month/day/hour/data-chunk-000.json.gz
The
I'm trying to join two VARCHAR2 columns of differing lengths. The generated
SQL casts the shorter to match the longer.
create table myschema.a_table(
description varchar2(10)
);
create table myschema.b_table(
description20 varchar2(20)
);
When the join is attempted
0: jdbc:drill:zk=local>
I'm querying 20 GB of gzipped JSONs split into ~5600 small files with sizes
ranging from 1 MB to 30 MB. Drill is running in AWS on 4 m4.xlarge nodes
and it's taking around 50 minutes before the query starts executing.
Any idea what could be causing this delay? What's the best way to debug
this?
I was trying to generate a sequence of timestamps from a sequence of
numbers, and ran into this:
select
596 * interval '1' hour,
597 * interval '1' hour
from (values 1);
EXPR$0      | EXPR$1          |
------------|-----------------|
PT2145600S  | PT-2145767.296S |
Notice
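The sign flip between 596 and 597 looks like a 32-bit millisecond overflow: 596 h is 2,145,600,000 ms, which fits in a signed 32-bit int, while 597 h is 2,149,200,000 ms, which does not. Wrapping the latter gives exactly the negative value shown. A quick Python check of the arithmetic (my hypothesis about the cause, not Drill's actual code):

```python
def wrap_int32(ms):
    """Wrap a millisecond count into a signed 32-bit integer."""
    ms &= 0xFFFFFFFF
    return ms - 2**32 if ms >= 2**31 else ms

print(wrap_int32(596 * 3600 * 1000))  # 2145600000  -> PT2145600S
print(wrap_int32(597 * 3600 * 1000))  # -2145767296 -> PT-2145767.296S
```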