Re: NumberFormatException with cast to double?

2016-03-10 Thread Abdel Hakim Deneche
Looks like the COALESCE function is the source of the problem. Passing a double (0.0) instead of an int (0) as a second expression solved the problem for me: CAST(COALESCE(t_total, 0.0) AS double) On Fri, Mar 11, 2016 at 12:45 AM, Matt wrote: > ~~~ > 00-01

Re: Question on nested JSON behavior

2016-03-10 Thread Aditya
This depends how do you see the "batter" field. If you see it as an array, the query should be run as "select id, t.batters.batter[0].id" But you can also view "batter" as a repeated map, which would mean "id" is a child field of this map, repeated a number of time and projecting "

RE: Question on nested JSON behavior

2016-03-10 Thread Jiang Wu
For the those who are curious, here is what Mongo returns for the same conceptual query: > db.t1.find({},{"batters":1, "_id":0}); { "batters" : { "batter" : [ { "id" : "1001", "type" : "Regular" }, { "id" : "1002", "type" : "Chocolate" }, { "id" : "1003", "type" : "Blueberry" }, { "id" :

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
In the mean time, if you want to be able to query this data, there are basically two ways to do this in Drill. One way is to give an index into the array: select id, t.batters.batter[0].id This will only look at the first entry in each array. The other way is to flatten the data: select

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
Yeah, it's definitely a bug. Could you please file a jira? On Thu, Mar 10, 2016 at 6:19 PM, Jiang Wu wrote: > Here are the complete examples to illustrate the problem: > > 0: jdbc:drill:zk=local> select id, t.batters.batter from > dfs.`c:\tmp\sample.json` t; >

RE: Question on nested JSON behavior

2016-03-10 Thread Jiang Wu
Here are the complete examples to illustrate the problem: 0: jdbc:drill:zk=local> select id, t.batters.batter from dfs.`c:\tmp\sample.json` t; +++ | id | EXPR$1 | +++ | 0001 |

RE: Question on nested JSON behavior

2016-03-10 Thread Jiang Wu
One problem with the current behavior is that the results are not correlated to the JSON row object. It looks like the code simply picks the first N values from the union of all values across all rows. The N is the number of rows in the result. For example, if I give this query: 0:

RE: Question on nested JSON behavior

2016-03-10 Thread Jiang Wu
I will play around with MongoDB and report back the findings. Thanks. -- Jiang -Original Message- From: Steven Phillips [mailto:ste...@dremio.com] Sent: Thursday, March 10, 2016 5:41 PM To: user Subject: Re: Question on nested JSON behavior I am surprised that

Re: Question on nested JSON behavior

2016-03-10 Thread Steven Phillips
I am surprised that you are getting that result. I would have expected the query to fail. Since batter is an array, you should specify the index of the array if yo want to access lower level elements. A way to access all of the sub-fields of a repeated map is something we've discussed, but never

Re: Question on nested JSON behavior

2016-03-10 Thread Neeraja Rentachintala
Actually I agree with Jiang. The result does seem unintuitive. If it is a file with just a list, it does still make sense to return the ids in that list as an array unless the user has configured Drill to automatically flatten the first level. Does anyone know how does the other systems behave for

Re: Question on nested JSON behavior

2016-03-10 Thread Nathan Griffith
Hi Jiang, Think of it this way: If you had a file that was just the list: {"id":"1001","type":"Regular"} {"id":"1002","type":"Chocolate"} {"id":"1003","type":"Blueberry"} {"id":"1004","type":"Devil's Food"} What would you like it to return when you query: select id from

Question on nested JSON behavior

2016-03-10 Thread Jiang Wu
Drill version: 1.4.0. Assuming 3 JSON objects with the following structure: { ... "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" },

Re: NumberFormatException with cast to double?

2016-03-10 Thread Matt
~~~ 00-01 Project(date_tm=[CAST($23):TIMESTAMP(0)], id_1=[CAST($11):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], id_2=[CAST($15):VARCHAR(1) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"], id_3=[CAST($33):VARCHAR(1) CHARACTER SET "ISO-8859-1"

Re: NumberFormatException with cast to double?

2016-03-10 Thread Hsuan Yi Chu
>From the log, I see this one: StringFunctionHelpers.varCharToInt(): It seems like there might be an implicit casting (to integer) be inserted into your expression. Can you do EXPLAIN PLAN FOR "your query" (not double quoted needed) and share what you see? On Thu, Mar 10, 2016 at 3:11 PM,

Re: NumberFormatException with cast to double?

2016-03-10 Thread Matt
TRIM() did not help, and I visually verified there is no whitespace around the number: The entire test data file: ~~~

Re: NumberFormatException with cast to double?

2016-03-10 Thread Jason Altekruse
Could you try throwing a trim() in before the cast? Might just be whitespace. There shouldn't be a need to add the COALESCE statement, as we only ever read defined values (which may be empty string) out of a CSV file. You should instead write out a full case statement that checks for empty string

Re: Drill with String Aggregation

2016-03-10 Thread Bosung Seo
Thank you for your suggestions. UDF sounds pretty cool. Thanks, Bo On Wed, Mar 9, 2016 at 5:30 AM, Abdel Hakim Deneche wrote: > User Defined Function doesn't exclude the possibility to include the > function in Drill's code base, if the user chooses to contribute the >

concurrency questions

2016-03-10 Thread Yadid Ayzenberg
Hi All, We are considering using drill to access data for large scale analytics on top of parquet files stored on HDFS. We would like to add data to this data-set in real-time, as it arrives into our system. One propose solution was to use drill to perform both the inserts and the selects on

Re: S3 query takes a long time to start

2016-03-10 Thread Oscar Morante
I've been checking the logs, and I think that the problem is that it's walking through the "directories" in S3 recursively, doing lots of small HTTP requests. My files are organized like this which amplifies the issue: /category/random-hash/year/month/day/hour/data-chunk-000.json.gz The

Cast function for Oracle

2016-03-10 Thread Chris Atkinson
I'm trying to join two VARCHAR2 columns of differing length. The generate SQL casts the shorter to match the longer. create table myschema.a_table( description varchar2(10) ); create table myschema.b_table( description20 varchar2(20) ); When the join is attempted 0: jdbc:drill:zk=local>

S3 query takes a long time to start

2016-03-10 Thread Oscar Morante
I'm querying 20G of gzipped JSONs split in ~5600 small files with sizes ranging from 1M to 30Mb. Drill is running in aws in 4 m4.xlarge nodes and it's taking around 50 minutes before the query starts executing. Any idea what could be causing this delay? What's the best way to debug this?

Time intervals overflowing?

2016-03-10 Thread Oscar Morante
I was trying to generate a sequence of timestamps from a sequence of numbers, and run into this: select 596 * interval '1' hour, 597 * interval '1' hour from (values 1); EXPR$0 |EXPR$1 | ---|| PT2145600S |PT-2145767.296S | Notice