Apache Drill 1.12.0

2018-02-23 Thread Robles, Edgardo
Hi, I am evaluating Apache Drill and have run into the following issue with CTAS and a large table from an RDBMS. I am using Apache Drill 1.12.0 on an Ubuntu 16.04.3 VM (8 cores, 8 GB of RAM, fully patched) with Oracle Java 1.8.0_161-b12 and the PostgreSQL 9.4.1212 JDBC driver connecting to Greenplum 4.3. I enab
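For orientation, a minimal sketch of the kind of CTAS involved, assuming a JDBC storage plugin named bdl (the view name is taken from the follow-up reply below) and a writable dfs.tmp workspace; all names here are placeholders, not the poster's exact query:

    CREATE TABLE dfs.tmp.`gp_copy` AS
    SELECT *
    FROM bdl.schema.view_in_greenplum;  -- JDBC-backed view in Greenplum; the full result set streams through Drill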

Base64 encoded results for Drill query (>10.000 columns)

2018-02-23 Thread Daniel Müller
Hi there, I started working with Drill a few weeks ago and I'm still wondering why the query results are Base64 encoded... I found this ticket, which also describes this situation: https://issues.apache.org/jira/browse/DRILL-4620 Of course, there's the CONVERT_FROM function to translate the r
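A sketch of the CONVERT_FROM workaround mentioned above, decoding a VARBINARY column to text; the column and path names are placeholders:

    SELECT CONVERT_FROM(raw_col, 'UTF8') AS decoded_col
    FROM dfs.`/path/to/data`;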

Re: Apache Drill 1.12.0

2018-02-23 Thread Khurram Faraaz
How many unique values does col4 have in bdl.schema.view_in_greenplum? Thanks, Khurram From: Robles, Edgardo Sent: Friday, February 23, 2018 8:27:59 AM To: user@drill.apache.org Subject: Apache Drill 1.12.0 Hi, I am evaluating Apache Drill and have run into

Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Lee, David
Using Drill's CTAS statements I've run into a schema inconsistency issue and I'm not sure how to solve it. CREATE TABLE name [ (column list) ] AS query; If I have a directory called Cities containing JSON files that look like: a.json: { "city":"San Francisco", "zip":"94105"} { "city":"San
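A sketch of the kind of statement involved, assuming a dfs workspace pointing at the Cities directory (path and table names are placeholders). As the later reply in this thread notes, the trouble starts when another file such as b.json carries only nulls for a field, so Drill's per-file schema inference can guess a conflicting type:

    CREATE TABLE dfs.tmp.`cities_parquet` AS
    SELECT city, zip
    FROM dfs.`/data/Cities`;  -- schema is inferred per file; null-only fields may be guessed as a different type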

Re: Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Andries Engelbrecht
This is a challenge when dealing with JSON. You can either force the data type in the CTAS statement (likely the better option) or deal with the data type change in the Parquet table(s) by using CAST, etc. In the case of zip codes you need to consider whether it will be 5 digits or the extended 5-4 digits to
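A minimal sketch of forcing the types in the CTAS itself, as suggested above; workspace, path, and lengths are assumptions, not values from the thread:

    CREATE TABLE dfs.tmp.`cities_parquet` AS
    SELECT CAST(city AS VARCHAR(100)) AS city,
           CAST(zip  AS VARCHAR(10))  AS zip   -- keep zip as text to preserve leading zeros and 5-4 codes
    FROM dfs.`/data/Cities`;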

RE: Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Lee, David
Unfortunately the JSON source files I'm trying to convert into nested Parquet have 4,000+ possible keys with multiple levels of nesting. It would be ideal if you could inject the schema definition into a Drill query instead of relying on schema learning. Like: Contact First name Last

RE: Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Lee, David
Ideally Drill could be enhanced so you can pass in a schema definition using some spec like: http://json-schema.org/examples.html -Original Message- From: Lee, David Sent: Friday, February 23, 2018 12:44 PM To: user@drill.apache.org Subject: RE: Schema problems trying to convert JSON

Re: Schema problems trying to convert JSON to Parquet

2018-02-23 Thread Paul Rogers
Hi David, Your b.json file has only nulls; there is no way for Drill to determine what type of null is in your file. Drill requires each NULL to be a null of some type. Often, Drill guesses nullable int, which is why you saw the problem in your query. If all your fields are strings, there is a
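If the intent is simply to read every JSON scalar as text so that null-only files no longer force a type guess, Drill's store.json.all_text_mode session option is the usual lever; a sketch (reset the option afterwards if other queries rely on numeric inference):

    ALTER SESSION SET `store.json.all_text_mode` = true;
    -- all scalar JSON values are now read as VARCHAR, so a.json and b.json agree on types
    SELECT * FROM dfs.`/data/Cities`;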