Re: Running Drill as a persistent background process

2015-10-21 Thread Jacques Nadeau
The best way is to use bin/drillbit.sh start. You'll need to install ZooKeeper first. This runs Drill in daemon mode. -- Jacques Nadeau, CTO and Co-Founder, Dremio
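A minimal command sketch of what Jacques describes, assuming Drill is unpacked under /opt/drill and a ZooKeeper quorum is reachable — the paths, cluster-id, and ZK host names below are illustrative placeholders, not from the thread:

```shell
# Point Drill at the ZooKeeper quorum via conf/drill-override.conf,
# then start the drillbit as a daemon. All paths/hosts are placeholders.
cat >> /opt/drill/conf/drill-override.conf <<'EOF'
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "zk1:2181,zk2:2181,zk3:2181"
}
EOF
/opt/drill/bin/drillbit.sh start    # daemonizes; logs land in the drill log dir
/opt/drill/bin/drillbit.sh status   # verify the drillbit is running
```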

Running Drill as a persistent background process

2015-10-21 Thread Assaf Lowenstein
Hello Drillers, How do I make Drill run as a background process on a remote server so it's available to multiple users at all times? nohup bin/drill-embedded & doesn't work; it only prints "Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0" in noh

CTAS over empty file throws NPE

2015-10-21 Thread chandan prakash
Hi, I have to run CTAS on a TSV file which might in some cases be empty. In those cases it gives an NPE: java.sql.SQLException: SYSTEM ERROR: NullPointerException Fragment 0:0 [Error Id: 4aa5a127-b2dd-41a0-ac49-fc2058e9564f on 192.168.0.104:31010] at org.apache.drill.jdbc.impl.DrillCursor.next
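A hypothetical reproduction of the failing shape Chandan describes — a CTAS over a TSV file that may be empty (the paths, table name, and column positions are illustrative, not from the thread):

```sql
-- CTAS over a delimited file that may be empty; on an empty input file
-- this reportedly fails with SYSTEM ERROR: NullPointerException, Fragment 0:0.
CREATE TABLE dfs.tmp.`out_table` AS
SELECT columns[0] AS id, columns[1] AS name
FROM dfs.`/data/input/maybe_empty.tsv`;
```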

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Chris Mathews
Thanks guys, this is very helpful. I now need to go away and do some more research into this. Cheers -- Chris (Sent from my iPhone)

Re: JDBC Storage Plugin and Postgres

2015-10-21 Thread Michael Franzkowiak
Hi, now that 1.2 is out I wanted to give the JDBC storage plugin another try with Postgres (9.4). I still have the same issue: all queries to the Postgres DB are prefixed with the DB name (“mydb” below) and thus fail with a “relation does not exist” error. I tried not specifying a database in the c

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Jinfeng Ni
For each column in the parquet files, Drill will check the column metadata and see if min == max across all parquet files. If yes, that indicates the column has a single value across all the files, and Drill will use that column as a partitioning column. The partitioning column could be a column specifie

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread rahul challapalli
Chris, It's not sufficient just to specify which column is the partition column; the data must also be organized accordingly. Below is a high-level description of how partition pruning works with parquet files. 1. Use CTAS with a PARTITION BY clause: here Drill creates a single (or more) file for
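Rahul's first step can be sketched as a Drill 1.2 CTAS with PARTITION BY, followed by a query that benefits from pruning — the table, column names, and paths below are illustrative:

```sql
-- CTAS with PARTITION BY: Drill writes one or more parquet files per distinct
-- value of the partitioning column, recording that value in the file footers.
CREATE TABLE dfs.tmp.`events_by_day` PARTITION BY (event_date) AS
SELECT event_date, user_id, payload
FROM dfs.`/data/events`;

-- A subsequent filter on the partitioning column can then be pruned,
-- so Drill only reads the matching files:
SELECT user_id
FROM dfs.tmp.`events_by_day`
WHERE event_date = DATE '2015-10-21';
```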

Re: MapR Drill 1.2 Package

2015-10-21 Thread Neeraja Rentachintala
John, Chris, yes, MapR Drill 1.2 differs from Apache Drill 1.2 by a few fixes. The fixes are mainly related to the newly introduced JDBC storage plugin package, a great new feature, which however came in very late in the cycle for MapR to be able to consume it and deliver it to customers with qua

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Chris Mathews
We create a JSON-format schema for the Parquet file using the Avro specification and use this schema when loading data. Is there anything special we have to do to flag a column as a partitioning column? Sorry, I don’t understand your answer. What do you mean by ‘discover the columns with a sing

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Mehant Baid
The information is stored in the footer of the parquet files. Drill reads the metadata stored in the parquet footers to discover the columns with a single value and treats them as partitioning columns. Thanks, Mehant

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Chris Mathews
Thanks Mehant; yes, we did look at doing this, but the advantage of using the new PARTITION BY feature is that the partitioned columns are automatically detected during any subsequent queries. This is a major advantage as our customers are using the Tableau BI tool, and knowing details such as t

Re: Externally created Parquet files and partition pruning

2015-10-21 Thread Mehant Baid
In addition to the auto partitioning done by CTAS, Drill also supports directory-based pruning. You could load data into different (nested) directories underneath the top-level table location and use the 'where' clause to get the pruning performance benefits. Following is a typical example: Tab
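The directory-based layout Mehant describes can be sketched using Drill's implicit dir0/dir1 columns, which expose the subdirectory names under the queried path — the directory structure and column names below are illustrative:

```sql
-- Data loaded as /data/logs/2015/10/... ; for a query rooted at /data/logs,
-- dir0 is the first subdirectory level ('2015') and dir1 the second ('10').
SELECT t.dir0 AS yr, t.dir1 AS mon, COUNT(*) AS cnt
FROM dfs.`/data/logs` t
WHERE t.dir0 = '2015' AND t.dir1 = '10'   -- prunes to the matching subdirectory
GROUP BY t.dir0, t.dir1;
```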

Externally created Parquet files and partition pruning

2015-10-21 Thread Chris Mathews
We have an existing ETL framework processing machine-generated data, which we are updating to write Parquet files directly to HDFS using AvroParquetWriter, for access by Drill. Some questions: How do we take advantage of Drill’s partition pruning capabilities with PARTITION BY if we are not

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
I was just able to write out a 2.2 GB file in CSV format without Drill breaking it up into different files. I think this safely indicates that there is no upper limit on the file size. I did have to put the sort in, as the reads of the input data in my case were parallelized and this cause

Re: Drill CTAS to single file

2015-10-21 Thread Abdel Hakim Deneche
Another way to do it is to let sqlline save the CSV file for you; this way you won't have to worry about Drill's parallelization, but you might need to make slight changes to your storage plugin to properly read sqlline's CSV files. For example, I have the following CTAS: create table e as select
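Hakim's sqlline approach can be sketched with standard sqlline commands (the file name and query are illustrative). Note that the recorded output carries sqlline's own header and quoting conventions, which is why he mentions adjusting the storage plugin to read the files back:

```sql
-- Inside sqlline (bin/sqlline or bin/drill-embedded):
!set outputformat csv
!record /tmp/result.csv
SELECT employee_id, full_name FROM cp.`employee.json` LIMIT 10;
!record
-- the second !record stops recording; /tmp/result.csv now holds the rows as CSV
```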

Re: Dates -> Avro -> Parquet

2015-10-21 Thread Chris Mathews
Sorry for the delay in getting back on this one, but we have been investigating other angles. We are using Tableau with Drill, so we have to create views in Drill to access Parquet files, even though the Parquet files hold the schema. When we create the views we have to CAST the columns to the co
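The pattern Chris describes — a view over parquet with explicit CASTs so ODBC/Tableau sees concrete column types — might look like this (the view name, columns, types, and path are illustrative assumptions):

```sql
-- A typed view over a parquet directory; ODBC clients such as Tableau
-- read the view's column types instead of ANY.
CREATE OR REPLACE VIEW dfs.tmp.`sensor_v` AS
SELECT CAST(device_id AS INTEGER)   AS device_id,
       CAST(reading   AS DOUBLE)    AS reading,
       CAST(ts        AS TIMESTAMP) AS ts
FROM dfs.`/data/parquet/sensors`;
```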

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
For clarity, the only reason I said anything about a size limit on a CSV is that it is possible that Drill may stop writing one file and open up another in the same directory. We do this with parquet files, and I'm not sure if the behavior is the same or different for CSV files. Drill won't stop w

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
When you say that you are running a succession of queries, are these queries that could be combined together using a UNION ALL statement? I don't know if there is an upper bound on the size of a CSV that we will generate, but if the reason Drill is writing multiple files is because of parallelizati

Re: Drill CTAS to single file

2015-10-21 Thread Ramana I N
You may be able to, by playing around with the system/session options planner.width.max_per_query or planner.width.max_per_node. Not sure you would want to, though: either of those options will reduce the possible parallelism.
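Ramana's suggestion, expressed as session-level settings (the value 1 is illustrative; forcing the width down serializes execution, so the CTAS writer produces a single file at the cost of query parallelism):

```sql
-- Restrict parallelism for this session only, then run the CTAS.
ALTER SESSION SET `planner.width.max_per_query` = 1;
ALTER SESSION SET `planner.width.max_per_node` = 1;
-- ... CTAS here ...
-- Remember to restore the options afterwards, since they slow down all
-- other queries in the session.
```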

Drill CTAS to single file

2015-10-21 Thread Boris Chmiel
Hi all, Does anyone know if there is a native way to force Drill to produce only one file as the result of a CTAS? In one of my specific use cases, I run a succession of queries with Drill to produce several CSV results with CTAS. Many folders contain multiple files and I need to run a shell script t

RE: MapR Drill 1.2 Package

2015-10-21 Thread Andrew Brust
MapR Ships Apache Drill 1.2 in its Distribution; Launches New Quick Start Solution for Self-Service Data Exploration. San Jose, CA – October 21, 2015 – MapR Technologies, Inc., provider of the top-ranked distribution for Apache™ Hadoop® that integrates web-scale enterprise storage and real-ti

Re: MapR Drill 1.2 Package

2015-10-21 Thread Christopher Matta
Good question, I'd like to know as well. I know there were some JDBC fixes added to the official Drill 1.2 release; are those going to get back-ported into the MapR release? Chris Matta cma...@mapr.com 215-701-3146 On Tue, Oct 20, 2015 at 2:36 PM, John Omernik wrote: > Hey quick questions

Re: Meta- and summary data for dynamic UIs

2015-10-21 Thread Boris Chmiel
I do hard-code this in a view. That's not very efficient, especially with CTAS, as it needs duplicated code: one copy for the view + one for the parquet file (the metadata is not handled when reading the parquet file). There is an ODBC driver option: CastAnyToVarchar=true. Not sure if that can help you. Boris

Re: [Design Document] Support the Ability to Identify And Skip Records when Function Evaluations Fail

2015-10-21 Thread John Omernik
AWESOME! I had just been in the process of writing up a long user story to ask for and support exactly this. I modified it and included it here: To start out, I want to say how much I love the Drill project, and the potential it has. I've put this together based on my experiences and want to c