Drill CTAS to single file

2015-10-21 Thread Boris Chmiel
Hi all, does anyone know if there is a native way to force Drill to produce only one file as the result of a CTAS? In one of my specific use cases, I run a succession of queries with Drill to produce several CSV results with CTAS. Many folders contain multiple files and I need to run a shell script t

Re: Drill CTAS to single file

2015-10-21 Thread Ramana I N
You may be able to by playing around with the system/session options planner.width.max_per_query or planner.width.max_per_node. Not sure if you would want to, though. Any of those options will reduce the parallelism possible either
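
A minimal sketch of the option-based approach mentioned above, assuming a writable dfs.tmp workspace. The source path, table name, and column names are hypothetical; the two option names are Drill session options. Capping the query width at 1 keeps the whole plan, including the writer, in a single thread, which is why it is expected to yield one output file, at the cost of all parallelism for that session.

```sql
-- Run the whole query in a single fragment so only one writer is created.
ALTER SESSION SET `planner.width.max_per_query` = 1;

-- Have CTAS write CSV instead of the default Parquet.
ALTER SESSION SET `store.format` = 'csv';

-- Hypothetical source file, table name, and columns.
CREATE TABLE dfs.tmp.`single_file_result` AS
SELECT order_id, amount
FROM dfs.`/data/orders.parquet`;

-- Restore the width afterwards (1000 is assumed here to be the default).
ALTER SESSION SET `planner.width.max_per_query` = 1000;
```

Because the option is set with ALTER SESSION rather than ALTER SYSTEM, other sessions keep their normal parallelism.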

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
When you say that you are running a succession of queries, are these queries that could be combined together using a UNION ALL statement? I don't know if there is an upper bound on the size of a CSV that we will generate, but if the reason Drill is writing multiple files is because of parallelizati
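
As an illustration of the UNION ALL suggestion, the sketch below folds two separate CTAS queries into one statement, so the results land in one output directory instead of several. The file paths, table name, and columns are hypothetical, and dfs.tmp is assumed to be a writable workspace.

```sql
-- Write CSV output for this session.
ALTER SESSION SET `store.format` = 'csv';

-- One CTAS instead of two: both branches must return the same
-- number of columns with compatible types.
CREATE TABLE dfs.tmp.`combined_result` AS
SELECT order_id, amount FROM dfs.`/data/orders_2014.parquet`
UNION ALL
SELECT order_id, amount FROM dfs.`/data/orders_2015.parquet`;
```

This only merges the queries; Drill may still write several files into that directory if the writer itself runs in parallel.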

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
For clarity, the only reason I said anything about a size limit on a CSV is that it is possible that Drill may stop writing one file and open up another in the same directory. We do this with Parquet files, and I'm not sure if the behavior is the same or different for CSV files. Drill won't stop w

Re: Drill CTAS to single file

2015-10-21 Thread Abdel Hakim Deneche
Another way to do it is to let sqlline save the CSV file for you. This way you won't have to worry about Drill's parallelization, but you might need to make slight changes to your storage plugin to properly read sqlline's CSV files. For example, I have the following CTAS: create table e as select

Re: Drill CTAS to single file

2015-10-21 Thread Jason Altekruse
I was just able to write out a 2.2 GB file in CSV format without Drill breaking it up into different files. I think this safely indicates that there is no upper limit on the file size. I did have to put the sort in, as the reads of the input data in my case were parallelized and this cause
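
The query used for that test is not shown in the message, so the following is only a sketch of what a CTAS with a sort might look like. The path, table name, and columns are hypothetical. The assumption is that a global ORDER BY makes Drill merge the parallel read streams into one ordered stream before the writer, so the writer runs in a single fragment.

```sql
-- Write CSV output for this session.
ALTER SESSION SET `store.format` = 'csv';

-- The ORDER BY is assumed to funnel all rows through one final
-- merge step, leaving a single writer for the output.
CREATE TABLE dfs.tmp.`sorted_single_file` AS
SELECT order_id, amount
FROM dfs.`/data/orders.parquet`
ORDER BY order_id;
```

Unlike lowering planner.width.max_per_query, this leaves the reads parallelized, but the sort itself adds cost on large inputs.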