Merge and save parquet files in Drill

2017-08-16 Thread Divya Gehlot
Hi, I have a CTAS with partitioning on 4 columns, and when I save it, it creates lots of small files (~102290), each only a few KB in size. My questions are: 1. Does having lots of small files reduce performance when reading the data in Drill? 2. If yes, how can I merge the small Parquet files?

Re: Merge and save parquet files in Drill

2017-08-17 Thread Andries Engelbrecht
Do you partition the table? You may want to sort (ORDER BY) on the columns you partition by, or in any case order by the column(s) you are most likely to use in predicates. It increases the CTAS time, but will normally improve query performance quite a bit. Yes a large number of
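The sorting suggestion above can be sketched as a CTAS that orders by the same columns it partitions on. The Python helper below only assembles the SQL text. The table name, source path, and column names are hypothetical placeholders, and the statement assumes Drill's `CREATE TABLE ... PARTITION BY (...) AS SELECT ...` form, so check it against your Drill version before running it.

```python
def build_ctas(table, source, partition_cols, other_cols):
    """Assemble a Drill CTAS statement that partitions and sorts on the same columns.

    All identifiers passed in are hypothetical placeholders, not names from the thread.
    """
    # List the partition columns first so they are part of the SELECT list.
    select_list = ", ".join(partition_cols + other_cols)
    cols = ", ".join(partition_cols)
    return (
        f"CREATE TABLE {table} "
        f"PARTITION BY ({cols}) AS "
        f"SELECT {select_list} FROM {source} "
        f"ORDER BY {cols}"
    )


# Hypothetical workspace, path, and column names.
sql = build_ctas(
    "dfs.tmp.`events_sorted`",
    "dfs.`/data/events_raw`",
    ["yr", "mth", "dy", "hr"],
    ["payload"],
)
print(sql)
```

The ORDER BY makes the CTAS slower, which matches the trade-off described above: more write time in exchange for faster reads.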

Re: Merge and save parquet files in Drill

2017-08-17 Thread John Omernik
Also, what is the cardinality of the partition field? If you have lots of partitions, you will have lots of files... On Thu, Aug 17, 2017 at 9:55 AM, Andries Engelbrecht wrote: > Do you partition the table? > You may want to sort (order by) on the columns you partition, or just > order by in an
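To make the cardinality point concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that every combination of partition values occurs in the data and that each writer fragment emits its own file per combination. That second assumption is a reading of how a partitioned CTAS tends to behave, not something stated in this thread, and the cardinalities are hypothetical.

```python
from math import prod


def max_output_files(distinct_values_per_column, writer_fragments=1):
    """Upper bound on files written by a partitioned CTAS.

    Assumes every combination of partition values occurs and that each
    writer fragment emits its own file per combination (an assumption).
    """
    return prod(distinct_values_per_column) * writer_fragments


# Hypothetical cardinalities: 2 years, 12 months, 31 days, 24 hours.
single_writer = max_output_files([2, 12, 31, 24])
six_writers = max_output_files([2, 12, 31, 24], writer_fragments=6)
print(single_writer)  # 17856
print(six_writers)    # 107136
```

With four partition columns the product grows quickly, so a six-figure file count is plausible even for modest per-column cardinalities.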

Re: Merge and save parquet files in Drill

2017-08-17 Thread Divya Gehlot
Hi, Is there no way to merge the files in Drill if it creates lots of small files? AFAIK, partitioning improves performance: in my case the partitioning is based on year, month, day, hour, and my queries also put the partitioning column values in the WHERE clause. It should just go and read those file
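The thread does not settle whether Drill can merge files itself. As a sketch of the planning step only, the Python below groups the small files of one partition directory into batches close to a target size. A separate rewrite step (not shown, and not a Drill feature described here) would then consolidate each batch. The file names and sizes are hypothetical.

```python
def plan_merge_batches(file_sizes, target_bytes):
    """Greedily group (name, size) pairs into batches of about target_bytes.

    This only plans the grouping; it does not read or write Parquet.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes):
        # Close the current batch once adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical file names and sizes (bytes) within one partition directory.
files = [
    ("0_0_1.parquet", 40_000),
    ("0_0_2.parquet", 50_000),
    ("0_1_1.parquet", 30_000),
    ("0_1_2.parquet", 90_000),
]
batches = plan_merge_batches(files, target_bytes=100_000)
print(batches)
```

Grouping within a single partition directory keeps the partition layout intact, so queries that filter on the partitioning columns still touch only the relevant directories.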

RE: Merge and save parquet files in Drill

2017-08-18 Thread Kunal Khatua
> Hi, Is there no way to merge the files in Drill if it creates lots of small files? AFAIK, partitioning improves performance: in my case the partitioning is based on year, month, day, hour, and my queries also put the partitioning column