Avro - Let's talk Avro again

2017-08-17 Thread John Omernik
I know Avro is the unwanted child of the Drill world. (I know others have tried to mature the Avro support, and it is still in an "experimental" state.) That said, isn't it time for us to clean it up? I am sure there are some open JIRAs out there (last Doc update on the

Re: Avro - Let's talk Avro again

2017-08-17 Thread Stefán Baxter
Whoa!!! (sorry, I just had to) Best of luck with that! Regards, -Stefán On Thu, Aug 17, 2017 at 12:37 PM, John Omernik wrote:

Re: Avro - Let's talk Avro again

2017-08-17 Thread John Omernik
I was guessing you would chime in with a response ;) Are you still using Drill w/ Avro? How have things been lately? On Thu, Aug 17, 2017 at 8:00 AM, Stefán Baxter wrote:

Re: Avro - Let's talk Avro again

2017-08-17 Thread Charles Givre
I’m not an Avro user, but I’d definitely vote for improving this. -C On Aug 17, 2017, at 10:17, John Omernik wrote:

Re: Merge and save parquet files in Drill

2017-08-17 Thread Andries Engelbrecht
Do you partition the table? You may want to sort (ORDER BY) on the columns you partition by, or in any case ORDER BY the column(s) you are most likely to use in predicates. It increases the CTAS time, but normally improves query performance quite a bit. Yes, a large number of
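
A note on why sorting can help (an assumption; the message is cut off before any explanation): Parquet files keep min/max statistics per row group, and newer Drill versions can use them to skip row groups that cannot match a filter. Sorted input yields narrow per-group ranges, so fewer groups survive the filter. The pure-Python sketch below only models that effect; `row_group_stats` and `groups_to_scan` are hypothetical helpers, not Drill or Parquet APIs, and the 1,000-row group size is arbitrary.

```python
import random

def row_group_stats(values, rows_per_group):
    """Split `values` into fixed-size row groups and record (min, max) for each."""
    stats = []
    for start in range(0, len(values), rows_per_group):
        group = values[start:start + rows_per_group]
        stats.append((min(group), max(group)))
    return stats

def groups_to_scan(stats, target):
    """Count row groups whose [min, max] range could contain `target`."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)

# Hypothetical column: 10,000 rows drawn from 100 distinct key values.
random.seed(0)
keys = [random.randrange(100) for _ in range(10_000)]

unsorted_stats = row_group_stats(keys, rows_per_group=1_000)
sorted_stats = row_group_stats(sorted(keys), rows_per_group=1_000)

print("unsorted:", groups_to_scan(unsorted_stats, 42), "of", len(unsorted_stats), "row groups")
print("sorted:  ", groups_to_scan(sorted_stats, 42), "of", len(sorted_stats), "row groups")
```

On random input nearly every group spans the whole key range, so almost nothing can be skipped; after sorting, an equality filter on one key touches at most two groups in this model.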

Re: Merge and save parquet files in Drill

2017-08-17 Thread John Omernik
Also, what is the cardinality of the partition field? If you have lots of partitions, you will have lots of files... On Thu, Aug 17, 2017 at 9:55 AM, Andries Engelbrecht wrote:
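
The cardinality point can be made concrete with a rough estimate. This sketch assumes (not confirmed in this thread) that each writer fragment emits at least one file for every distinct partition value it receives, and it ignores files split by size; `estimate_file_count` and `writer_fragments` are hypothetical names. Partitioning a year of hourly data by year, month, day and hour gives thousands of partitions, while year and month gives twelve.

```python
from datetime import datetime, timedelta

def distinct_partitions(rows, partition_cols):
    """Return the set of distinct partition-key tuples that occur in `rows`."""
    return {tuple(row[col] for col in partition_cols) for row in rows}

def estimate_file_count(rows, partition_cols, writer_fragments):
    """Rough (low, high) file count, ASSUMING each writer fragment emits
    one file per distinct partition value it sees (size-based splits ignored)."""
    n = len(distinct_partitions(rows, partition_cols))
    return n, n * writer_fragments

# Hypothetical data set: one row per hour for calendar year 2017 (not a leap year).
start = datetime(2017, 1, 1)
rows = []
for h in range(365 * 24):
    ts = start + timedelta(hours=h)
    rows.append({"year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour})

print(estimate_file_count(rows, ["year", "month", "day", "hour"], writer_fragments=4))  # (8760, 35040)
print(estimate_file_count(rows, ["year", "month"], writer_fragments=4))                 # (12, 48)
```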

Re: Query Optimization

2017-08-17 Thread Padma Penumarthy
It might read all those files if new data was added after the metadata cache was refreshed. If everything is the same before and after the metadata refresh, i.e. no new data was added and the query is exactly the same, then it should not do that. Also, check if you can partition in a way that will not create
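
Padma's explanation can be modeled as a staleness check. As a simplified sketch of Drill's behavior as we understand it (it compares directory modification times with the metadata cache file and rebuilds the cache when a directory is newer), `cache_is_stale` below walks a table directory and compares mtimes. `.drill.parquet_metadata` is the name Drill uses for its cache file, but verify it for your version; the functions themselves are hypothetical, not Drill code.

```python
import os
import tempfile

CACHE_FILE = ".drill.parquet_metadata"  # Drill's Parquet metadata cache file name (verify for your version)

def cache_is_stale(table_root):
    """Simplified model: the cache is stale if it is missing, or if any directory
    under table_root (including the root) was modified after the cache file."""
    cache_path = os.path.join(table_root, CACHE_FILE)
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    for dirpath, _dirnames, _filenames in os.walk(table_root):
        if os.path.getmtime(dirpath) > cache_mtime:
            return True
    return False

def demo():
    """Return staleness before a refresh, right after it, and after new data lands."""
    results = []
    with tempfile.TemporaryDirectory() as root:
        hour_dir = os.path.join(root, "2017", "08", "17", "09")
        os.makedirs(hour_dir)
        results.append(cache_is_stale(root))  # no cache file yet
        open(os.path.join(root, CACHE_FILE), "w").close()
        # Pin timestamps so the result does not depend on filesystem timing.
        for dirpath, _d, _f in os.walk(root):
            os.utime(dirpath, (1_000, 1_000))
        os.utime(os.path.join(root, CACHE_FILE), (2_000, 2_000))
        results.append(cache_is_stale(root))  # freshly "refreshed" cache
        # Adding a file to a partition bumps its directory mtime; simulate that.
        os.utime(hour_dir, (3_000, 3_000))
        results.append(cache_is_stale(root))
    return results

print(demo())  # [True, False, True]
```

In this model, a single new file in any partition directory is enough to make the next query rebuild the cache and read every footer again.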

Re: Merge and save parquet files in Drill

2017-08-17 Thread Divya Gehlot
Hi, Is there no way we can merge the files in Drill if it creates lots of small files? AFAIK, partitioning improves performance: in my case the partitioning is based on year, month, day, hour, and my queries also keep the partitioning column values in the WHERE clause. It should just go and read those file
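
Divya's expectation is partition pruning: a filter on the partition columns should limit the scan to the matching files. Assuming a directory layout like `<year>/<month>/<day>/<hour>/` (hypothetical; the thread does not show the layout), Drill exposes directory levels as implicit columns `dir0`, `dir1`, and so on. The sketch below models only the pruning idea; `dir_columns` and `prune` are hypothetical helpers, and whether Drill actually prunes a given query is visible only in its plan.

```python
from pathlib import PurePosixPath

def dir_columns(table_root, file_path):
    """Map each directory level below table_root to dir0, dir1, ... (string values)."""
    rel = PurePosixPath(file_path).relative_to(table_root)
    return {f"dir{i}": part for i, part in enumerate(rel.parts[:-1])}

def prune(table_root, files, predicate):
    """Keep only files whose directory values satisfy every equality in `predicate`."""
    kept = []
    for f in files:
        cols = dir_columns(table_root, f)
        if all(cols.get(k) == v for k, v in predicate.items()):
            kept.append(f)
    return kept

# Hypothetical layout: /data/events/<year>/<month>/<day>/<hour>/<file>.parquet
root = "/data/events"
files = [f"{root}/2017/08/{d:02d}/{h:02d}/0_0_0.parquet" for d in (16, 17) for h in range(24)]

# Roughly: WHERE dir0 = '2017' AND dir1 = '08' AND dir2 = '17' AND dir3 = '09'
hit = prune(root, files, {"dir0": "2017", "dir1": "08", "dir2": "17", "dir3": "09"})
print(len(files), len(hit))  # 48 1
```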

Re: Query Optimization

2017-08-17 Thread Divya Gehlot
Hi, Yes, it's the same query; I just ran the metadata refresh command. My understanding is that the metadata refresh command saves reading the metadata. How about column values ... Why is it reading all the files after the metadata refresh? Partitioning helps retrieve data faster, like in Hive how it hap

Re: Query Optimization

2017-08-17 Thread Khurram Faraaz
Please share your SQL query and the query plan. To get the query plan, execute EXPLAIN PLAN FOR ; Thanks, Khurram From: Divya Gehlot Sent: Friday, August 18, 2017 7:15:18 AM To: user@drill.apache.org Subject: Re: Query Optimization

Re: Query Optimization

2017-08-17 Thread Padma Penumarthy
It is supposed to work like you expected. Maybe you are running into a bug. Why is it reading all files after the metadata refresh? That is difficult to answer without looking at the logs and query profile. If you look at the query profile, you can maybe check what the usedMetadataFile flag says for s

Re: Query Optimization

2017-08-17 Thread rahul challapalli
Could you be running into https://issues.apache.org/jira/browse/DRILL-3846 ? - Rahul On Thu, Aug 17, 2017 at 9:13 PM, Padma Penumarthy wrote: