Re: Planning times

Charles Givre Thu, 04 Jun 2020 18:10:31 -0700

Hi Avner, 
Maybe you said this already, but what version of Drill are you using, and do
you have the metastore enabled?
--C

> On Jun 4, 2020, at 9:02 PM, Avner Levy <[email protected]> wrote:
> 
> Thanks Rafael for your answer.
> As I wrote in the previous email, these planning times occur even when
> selecting one field from one tiny file (60k) that I pass directly by full
> path (select name from `parquet/data/data.parquet` limit 1).
> Any idea what can influence the time in such a trivial scenario?
> In addition, doesn't Drill cache execution plans between similar query
> executions?
> Best regards,
> Avner
> 
> 
> On Thu, Jun 4, 2020 at 2:55 PM Rafael Jaimes III <[email protected]>
> wrote:
> 
>> Hi Avner,
>> 
>> One way you might be able to optimize this is by modifying the size
>> and number of the Parquet files. How many files do you have and how
>> big are they? Do you know what the row group size is? What is the HDFS
>> block size on your storage?
>> 
>> There are probably many more intricate ways to improve performance with
>> the Drill settings, but I have not modified them.
>> 
>> - Rafael
>> 
>> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <[email protected]> wrote:
>>> 
>>> I'm running Apache Drill (1.18 master branch) in a Docker container with
>>> data stored in Parquet files on S3.
>>> When I run queries, even the most simple ones such as:
>>> 
>>> select name from `parquet/data/data.parquet` limit 1
>>> 
>>> The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112
>>> sec.
>>> These proportions are maintained even if I run the same query multiple
>>> times in a row.
>>> Since I'm trying to keep query times to a minimum, I was wondering whether
>>> such planning times (compared to execution) make sense, and whether there
>>> is any way to reduce them (for example, some plan caching mechanism).
>>> Thanks,
>>>  Avner
>>
