Hi Avner,

Maybe you said this already, but what version of Drill are you using, and do you have the metastore enabled?

--C

> On Jun 4, 2020, at 9:02 PM, Avner Levy wrote:
>
> Thanks Rafael for your answer.
> As I wrote in the previous email, these planning times occur even when
> selecting one field from one tiny file (60k) that I pass directly by full
> path (select name from `parquet/data/data.parquet` limit 1).
> Any idea what can influence the time in such a trivial scenario?
> In addition, doesn't Drill cache execution plans between similar query
> executions?
> Best regards,
> Avner
>
>
> On Thu, Jun 4, 2020 at 2:55 PM Rafael Jaimes III wrote:
>
>> Hi Avner,
>>
>> One way you might be able to optimize this is by modifying the size
>> and number of the parquet files. How many files do you have and how
>> big are they? Do you know what the row group size is? What is the HDFS
>> block size on your storage?
>>
>> There are probably many more intricate ways to improve performance with
>> the Drill settings, but I have not modified them.
>>
>> - Rafael
>>
>> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy wrote:
>>>
>>> I'm running Apache Drill (1.18 master branch) in a Docker container
>>> with data stored in Parquet files on S3.
>>> When I run queries, even the most simple ones such as:
>>>
>>> select name from `parquet/data/data.parquet` limit 1
>>>
>>> The "Planning" time is 0.7-1.5 sec while the "Execution" is only
>>> 0.112 sec.
>>> These proportions are maintained even if I run the same query multiple
>>> times in a row.
>>> Since I'm trying to minimize query times, I was wondering whether
>>> such planning times (compared to execution) make sense, and whether
>>> there is any way to reduce them (some plan caching mechanism)?
>>> Thanks,
>>> Avner
>>
