Re: Planning times

Rafael Jaimes III Thu, 04 Jun 2020 11:55:40 -0700

Hi Avner,

One way you might be able to optimize this is by modifying the size
and number of the parquet files. How many files do you have and how
big are they? Do you know what the row group size is? What is the HDFS
block size on your storage?
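
A minimal sketch for collecting the first two numbers, assuming the
Parquet files have been copied to a local directory. The function names
are made up for illustration and nothing here is Drill-specific. It
counts the files, reports their sizes, and reads the footer length that
the Parquet format stores in the last 8 bytes of each file (a 4-byte
little-endian length followed by the magic bytes PAR1). The footer grows
with the number of row groups and columns, so it is only a rough proxy
for how much metadata has to be read per file; it does not give the row
group size itself.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def footer_length(path):
    """Return the Parquet footer metadata size in bytes.

    Returns None if the file is too small or does not end with the
    plain-text Parquet magic (for example, an encrypted footer).
    """
    size = os.path.getsize(path)
    if size < 12:  # magic + empty footer + length + magic
        return None
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        return None
    # The footer length is a 4-byte little-endian unsigned integer.
    return struct.unpack("<I", tail[:4])[0]


def summarize(directory):
    """Return (path, file_size, footer_size) for each .parquet file found."""
    rows = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(".parquet"):
                path = os.path.join(root, name)
                rows.append((path, os.path.getsize(path), footer_length(path)))
    return rows
```

Calling summarize() on the local copy and looking at len() of the result
and the size column gives the file count and file sizes. Many small
files, or unusually large footers, would be the first things to check.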


There are probably many more intricate ways to improve performance
through the Drill settings, but I have not modified them.

- Rafael

On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <[email protected]> wrote:
>
> I'm running Apache Drill (1.18 master branch) in a Docker container with
> data stored in Parquet files on S3.
> When I run queries, even the most simple ones such as:
>
> select name from `parquet/data/data.parquet` limit 1
>
> The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112 sec.
> These proportions are maintained even if I run the same query multiple
> times in a row.
> Since I'm trying to keep query times to a minimum, I was wondering whether
> such planning times (compared to execution) make sense and whether there is
> any way to reduce them (e.g. some plan caching mechanism)?
> Thanks,
>   Avner
