Hi Avner,

One way you might be able to optimize this is by changing the size and number of the Parquet files. How many files do you have, and how big are they? Do you know what the row group size is? What is the HDFS block size on your storage?
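If it helps to answer the row group question, here is a minimal Python sketch, assuming pyarrow is installed and that you have a local copy of one of the files. The helper names (`read_row_groups`, `summarize`), the `data.parquet` path, and the 128 MB default block size are placeholders of mine, not anything taken from your setup or from Drill.

```python
from statistics import mean


def read_row_groups(path):
    """Return [(num_rows, total_byte_size), ...] for one local Parquet file."""
    # Assumption: pyarrow is installed; imported lazily so summarize() works without it.
    import pyarrow.parquet as pq

    meta = pq.ParquetFile(path).metadata
    # total_byte_size is the uncompressed size of the row group's column data.
    return [
        (meta.row_group(i).num_rows, meta.row_group(i).total_byte_size)
        for i in range(meta.num_row_groups)
    ]


def summarize(row_groups, block_size=128 * 1024 * 1024):
    """Summarize row group sizes against a storage block size (assumed 128 MB)."""
    sizes = [size for _, size in row_groups]
    return {
        "row_groups": len(row_groups),
        "rows": sum(rows for rows, _ in row_groups),
        "avg_row_group_bytes": int(mean(sizes)) if sizes else 0,
        "max_row_group_bytes": max(sizes, default=0),
        # Row groups larger than one block would span several blocks.
        "groups_over_block": sum(1 for size in sizes if size > block_size),
    }


if __name__ == "__main__":
    # Hypothetical local copy of one of the files stored on S3.
    print(summarize(read_row_groups("data.parquet")))
```

Running it against a few representative files should show whether you have many small row groups or a handful of large ones, which would make it easier to judge whether resizing the files is worth trying.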
There are probably many more intricate ways to improve performance through the Drill settings, but I have not modified them.

- Rafael

On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <[email protected]> wrote:
>
> I'm running Apache Drill (1.18 master branch) in a docker with data stored
> in Parquet files on S3.
> When I run queries, even the most simple ones such as:
>
> select name from `parquet/data/data.parquet` limit 1
>
> The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112 sec.
> These proportions are maintained even if I run the same query multiple
> times in a row.
> Since I'm trying to minimize query times to a minimum, I was wondering if
> such planning times (compared to execution) make sense and is there any way
> to reduce it? (some plan caching mechanism)
> Thanks,
> Avner
