[ 
https://issues.apache.org/jira/browse/IMPALA-11453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell updated IMPALA-11453:
-----------------------------------
    Summary: Add option to run-workload.py to have warm-up runs of query  (was: 
Add option to run-workload.py to have "warm-up" runs of query)

> Add option to run-workload.py to have warm-up runs of query
> -----------------------------------------------------------
>
>                 Key: IMPALA-11453
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11453
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 4.2.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> bin/run-workload.py has an option to explain the query before running it the 
> first time. This gets the metadata loading out of the way so that it doesn't 
> impact the first query time.
> It would be useful to add another option that runs the query a couple of 
> times to warm up any caches before measurement starts. This would reduce 
> variation caused by data not yet being in the OS buffer cache, etc.
> In my runs of perf-AB-test, the first run of a query is sometimes noticeably 
> slower than the later runs (for either A or B):
> {noformat}
> Run 1-3:
> 22:34:36 | TPCH-Q1  | 2022-07-20 04:16:12 | 7.52           | 1         |
> 22:34:36 | TPCH-Q1  | 2022-07-20 04:16:20 | 4.82           | 1         |
> 22:34:36 | TPCH-Q1  | 2022-07-20 04:16:25 | 5.04           | 1         |
> Run 1-3:
> 22:34:36 | TPCH-Q11 | 2022-07-20 04:23:21 | 1.12           | 1         |
> 22:34:36 | TPCH-Q11 | 2022-07-20 04:23:23 | 0.93           | 1         |
> 22:34:36 | TPCH-Q11 | 2022-07-20 04:23:23 | 0.97           | 1         |
> Run 1-3:
> 22:34:36 | TPCH-Q12 | 2022-07-20 04:24:13 | 2.23           | 1         |
> 22:34:36 | TPCH-Q12 | 2022-07-20 04:24:15 | 1.88           | 1         |
> 22:34:36 | TPCH-Q12 | 2022-07-20 04:24:17 | 1.78           | 1         |
> {noformat}
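
To put a number on the first-run effect in the listing above, here is a minimal Python sketch. It assumes the fourth column is the elapsed time in seconds (the listing has no column headers, so this is an assumption), and the helper name first_run_overhead is purely illustrative.

```python
from statistics import mean

# Runs 1-3 for each query, copied from the listing above.
# Assumption: the fourth column is elapsed time in seconds.
timings = {
    "TPCH-Q1": [7.52, 4.82, 5.04],
    "TPCH-Q11": [1.12, 0.93, 0.97],
    "TPCH-Q12": [2.23, 1.88, 1.78],
}


def first_run_overhead(runs):
    """Return how much slower run 1 is than the mean of the later runs, as a fraction."""
    later = mean(runs[1:])
    return (runs[0] - later) / later


for query, runs in timings.items():
    print(f"{query}: first run {first_run_overhead(runs):+.0%} relative to later runs")
```

Under that assumption, the first run is roughly 20% to 50% slower than the mean of the two later runs for these three queries.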
> If we ran the query a couple of times before recording any measurements, the 
> benchmark would be more consistent. This seems like a useful setting for 
> single_node_perf_run.py.
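
The proposed behavior could be sketched roughly as follows. This is not the run-workload.py implementation: the function timed_runs and the parameters run_query, warmup_runs, and measured_runs are hypothetical names chosen for illustration, and the real option name and defaults are not specified in this issue.

```python
import time


def timed_runs(run_query, warmup_runs=2, measured_runs=3):
    """Run the query `warmup_runs` times without recording, then time the measured runs.

    `run_query` is a hypothetical zero-argument callable that executes the
    query once; it stands in for however the harness really runs a query.
    """
    # Warm-up phase: results and timings are discarded, so caches
    # (e.g. the OS buffer cache) are populated before measurement.
    for _ in range(warmup_runs):
        run_query()

    # Measurement phase: only these runs are reported.
    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings
```

Keeping the warm-up count as a separate parameter that could be set to 0 would leave existing measurements unchanged unless the option is explicitly enabled.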



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
