Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub


alamb merged PR #16105:
URL: https://github.com/apache/datafusion/pull/16105


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub


alamb commented on PR #16105:
URL: https://github.com/apache/datafusion/pull/16105#issuecomment-2891864844

   ```shell
   ./bench.sh run clickbench_extended
   ...
   Q8: SELECT "RegionID", "UserAgent", "OS", 
AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as 
avg_response_time, 
AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ConnectTiming")) as 
avg_latency FROM hits GROUP BY "RegionID", "UserAgent", "OS" ORDER BY 
avg_latency DESC limit 10;
   Query 8 iteration 0 took 712.4 ms and returned 10 rows
   Query 8 iteration 1 took 674.1 ms and returned 10 rows
   Query 8 iteration 2 took 705.6 ms and returned 10 rows
   Query 8 iteration 3 took 686.4 ms and returned 10 rows
   Query 8 iteration 4 took 693.0 ms and returned 10 rows
   Query 8 avg time: 694.29 ms
   Done
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub


alamb commented on PR #16105:
URL: https://github.com/apache/datafusion/pull/16105#issuecomment-2891815932

   I am just running this quickly locally and then I'll merge it in
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] minor: Add benchmark query and corresponding documentation for Average Duration [datafusion]

2025-05-19 Thread via GitHub


alamb commented on code in PR #16105:
URL: https://github.com/apache/datafusion/pull/16105#discussion_r2096222012


##
benchmarks/queries/clickbench/README.md:
##
@@ -192,10 +193,46 @@ Results look like
 +-+--+--+--+
 ```
 
+### Q8: Average Latency and Response Time Analysis
+
+**Question**: Which combinations of operating system, region, and user agent 
exhibit the highest average latency? For each of these combinations, also 
report the average response time.
+
+**Important Query Properties**: Multiple average of Duration, high cardinality 
grouping
+
+```sql
+SELECT "RegionID", "UserAgent", "OS", 
AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as 
avg_response_time, 
AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ConnectTiming")) as 
avg_latency
+FROM hits
+GROUP BY "RegionID", "UserAgent", "OS"
+ORDER BY avg_latency DESC
+LIMIT 10;
+```
+
+Results look like
+
+```
++--+---+-+--+--+
+| RegionID | UserAgent | OS  | avg_response_time| 
avg_latency  |
++--+---+-+--+--+
+| 22934| 5 | 126 | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 22735| 82| 74  | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 21687| 32| 49  | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 18518| 82| 77  | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 14006| 7 | 126 | 0 days 7 hours 58 mins 20.0 secs | 0 
days 8 hours 20 mins 0.0 secs  |
+| 9803 | 82| 77  | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 107108   | 82| 77  | 0 days 8 hours 20 mins 0.0 secs  | 0 
days 8 hours 20 mins 0.0 secs  |
+| 111626   | 7 | 44  | 0 days 7 hours 23 mins 12.5 secs | 0 
days 8 hours 0 mins 47.0 secs  |
+| 17716| 56| 44  | 0 days 6 hours 48 mins 44.5 secs | 0 
days 7 hours 35 mins 47.0 secs |
+| 13631| 82| 45  | 0 days 7 hours 23 mins 1.0 secs  | 0 
days 7 hours 23 mins 1.0 secs  |
++--+---+-+--+--+
+10 row(s) fetched.
+Elapsed 30.195 seconds.

Review Comment:
   oof -- love that it takes 30 seconds. The new accumulator will be so much 
faster



##
benchmarks/queries/clickbench/README.md:
##
@@ -192,10 +193,46 @@ Results look like
 +-+--+--+--+
 ```
 
+### Q8: Average Latency and Response Time Analysis
+
+**Question**: Which combinations of operating system, region, and user agent 
exhibit the highest average latency? For each of these combinations, also 
report the average response time.
+
+**Important Query Properties**: Multiple average of Duration, high cardinality 
grouping

Review Comment:
   this is great -- thank you @logan-keede 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org