Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245736923
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
         let file_metadata = Arc::clone(reader_metadata.metadata());
         let rg_metadata = file_metadata.row_groups();
-        // Filter pushdown: evaluate predicates during scan
-        let row_filter = if let Some(predicate) = prepared
+        // Filter pushdown: evaluate predicates during scan.
+        // Keep the predicate around so we can rebuild RowFilter per decoder run
+        // when fully matched row groups split the scan into multiple decoders.
+        let pushdown_predicate = prepared
             .pushdown_filters
             .then_some(prepared.predicate.clone())
-            .flatten()
-        {
-            let row_filter = row_filter::build_row_filter(
-                &predicate,
-                &prepared.physical_file_schema,
-                file_metadata.as_ref(),
-                prepared.reorder_predicates,
-                &prepared.file_metrics,
-            );
+            .flatten();
-            match row_filter {
-                Ok(Some(filter)) => Some(filter),
-                Ok(None) => None,
-                Err(e) => {
-                    debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-                    None
+        let try_build_row_filter =
+            |predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+                match row_filter::build_row_filter(
+                    predicate,
+                    &prepared.physical_file_schema,
+                    file_metadata.as_ref(),
+                    prepared.reorder_predicates,
+                    &prepared.file_metrics,
+                ) {
+                    Ok(Some(filter)) => Some(filter),
+                    Ok(None) => None,
+                    Err(e) => {
+                        debug!(
+                            "Ignoring error building row filter for '{predicate:?}': {e}"
+                        );
+                        None
+                    }
                 }
-            }
-        } else {
-            None
-        };
+            };
+
+        // Build the first RowFilter eagerly; it will be reused for the first
Review Comment:
https://github.com/apache/datafusion/pull/22191 is the follow-up PR.
@adriangb I opened that PR first; once your arrow-rs PR lands in DataFusion, we
can do more refactoring!
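
The diff comment in this hunk says the predicate is kept so a `RowFilter` can be rebuilt per decoder run when fully matched row groups split the scan. A minimal sketch of how such runs might be derived is below. `Run`, `split_into_runs`, and the `fully_matched` flags are hypothetical names for illustration, not code from the PR.

```rust
/// A maximal run of consecutive row groups that share the same filtering need.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Run {
    row_groups: Vec<usize>,
    /// `false` when every row group in the run was fully matched by statistics,
    /// so evaluating the predicate again during decode would be redundant.
    needs_filter: bool,
}

/// Group row group indexes into consecutive runs.
/// `fully_matched[i]` is assumed to say whether row group `i` is fully matched.
fn split_into_runs(fully_matched: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (idx, &matched) in fully_matched.iter().enumerate() {
        let needs_filter = !matched;
        // Extend the current run when the filtering need is unchanged.
        if let Some(run) = runs.last_mut() {
            if run.needs_filter == needs_filter {
                run.row_groups.push(idx);
                continue;
            }
        }
        // Otherwise start a new run (this is where a new decoder would begin).
        runs.push(Run { row_groups: vec![idx], needs_filter });
    }
    runs
}

fn main() {
    // Row groups 1 and 2 are fully matched; 0 and 3 still need the predicate.
    for run in split_into_runs(&[false, true, true, false]) {
        println!("{run:?}");
    }
}
```

Each run that still needs filtering would get its own freshly built filter, which is why the predicate has to outlive the first build.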
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 merged PR #21637:
URL: https://github.com/apache/datafusion/pull/21637
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4456332067
@alamb @adriangb thanks for the review, let's move forward
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245411663
##
datafusion/datasource-parquet/src/opener.rs:
##
Review Comment:
I'll merge the PR, then start looking into the refactor
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245404958
##
datafusion/datasource-parquet/src/metrics.rs:
##
@@ -213,4 +213,28 @@ impl ParquetFileMetrics {
             predicate_cache_records,
         }
     }
+
+    /// Record pages whose page-index pruning was skipped because the containing
+    /// row group was fully matched by row-group statistics.
+    ///
+    /// The counter is only registered when there is a non-zero value. This keeps
Review Comment:
This is a nice follow-up exploration; I created an issue for it:
https://github.com/apache/datafusion/issues/22189
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454204384
Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD                               ┃ datafusion_issue-19028-benchmark   ┃ Change        ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.19 / 4.62 ±6.74 / 18.09 ms │       1.20 / 4.70 ±6.83 / 18.36 ms │     no change │
│ QQuery 1  │     12.70 / 13.12 ±0.33 / 13.61 ms │     12.61 / 12.96 ±0.20 / 13.21 ms │     no change │
│ QQuery 2  │     35.85 / 36.11 ±0.20 / 36.35 ms │     36.14 / 36.54 ±0.31 / 37.01 ms │     no change │
│ QQuery 3  │     30.33 / 30.98 ±0.65 / 32.24 ms │     30.38 / 30.76 ±0.24 / 31.00 ms │     no change │
│ QQuery 4  │  233.62 / 236.73 ±3.24 / 242.46 ms │  233.83 / 237.42 ±3.29 / 242.76 ms │     no change │
│ QQuery 5  │  277.34 / 279.76 ±1.77 / 281.96 ms │  277.78 / 279.58 ±2.09 / 283.24 ms │     no change │
│ QQuery 6  │        6.01 / 7.09 ±0.61 / 7.72 ms │        6.36 / 7.00 ±0.52 / 7.69 ms │     no change │
│ QQuery 7  │     13.83 / 13.89 ±0.07 / 14.01 ms │     13.83 / 13.92 ±0.06 / 14.02 ms │     no change │
│ QQuery 8  │  315.41 / 317.51 ±1.42 / 319.56 ms │  314.85 / 320.12 ±3.16 / 323.89 ms │     no change │
│ QQuery 9  │  445.80 / 459.64 ±8.67 / 469.08 ms │ 442.29 / 460.28 ±12.42 / 477.36 ms │     no change │
│ QQuery 10 │     68.62 / 69.58 ±0.87 / 70.83 ms │     68.78 / 69.47 ±0.54 / 70.26 ms │     no change │
│ QQuery 11 │     78.49 / 80.02 ±1.08 / 81.49 ms │     79.36 / 81.07 ±0.99 / 82.43 ms │     no change │
│ QQuery 12 │  271.85 / 276.18 ±5.79 / 286.96 ms │  273.22 / 278.69 ±3.39 / 283.51 ms │     no change │
│ QQuery 13 │  381.80 / 390.49 ±7.31 / 401.25 ms │  385.81 / 391.10 ±3.41 / 394.63 ms │     no change │
│ QQuery 14 │  278.91 / 282.63 ±3.30 / 288.78 ms │  280.20 / 282.51 ±3.57 / 289.60 ms │     no change │
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3243789052
##
datafusion/datasource-parquet/src/metrics.rs:
##
Review Comment:
I wonder if we should apply the same pattern to the other metrics (lazily
initialize them) -- if you can get a few percent in this query maybe it would
get us a few in the others
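
One way to read the "lazily initialize" suggestion is a counter that stays unregistered until it first receives a non-zero value. The sketch below uses only the standard library. `LazyCounter` is a hypothetical type, not DataFusion's metrics API, and it uses a `&'static str` name on the assumption that avoiding per-file string copies is the goal.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical counter that reports nothing until it has a non-zero value.
struct LazyCounter {
    /// Static name, so constructing the counter copies no strings.
    name: &'static str,
    /// Empty until the first non-zero `add`.
    value: OnceLock<AtomicUsize>,
}

impl LazyCounter {
    fn new(name: &'static str) -> Self {
        Self { name, value: OnceLock::new() }
    }

    /// Add `n`; a zero add never initializes the counter.
    fn add(&self, n: usize) {
        if n == 0 {
            return;
        }
        self.value
            .get_or_init(|| AtomicUsize::new(0))
            .fetch_add(n, Ordering::Relaxed);
    }

    /// `None` means the counter was never used and can be left out of reports.
    fn get(&self) -> Option<usize> {
        self.value.get().map(|v| v.load(Ordering::Relaxed))
    }
}

fn main() {
    let skipped = LazyCounter::new("page_index_pages_skipped_by_fully_matched");
    skipped.add(0);
    println!("{} = {:?}", skipped.name, skipped.get());
    skipped.add(7);
    println!("{} = {:?}", skipped.name, skipped.get());
}
```

Whether this helps the other metrics depends on how much of their cost is in construction rather than in updates, which would need profiling to confirm.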
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454072344
Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4454053108-107-wf2vv 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (426154ea1ddc5d57f909d948735229c6f40398d6) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..426154ea1ddc5d57f909d948735229c6f40398d6) using: clickbench_partitioned
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454064121
> The issue was that `ParquetFileMetrics::new` created a `LazyParquetSummaryCount` for `page_index_pages_skipped_by_fully_matched` for every opened file.

Wild -- that seems like non-trivial overhead
Looking at the code and what you changed, maybe it is because the metric
builder is expensive (it is copying strings)
```
let count = MetricBuilder::new(metrics)
.with_new_label("filename", filename.to_string())
.with_type(MetricType::Summary)
.with_category(MetricCategory::Rows)
.counter("page_index_pages_skipped_by_fully_matched", partition);
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108
run benchmark clickbench_partitioned
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4448087201
> the benchmarks look slower -- maybe we can profile some of those queries and find space to get the performance back

@alamb Good catch, this keeps the PR from introducing a regression!

I profiled the ClickBench partitioned queries that were repeatedly slow (`q6`, `q29`, and `q41`) on the PR build.

`q29` was dominated by normal parquet decode / aggregation work (`snap::decompress`, RLE decoding, `SumAccumulator`), so I did not see a PR-specific hot spot there.

`q6` was more useful: it was dominated by parquet open/planning/statistics/metrics setup rather than decode work. In particular, `ParquetFileMetrics::new`, `MetricBuilder::build`, and `LazyParquetSummaryCount` construction/destruction showed up in the sample profile. Since `q6` has no filters, this suggested the regression came from fixed per-file setup overhead rather than from the fully-matched pruning path itself.

The issue was that `ParquetFileMetrics::new` created a `LazyParquetSummaryCount` for `page_index_pages_skipped_by_fully_matched` for every opened file. Even though the counter was only registered on first use, constructing the lazy wrapper still cloned the filename, cloned the metrics set, and allocated an `Arc>` for every file, including queries that never used this metric.

I fixed this by removing the per-file `LazyParquetSummaryCount` field entirely. Page pruning now returns the `pages_skipped_by_fully_matched` count, and the opener registers `page_index_pages_skipped_by_fully_matched` only when that count is non-zero, using the already available `PreparedParquetOpen` filename / partition / metrics context. This keeps `ParquetFileMetrics::new` off the extra allocation/clone path for the common case.

Now the benchmark is good: https://github.com/apache/datafusion/pull/21637#issuecomment-4447945184
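
The fix described in this comment (pruning returns a count, and the caller registers the metric only when the count is non-zero) can be sketched roughly as follows. `MetricsSet`, `prune_pages`, and `open_file` here are simplified, hypothetical stand-ins and not the DataFusion types of the same or similar names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a shared metrics registry,
/// keyed by (metric name, filename).
#[derive(Default)]
struct MetricsSet {
    counters: HashMap<(String, String), usize>,
}

impl MetricsSet {
    /// Registering is the expensive step: it copies the name and filename.
    fn add(&mut self, name: &str, filename: &str, n: usize) {
        *self
            .counters
            .entry((name.to_string(), filename.to_string()))
            .or_insert(0) += n;
    }
}

/// Hypothetical pruning step: returns how many pages were skipped because
/// their row group was already fully matched, instead of updating a metric.
fn prune_pages(fully_matched: &[bool], pages_per_row_group: &[usize]) -> usize {
    fully_matched
        .iter()
        .zip(pages_per_row_group)
        .filter(|(matched, _)| **matched)
        .map(|(_, pages)| *pages)
        .sum()
}

/// The caller pays the registration cost only when there is something to report.
fn open_file(
    metrics: &mut MetricsSet,
    filename: &str,
    fully_matched: &[bool],
    pages_per_row_group: &[usize],
) {
    let skipped = prune_pages(fully_matched, pages_per_row_group);
    if skipped > 0 {
        metrics.add("page_index_pages_skipped_by_fully_matched", filename, skipped);
    }
}

fn main() {
    let mut metrics = MetricsSet::default();
    open_file(&mut metrics, "a.parquet", &[false, false], &[3, 4]); // nothing registered
    open_file(&mut metrics, "b.parquet", &[true, false], &[3, 4]); // registers 3 pages
    println!("registered counters: {}", metrics.counters.len());
}
```

For an unfiltered query, no file ever reaches the `metrics.add` call, so the per-file setup does no extra cloning or allocation.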
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3239348425
##
datafusion/physical-expr-common/src/metrics/value.rs:
##
@@ -1010,7 +1010,10 @@ impl MetricValue {
Self::SpilledBytes(_) => 11,
Self::SpilledRows(_) => 12,
Self::CurrentMemoryUsage(_) => 13,
-Self::Count { .. } => 14,
+Self::Count { name, .. } => match name.as_ref() {
+"page_index_pages_skipped_by_fully_matched" => 8,
Review Comment:
Added a comment explaining why this Count is ordered with the Parquet
page-index pruning metrics in EXPLAIN output.
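
To illustrate the ordering idea in this hunk, the sketch below sorts metrics by a display key in which one named `Count` is placed next to the pruning metrics. The enum is a cut-down, hypothetical version. Only the keys 8, 11, and 14 appear in the diff; the `_ => 14` fallback arm and the method name `display_sort_key` are assumptions.

```rust
/// Cut-down, hypothetical metric value type for illustration.
#[derive(Debug)]
enum MetricValue {
    SpilledBytes(usize),
    Count { name: String, count: usize },
}

impl MetricValue {
    /// Smaller keys are displayed first.
    fn display_sort_key(&self) -> u8 {
        match self {
            Self::SpilledBytes(_) => 11,
            Self::Count { name, .. } => match name.as_str() {
                // Shown alongside the page-index pruning metrics.
                "page_index_pages_skipped_by_fully_matched" => 8,
                // Assumed fallback: all other generic counts keep the old key.
                _ => 14,
            },
        }
    }
}

fn main() {
    let mut metrics = vec![
        MetricValue::Count { name: "other_count".to_string(), count: 1 },
        MetricValue::SpilledBytes(1024),
        MetricValue::Count {
            name: "page_index_pages_skipped_by_fully_matched".to_string(),
            count: 42,
        },
    ];
    metrics.sort_by_key(|m| m.display_sort_key());
    for m in &metrics {
        println!("{:>2}: {m:?}", m.display_sort_key());
    }
}
```

Matching on the metric name keeps the change local, at the cost of a string comparison for every generic `Count` when sorting.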
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447945184
Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD                               ┃ datafusion_issue-19028-benchmark   ┃ Change        ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.23 / 4.77 ±6.88 / 18.54 ms │       1.21 / 4.73 ±6.90 / 18.54 ms │     no change │
│ QQuery 1  │     13.14 / 13.82 ±0.36 / 14.20 ms │     13.10 / 13.57 ±0.24 / 13.73 ms │     no change │
│ QQuery 2  │     36.41 / 36.89 ±0.33 / 37.37 ms │     35.91 / 36.47 ±0.60 / 37.49 ms │     no change │
│ QQuery 3  │     31.18 / 32.19 ±1.68 / 35.53 ms │     30.96 / 31.25 ±0.19 / 31.51 ms │     no change │
│ QQuery 4  │  245.47 / 248.10 ±1.97 / 251.52 ms │  240.31 / 243.94 ±1.94 / 245.61 ms │     no change │
│ QQuery 5  │  289.69 / 292.04 ±2.09 / 295.15 ms │  283.16 / 286.21 ±2.18 / 289.31 ms │     no change │
│ QQuery 6  │       7.35 / 8.23 ±1.25 / 10.70 ms │        7.14 / 7.55 ±0.30 / 7.93 ms │ +1.09x faster │
│ QQuery 7  │     14.99 / 15.06 ±0.05 / 15.14 ms │     14.63 / 15.60 ±1.68 / 18.95 ms │     no change │
│ QQuery 8  │  329.84 / 332.35 ±2.10 / 335.81 ms │  326.69 / 329.96 ±2.83 / 334.21 ms │     no change │
│ QQuery 9  │  473.34 / 479.60 ±6.87 / 491.53 ms │ 446.50 / 465.35 ±12.46 / 484.80 ms │     no change │
│ QQuery 10 │     72.19 / 75.45 ±3.71 / 81.99 ms │     71.45 / 76.92 ±9.79 / 96.48 ms │     no change │
│ QQuery 11 │     82.98 / 83.75 ±0.43 / 84.24 ms │     83.61 / 85.38 ±2.84 / 91.04 ms │     no change │
│ QQuery 12 │  282.10 / 285.48 ±3.94 / 291.32 ms │  281.81 / 286.78 ±4.82 / 295.21 ms │     no change │
│ QQuery 13 │  398.45 / 409.27 ±9.41 / 422.68 ms │  401.41 / 412.17 ±7.60 / 424.18 ms │     no change │
│ QQuery 14 │ 287.61 / 296.17 ±12.69 / 321.41 ms │  288.95 / 294.25 ±5.60 / 301.78 ms │     no change │
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447801562
Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4447774292-60-bc7n5 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (d0b4c309746bc29830ed398daf55d775e08e5b83) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..d0b4c309746bc29830ed398daf55d775e08e5b83) using: clickbench_partitioned
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292
run benchmark clickbench_partitioned
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442878086
the benchmarks look slower -- maybe we can profile some of those queries and find space to get the performance back
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3235585653
##
datafusion/datasource-parquet/src/opener.rs:
##
Review Comment:
This is the arrow-rs PR, in case its APIs are of interest for informing the
direction here: https://github.com/apache/arrow-rs/pull/9968
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442571294
Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD                               ┃ datafusion_issue-19028-benchmark   ┃ Change        ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.20 / 4.71 ±6.87 / 18.46 ms │       1.19 / 4.62 ±6.74 / 18.09 ms │     no change │
│ QQuery 1  │     12.84 / 13.04 ±0.13 / 13.23 ms │     12.79 / 12.99 ±0.15 / 13.23 ms │     no change │
│ QQuery 2  │     35.68 / 36.12 ±0.41 / 36.79 ms │     35.68 / 35.95 ±0.33 / 36.59 ms │     no change │
│ QQuery 3  │     30.70 / 31.50 ±0.99 / 33.40 ms │     30.63 / 30.94 ±0.41 / 31.69 ms │     no change │
│ QQuery 4  │  233.60 / 235.45 ±2.63 / 240.67 ms │  230.21 / 235.18 ±3.84 / 239.83 ms │     no change │
│ QQuery 5  │  275.77 / 277.86 ±1.64 / 280.62 ms │  279.33 / 280.44 ±0.75 / 281.28 ms │     no change │
│ QQuery 6  │        6.26 / 7.05 ±0.51 / 7.84 ms │        6.97 / 7.56 ±0.56 / 8.34 ms │  1.07x slower │
│ QQuery 7  │     13.87 / 14.08 ±0.14 / 14.25 ms │     13.88 / 14.09 ±0.11 / 14.18 ms │     no change │
│ QQuery 8  │  310.54 / 314.08 ±3.36 / 319.63 ms │  314.54 / 318.09 ±3.44 / 324.00 ms │     no change │
│ QQuery 9  │ 446.17 / 467.23 ±17.98 / 494.44 ms │  451.61 / 461.65 ±8.16 / 472.23 ms │     no change │
│ QQuery 10 │     71.15 / 71.45 ±0.27 / 71.87 ms │     69.47 / 70.26 ±0.56 / 71.12 ms │     no change │
│ QQuery 11 │     81.97 / 82.78 ±0.56 / 83.59 ms │     81.58 / 82.28 ±0.49 / 82.86 ms │     no change │
│ QQuery 12 │  284.00 / 289.68 ±3.83 / 295.31 ms │  273.06 / 275.18 ±2.78 / 280.33 ms │ +1.05x faster │
│ QQuery 13 │  383.15 / 400.48 ±9.19 / 410.48 ms │  377.97 / 388.01 ±6.64 / 395.28 ms │     no change │
│ QQuery 14 │  275.41 / 284.01 ±6.85 / 293.85 ms │  279.69 / 282.28 ±2.52 / 287.02 ms │     no change │
```
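
The "Change" column in this run appears consistent with comparing the second number in each cell (assumed here to be the mean) against a noise band of about 5%. The sketch below reproduces the labels for Query 6, Query 12, and Query 5 under that assumption. The `classify` function and the 5% threshold are inferred from the numbers shown, not taken from the benchmark runner's source.

```rust
/// Label a benchmark delta from two mean latencies in milliseconds.
/// `threshold` is an assumed noise band (0.05 = 5%).
fn classify(base_mean: f64, new_mean: f64, threshold: f64) -> String {
    if new_mean > base_mean * (1.0 + threshold) {
        // The new build is slower by more than the noise band.
        format!("{:.2}x slower", new_mean / base_mean)
    } else if base_mean > new_mean * (1.0 + threshold) {
        // The new build is faster by more than the noise band.
        format!("+{:.2}x faster", base_mean / new_mean)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // Means taken from the Query 6, Query 12, and Query 5 rows of this run.
    println!("Query 6:  {}", classify(7.05, 7.56, 0.05));
    println!("Query 12: {}", classify(289.68, 275.18, 0.05));
    println!("Query 5:  {}", classify(277.86, 280.44, 0.05));
}
```

With a mean around 7 ms, Query 6 crosses a 5% band on a shift of about half a millisecond, so small fixed per-file costs are enough to flip its label.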
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442442500

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4442420069-34-94mwb 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae) using: clickbench_partitioned

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442437610

I want to make sure the slowdowns on clickbench_partitioned in https://github.com/apache/datafusion/pull/21637#issuecomment-4440926604 are not reproducible.

> ```
> Comparing HEAD and datafusion_issue-19028-benchmark
>
> Benchmark clickbench_partitioned.json
>
> Query     │ HEAD                               │ datafusion_issue-19028-benchmark   │ Change
> ──────────┼────────────────────────────────────┼────────────────────────────────────┼──────────────
> ...
> QQuery 6  │ 6.87 / 7.54 ±0.43 / 8.23 ms        │ 7.39 / 8.23 ±0.50 / 8.93 ms        │ 1.09x slower
> QQuery 7  │ 14.68 / 14.85 ±0.14 / 15.01 ms     │ 14.56 / 15.87 ±2.19 / 20.23 ms     │ 1.07x slower
> QQuery 8  │ 321.11 / 330.67 ±8.00 / 340.69 ms  │ 353.25 / 355.63 ±1.68 / 357.75 ms  │ 1.08x slower
> QQuery 9  │ 511.37 / 522.69 ±8.03 / 534.54 ms  │ 463.77 / 486.85 ±19.75 / 519.75 ms │ +1.07x faster
> QQuery 10 │ 73.68 / 74.86 ±0.75 / 75.89 ms     │ 70.46 / 71.76 ±0.73 / 72.63 ms     │ no change
> QQuery 11 │ 82.81 / 84.92 ±1.32 / 86.03 ms     │ 81.43 / 82.40 ±0.98 / 84.28 ms     │ no change
> QQuery 12 │ 288.31 / 303.84 ±12.04 / 323.63 ms │ 302.90 / 308.51 ±4.72 / 316.56 ms  │ no change
> QQuery 13 │ 388.65 / 402.24 ±9.41 / 416.80 ms  │ 415.00 / 427.92 ±10.87 / 447.70 ms │ 1.06x slower
> QQuery 14 │ 282.25 / 285.05 ±2.06 / 287.60 ms  │ 304.01 / 309.51 ±5.09 / 317.91 ms  │ 1.09x slower
> QQuery 15 │ 286.19 / 290.24 ±2.51 / 293.56 ms  │ 308.11 / 317.91 ±6.69 / 327.52 ms  │ 1.10x slower
> QQuery 16 │ 617.93 / 661.36 ±26.32 / 686.84 ms │ 638.82 / 648.64 ±6.42 / 654.82 ms  │ no change
> ...
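The Change column in the quoted rows is consistent with a plain ratio of the mean timings (the second number in each cell) plus a noise threshold of about 5%. The sketch below works through that arithmetic. The threshold value and the name `classify_change` are assumptions made for illustration; they are inferred from the rows shown, not taken from the benchmark runner's source.

```rust
/// Classify the change between two mean timings, in milliseconds.
/// ASSUMPTION: differences within 5% are reported as "no change"; this
/// threshold is inferred from the quoted rows, not from the runner itself.
fn classify_change(baseline_ms: f64, candidate_ms: f64) -> String {
    const NOISE_THRESHOLD: f64 = 1.05;
    if candidate_ms > baseline_ms * NOISE_THRESHOLD {
        format!("{:.2}x slower", candidate_ms / baseline_ms)
    } else if baseline_ms > candidate_ms * NOISE_THRESHOLD {
        format!("+{:.2}x faster", baseline_ms / candidate_ms)
    } else {
        "no change".to_string()
    }
}

fn main() {
    // Mean timings copied from the quoted clickbench_partitioned rows.
    println!("QQuery 6:  {}", classify_change(7.54, 8.23));
    println!("QQuery 9:  {}", classify_change(522.69, 486.85));
    println!("QQuery 12: {}", classify_change(303.84, 308.51));
}
```

Under this reading, a 1.06x to 1.10x slowdown on a query whose mean is under 10 ms (QQuery 6) is a difference of less than a millisecond, which is one reason a second run is useful before drawing conclusions.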
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069

run benchmark clickbench_partitioned
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440926604

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark clickbench_partitioned.json

Query     │ HEAD                               │ datafusion_issue-19028-benchmark   │ Change
──────────┼────────────────────────────────────┼────────────────────────────────────┼──────────────
QQuery 0  │ 1.28 / 4.86 ±6.96 / 18.77 ms       │ 1.30 / 4.93 ±7.06 / 19.04 ms       │ no change
QQuery 1  │ 13.03 / 13.41 ±0.27 / 13.76 ms     │ 13.02 / 13.55 ±0.29 / 13.86 ms     │ no change
QQuery 2  │ 36.21 / 36.67 ±0.35 / 37.23 ms     │ 36.84 / 37.21 ±0.39 / 37.93 ms     │ no change
QQuery 3  │ 31.95 / 32.94 ±0.78 / 34.32 ms     │ 32.69 / 33.24 ±0.31 / 33.55 ms     │ no change
QQuery 4  │ 260.15 / 261.68 ±1.86 / 265.30 ms  │ 270.03 / 273.33 ±2.88 / 277.72 ms  │ no change
QQuery 5  │ 292.78 / 298.05 ±3.16 / 302.12 ms  │ 303.23 / 306.71 ±3.04 / 310.80 ms  │ no change
QQuery 6  │ 6.87 / 7.54 ±0.43 / 8.23 ms        │ 7.39 / 8.23 ±0.50 / 8.93 ms        │ 1.09x slower
QQuery 7  │ 14.68 / 14.85 ±0.14 / 15.01 ms     │ 14.56 / 15.87 ±2.19 / 20.23 ms     │ 1.07x slower
QQuery 8  │ 321.11 / 330.67 ±8.00 / 340.69 ms  │ 353.25 / 355.63 ±1.68 / 357.75 ms  │ 1.08x slower
QQuery 9  │ 511.37 / 522.69 ±8.03 / 534.54 ms  │ 463.77 / 486.85 ±19.75 / 519.75 ms │ +1.07x faster
QQuery 10 │ 73.68 / 74.86 ±0.75 / 75.89 ms     │ 70.46 / 71.76 ±0.73 / 72.63 ms     │ no change
QQuery 11 │ 82.81 / 84.92 ±1.32 / 86.03 ms     │ 81.43 / 82.40 ±0.98 / 84.28 ms     │ no change
QQuery 12 │ 288.31 / 303.84 ±12.04 / 323.63 ms │ 302.90 / 308.51 ±4.72 / 316.56 ms  │ no change
QQuery 13 │ 388.65 / 402.24 ±9.41 / 416.80 ms  │ 415.00 / 427.92 ±10.87 / 447.70 ms │ 1.06x slower
QQuery 14 │ 282.25 / 285.05 ±2.06 / 287.60 ms  │ 304.01 / 309.51 ±5.09 / 317.91 ms  │ 1.09x slower
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440889301

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark tpcds_sf1.json

Query     │ HEAD                               │ datafusion_issue-19028-benchmark   │ Change
──────────┼────────────────────────────────────┼────────────────────────────────────┼──────────────
QQuery 1  │ 6.33 / 6.81 ±0.86 / 8.52 ms        │ 6.25 / 6.81 ±0.88 / 8.56 ms        │ no change
QQuery 2  │ 81.34 / 81.89 ±0.29 / 82.17 ms     │ 81.65 / 81.95 ±0.25 / 82.35 ms     │ no change
QQuery 3  │ 29.06 / 29.29 ±0.19 / 29.57 ms     │ 29.19 / 29.48 ±0.23 / 29.90 ms     │ no change
QQuery 4  │ 506.78 / 513.30 ±5.56 / 522.50 ms  │ 508.53 / 511.81 ±2.20 / 515.01 ms  │ no change
QQuery 5  │ 53.10 / 53.36 ±0.24 / 53.81 ms     │ 52.90 / 53.18 ±0.41 / 54.00 ms     │ no change
QQuery 6  │ 35.62 / 35.84 ±0.29 / 36.41 ms     │ 35.31 / 35.81 ±0.35 / 36.19 ms     │ no change
QQuery 7  │ 110.34 / 111.13 ±1.03 / 113.13 ms  │ 109.80 / 110.44 ±0.92 / 112.26 ms  │ no change
QQuery 8  │ 38.83 / 39.14 ±0.39 / 39.89 ms     │ 38.86 / 39.16 ±0.21 / 39.45 ms     │ no change
QQuery 9  │ 53.43 / 55.60 ±1.94 / 58.99 ms     │ 55.54 / 57.53 ±1.41 / 59.59 ms     │ no change
QQuery 10 │ 80.81 / 82.01 ±1.90 / 85.80 ms     │ 81.42 / 81.74 ±0.20 / 81.96 ms     │ no change
QQuery 11 │ 315.07 / 318.95 ±2.12 / 321.23 ms  │ 313.21 / 316.45 ±2.94 / 321.17 ms  │ no change
QQuery 12 │ 28.90 / 29.35 ±0.33 / 29.76 ms     │ 28.69 / 29.01 ±0.35 / 29.60 ms     │ no change
QQuery 13 │ 128.82 / 129.14 ±0.36 / 129.84 ms  │ 129.01 / 129.37 ±0.24 / 129.65 ms  │ no change
QQuery 14 │ 513.80 / 516.32 ±2.70 / 520.36 ms  │ 517.32 / 524.34 ±11.68 / 547.54 ms │ no change
QQuery 15 │ 61.20 / 62.38 ±0.70 / 63.19 ms     │ 60.60 / 61.03 ±0.37 / 61.55 ms     │ no change
QQuery 16
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440872892

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark tpch_sf1.json

Query     │ HEAD                               │ datafusion_issue-19028-benchmark   │ Change
──────────┼────────────────────────────────────┼────────────────────────────────────┼──────────────
QQuery 1  │ 39.06 / 40.96 ±2.11 / 44.35 ms     │ 38.85 / 39.75 ±0.94 / 41.56 ms     │ no change
QQuery 2  │ 20.61 / 21.19 ±0.73 / 22.62 ms     │ 20.49 / 20.96 ±0.37 / 21.54 ms     │ no change
QQuery 3  │ 36.53 / 37.31 ±0.93 / 38.78 ms     │ 35.50 / 36.73 ±0.66 / 37.41 ms     │ no change
QQuery 4  │ 17.93 / 18.52 ±0.76 / 20.01 ms     │ 18.26 / 18.33 ±0.08 / 18.47 ms     │ no change
QQuery 5  │ 43.89 / 45.24 ±1.08 / 46.84 ms     │ 43.88 / 45.68 ±1.47 / 47.26 ms     │ no change
QQuery 6  │ 16.81 / 17.94 ±1.11 / 19.65 ms     │ 16.76 / 17.16 ±0.36 / 17.78 ms     │ no change
QQuery 7  │ 48.95 / 50.65 ±1.18 / 52.08 ms     │ 50.24 / 51.92 ±1.83 / 55.43 ms     │ no change
QQuery 8  │ 45.74 / 46.53 ±0.63 / 47.49 ms     │ 45.99 / 46.18 ±0.15 / 46.44 ms     │ no change
QQuery 9  │ 50.89 / 52.03 ±0.84 / 53.39 ms     │ 51.32 / 52.45 ±1.06 / 54.18 ms     │ no change
QQuery 10 │ 64.67 / 65.51 ±1.07 / 67.50 ms     │ 65.23 / 65.62 ±0.63 / 66.88 ms     │ no change
QQuery 11 │ 13.84 / 14.19 ±0.48 / 15.12 ms     │ 13.95 / 14.43 ±0.53 / 15.43 ms     │ no change
QQuery 12 │ 25.62 / 26.03 ±0.32 / 26.46 ms     │ 25.38 / 26.05 ±0.47 / 26.74 ms     │ no change
QQuery 13 │ 35.94 / 36.88 ±0.68 / 37.72 ms     │ 35.54 / 36.22 ±0.56 / 37.09 ms     │ no change
QQuery 14 │ 26.09 / 26.28 ±0.16 / 26.53 ms     │ 26.11 / 26.34 ±0.18 / 26.63 ms     │ no change
QQuery 15 │ 32.08 / 32.32 ±0.20 / 32.68 ms     │ 32.26 / 33.49 ±1.22 / 35.44 ms     │ no change
QQuery 16 │ 14.94 / 15.14 ±0.11 / 15.22 ms     │ 15.17 / 15.57 ±0.37 / 16.08 ms     │ no change
QQuery 17 │ 76.74 / 78.32 ±2.21 / 82.70 ms     │ 76.21 / 78.27 ±1.55 / 80.27 ms     │ no change
QQuery 18 │ 68.30 / 69.77 ±0.85 / 70.79 ms     │ 68.97 / 69.89 ±0.64 / 70.58 ms     │ no change
QQue
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440729122

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4440697825-31-88b5h 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae) using: tpch

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440726209

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4440697825-29-hmbw2 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae) using: clickbench_partitioned

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440721425

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4440697825-30-mbkj8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae) using: tpcds

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825

run benchmarks
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3233928827
##
datafusion/physical-expr-common/src/metrics/value.rs:
##
@@ -1010,7 +1010,10 @@ impl MetricValue {
Self::SpilledBytes(_) => 11,
Self::SpilledRows(_) => 12,
Self::CurrentMemoryUsage(_) => 13,
-Self::Count { .. } => 14,
+Self::Count { name, .. } => match name.as_ref() {
+"page_index_pages_skipped_by_fully_matched" => 8,
Review Comment:
this may be worth a comment to explain why it is special casing
page_index_pages_skipped_by_fully_matched
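
The hunk above gives one named counter a different display sort key from other `Count` metrics. The sketch below shows the same pattern on a small, hypothetical `Metric` enum (it is not DataFusion's `MetricValue`), with the kind of inline comment the review asks for. The fallback key `14` for other counts is taken from the removed line; the wording of the comment and the reason given in it are assumptions, since the thread does not state why `8` was chosen.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a metric enum; not DataFusion's `MetricValue`.
enum Metric {
    SpilledRows(usize),
    Count { name: Cow<'static, str>, count: usize },
}

impl Metric {
    /// Smaller keys are displayed first.
    fn display_sort_key(&self) -> u8 {
        match self {
            Self::SpilledRows(_) => 12,
            Self::Count { name, .. } => match name.as_ref() {
                // ASSUMED rationale, for illustration only: this counter is
                // given an earlier key than generic counts so that it is
                // displayed with a different group of metrics.
                "page_index_pages_skipped_by_fully_matched" => 8,
                // Every other named count keeps the generic key.
                _ => 14,
            },
        }
    }

    fn label(&self) -> String {
        match self {
            Self::SpilledRows(n) => format!("spilled_rows={n}"),
            Self::Count { name, count } => format!("{name}={count}"),
        }
    }
}

fn main() {
    let mut metrics = vec![
        Metric::Count { name: "some_other_count".into(), count: 3 },
        Metric::SpilledRows(0),
        Metric::Count {
            name: "page_index_pages_skipped_by_fully_matched".into(),
            count: 1,
        },
    ];
    // The special-cased counter now sorts ahead of the other two entries.
    metrics.sort_by_key(Metric::display_sort_key);
    for m in &metrics {
        println!("{}", m.label());
    }
}
```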
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
let file_metadata = Arc::clone(reader_metadata.metadata());
let rg_metadata = file_metadata.row_groups();
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
.pushdown_filters
.then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
}
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first
Review Comment:
Sounds good.
I think @adriangb was also talking recently about restructuring the Parquet
opener so it could decide more dynamically how to evaluate predicates
(in this case, for example, it decides not to evaluate a predicate at all). He
was also thinking we could dynamically choose whether or not to push the
predicate down into the scan.
No action required for this PR; I am just commenting here that we seem to be
trending in this direction.
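
The idea in the hunk above (keep the predicate, rebuild a filter for each decoder run, and skip the filter entirely for fully matched row groups) can be sketched as follows. This is a minimal illustration under stated assumptions: `Predicate`, `RowFilter`, `build_row_filter`, `split_into_runs`, and `plan_decoders` are hypothetical stand-ins defined here, not the DataFusion or parquet types and functions of the same names.

```rust
use std::ops::Range;
use std::sync::Arc;

// Hypothetical stand-ins for the real predicate and filter types.
struct Predicate(String);
struct RowFilter(String);

// Hypothetical builder: may fail, or may decide no filter is needed.
fn build_row_filter(p: &Predicate) -> Result<Option<RowFilter>, String> {
    if p.0.is_empty() {
        Err("empty predicate".to_string())
    } else {
        Ok(Some(RowFilter(p.0.clone())))
    }
}

/// Group consecutive row groups that share the same "fully matched" flag.
fn split_into_runs(fully_matched: &[bool]) -> Vec<(bool, Range<usize>)> {
    let mut runs = Vec::new();
    let mut start = 0;
    for i in 1..=fully_matched.len() {
        if i == fully_matched.len() || fully_matched[i] != fully_matched[start] {
            runs.push((fully_matched[start], start..i));
            start = i;
        }
    }
    runs
}

/// One decoder per run; fully matched runs get no filter at all.
fn plan_decoders(
    predicate: Option<Arc<Predicate>>,
    fully_matched: &[bool],
) -> Vec<(Range<usize>, Option<RowFilter>)> {
    // Errors are logged and treated as "no filter", as in the hunk above.
    let try_build = |p: &Arc<Predicate>| -> Option<RowFilter> {
        match build_row_filter(p) {
            Ok(filter) => filter,
            Err(e) => {
                eprintln!("Ignoring error building row filter: {e}");
                None
            }
        }
    };
    split_into_runs(fully_matched)
        .into_iter()
        .map(|(matched, range)| {
            let filter = if matched {
                None
            } else {
                predicate.as_ref().and_then(|p| try_build(p))
            };
            (range, filter)
        })
        .collect()
}

fn main() {
    let predicate = Some(Arc::new(Predicate("a > 10".to_string())));
    for (range, filter) in plan_decoders(predicate, &[false, true, true, false]) {
        let desc = filter.map(|f| f.0).unwrap_or_else(|| "no filter".to_string());
        println!("row groups {range:?}: {desc}");
    }
}
```

In this sketch the filter is built once per run rather than shared, which mirrors the "rebuild RowFilter per decoder run" comment in the diff.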
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440631526

@adriangb and I were talking about this PR last night. I am checking it out
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4394407082

> > @alamb thanks for the review, before getting the PR in, I think it's better to have your look for the comment [#21637 (comment)](https://github.com/apache/datafusion/pull/21637#discussion_r3156327107), and it's fix commit: [da7db27](https://github.com/apache/datafusion/commit/da7db27a6b51345991d67907b9985a0d67224153) (this is the lowest cost way I found to fix the metric. Let me know if you have other thoughts)
>
> Maybe we should just add a new metric on ParquetScanMetrics 🤔
>
> https://github.com/apache/datafusion/blob/4c909bafc5c50749884fdd80a06235d7bd72dbde/datafusion/datasource-parquet/src/metrics.rs#L30

Thanks @alamb, I agree that adding a separate metric is cleaner.

I changed the PR https://github.com/apache/datafusion/pull/21637/commits/3f2401e0b422e2ddb590660626fc1716c84a22ae to keep `page_index_pages_pruned` reporting only pages that were actually evaluated by page-index pruning, and added `page_index_pages_skipped_by_fully_matched` for pages where page-index pruning was skipped because row-group statistics already proved the row group was fully matched.

For example, the metrics can now look like:

```text
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
page_index_pages_pruned=2 total → 2 matched,
page_index_pages_skipped_by_fully_matched=1
```

I would read this as:

1. row-group statistics evaluated 4 row groups, 3 matched, and 1 of those was fully matched;
2. page-index pruning actually evaluated 2 pages, and both matched;
3. 1 additional page belonged to the fully matched row group, so page-index pruning was skipped for that page. The page is still scanned; only page-index predicate evaluation was skipped.

This avoids counting statistics-derived fully matched pages as page-index matched pages.
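
The accounting described in the comment above can be sketched as a small tally over row-group outcomes. This is only an illustration of the counting rule: `RowGroupOutcome`, `Counters`, and `tally` are hypothetical names, and the page layout in `main` (one page per surviving row group) is an assumption chosen to reproduce the numbers in the example, not data from the PR.

```rust
/// Hypothetical outcome of row-group statistics pruning for one row group.
enum RowGroupOutcome {
    /// Statistics proved no row can match; nothing is read.
    Pruned,
    /// Some rows may match; each page is checked against the page index.
    /// The flags say whether each page survived page-index pruning.
    Matched { page_matches: Vec<bool> },
    /// Statistics proved every row matches; page-index pruning is skipped.
    FullyMatched { pages: usize },
}

#[derive(Debug, Default, PartialEq)]
struct Counters {
    row_groups_total: usize,
    row_groups_matched: usize,
    row_groups_fully_matched: usize,
    page_index_pages_total: usize,
    page_index_pages_matched: usize,
    page_index_pages_skipped_by_fully_matched: usize,
}

fn tally(row_groups: &[RowGroupOutcome]) -> Counters {
    let mut c = Counters::default();
    for rg in row_groups {
        c.row_groups_total += 1;
        match rg {
            RowGroupOutcome::Pruned => {}
            RowGroupOutcome::Matched { page_matches } => {
                c.row_groups_matched += 1;
                // Only these pages are actually evaluated by page-index pruning.
                c.page_index_pages_total += page_matches.len();
                c.page_index_pages_matched += page_matches.iter().filter(|m| **m).count();
            }
            RowGroupOutcome::FullyMatched { pages } => {
                // A fully matched row group is also a matched row group.
                c.row_groups_matched += 1;
                c.row_groups_fully_matched += 1;
                // Its pages are still scanned, but never counted as
                // evaluated or matched by the page index.
                c.page_index_pages_skipped_by_fully_matched += pages;
            }
        }
    }
    c
}

fn main() {
    let row_groups = vec![
        RowGroupOutcome::Pruned,
        RowGroupOutcome::Matched { page_matches: vec![true] },
        RowGroupOutcome::Matched { page_matches: vec![true] },
        RowGroupOutcome::FullyMatched { pages: 1 },
    ];
    println!("{:?}", tally(&row_groups));
}
```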
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
github-actions[bot] commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4394375026

Thank you for opening this pull request!

Reviewer note: [cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details

```
Cloning apache/main
Building datafusion-datasource-parquet v53.1.0 (current)
Built [ 43.029s] (current)
Parsing datafusion-datasource-parquet v53.1.0 (current)
Parsed [ 0.026s] (current)
Building datafusion-datasource-parquet v53.1.0 (baseline)
Built [ 42.813s] (baseline)
Parsing datafusion-datasource-parquet v53.1.0 (baseline)
Parsed [ 0.025s] (baseline)
Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; assume patch)
Checked [ 0.142s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip

--- failure auto_trait_impl_removed: auto trait no longer implemented ---

Description:
A public type has stopped implementing one or more auto traits. This can break downstream code that depends on the traits being implemented.
ref: https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/auto_trait_impl_removed.ron

Failed in:
type ParquetFileMetrics is no longer UnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31
type ParquetFileMetrics is no longer RefUnwindSafe, in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31

--- failure constructible_struct_adds_private_field: struct no longer constructible due to new private field ---

Description:
A struct constructible with a struct literal has a new non-public field. It can no longer be constructed using a struct literal outside of its crate.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_private_field.ron

Failed in:
field ParquetFileMetrics.page_index_pages_skipped_by_fully_matched in /home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:75

Summary semver requires new major version: 2 major and 0 minor checks failed
Finished [ 88.643s] datafusion-datasource-parquet

Building datafusion-physical-expr-common v53.1.0 (current)
Built [ 20.482s] (current)
Parsing datafusion-physical-expr-common v53.1.0 (current)
Parsed [ 0.020s] (current)
Building datafusion-physical-expr-common v53.1.0 (baseline)
Built [ 20.210s] (baseline)
Parsing datafusion-physical-expr-common v53.1.0 (baseline)
Parsed [ 0.020s] (baseline)
Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; assume patch)
Checked [ 0.196s] 222 checks: 222 pass, 30 skip
Summary no semver update required
Finished [ 42.486s] datafusion-physical-expr-common

Building datafusion-sqllogictest v53.1.0 (current)
Built [ 136.503s] (current)
Parsing datafusion-sqllogictest v53.1.0 (current)
Parsed [ 0.022s] (current)
Building datafusion-sqllogictest v53.1.0 (baseline)
Built [ 135.436s] (baseline)
Parsing datafusion-sqllogictest v53.1.0 (baseline)
Parsed [ 0.023s] (baseline)
Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
Checked [ 0.085s] 222 checks: 222 pass, 30 skip
Summary no semver update required
Finished [ 277.236s] datafusion-sqllogictest
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3199127552
##
datafusion/datasource-parquet/src/metrics.rs:
##
@@ -67,6 +68,11 @@ pub struct ParquetFileMetrics {
pub page_index_rows_pruned: PruningMetrics,
/// Total pages filtered or matched by parquet page index
pub page_index_pages_pruned: PruningMetrics,
+/// Lazily registered counter for pages whose page-index pruning was skipped
+/// because the containing row group was fully matched by row-group statistics.
+///
+/// These pages are still scanned; only page-index predicate evaluation is skipped.
+page_index_pages_skipped_by_fully_matched: LazyParquetSummaryCount,
Review Comment:
It is registered lazily so normal Parquet scans do not show an extra
`page_index_pages_skipped_by_fully_matched=0` metric.
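A minimal sketch of the lazy-registration idea, using only the standard library. The `Registry` and `LazyCount` types here are hypothetical stand-ins invented for illustration; they are not DataFusion's `LazyParquetSummaryCount` or its metrics API. The point is only that a counter which registers itself on first use never appears in the output of scans that do not touch it.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a metrics registry (not DataFusion's API).
#[derive(Default)]
struct Registry {
    counters: Mutex<BTreeMap<&'static str, Arc<AtomicUsize>>>,
}

impl Registry {
    /// Create (or fetch) the counter stored under `name`.
    fn register(&self, name: &'static str) -> Arc<AtomicUsize> {
        let mut counters = self.counters.lock().unwrap();
        counters.entry(name).or_default().clone()
    }

    /// Names of all counters that have been registered so far.
    fn names(&self) -> Vec<&'static str> {
        let counters = self.counters.lock().unwrap();
        counters.keys().copied().collect()
    }
}

/// Counter that registers itself with the registry only on first `add`.
struct LazyCount {
    name: &'static str,
    registry: Arc<Registry>,
    inner: OnceLock<Arc<AtomicUsize>>,
}

impl LazyCount {
    fn new(name: &'static str, registry: Arc<Registry>) -> Self {
        Self { name, registry, inner: OnceLock::new() }
    }

    /// Register on first use, then increment.
    fn add(&self, n: usize) {
        let counter = self.inner.get_or_init(|| self.registry.register(self.name));
        counter.fetch_add(n, Ordering::Relaxed);
    }

    /// Current value; zero if the counter was never registered.
    fn value(&self) -> usize {
        self.inner.get().map_or(0, |c| c.load(Ordering::Relaxed))
    }
}

fn main() {
    let registry = Arc::new(Registry::default());
    let skipped = LazyCount::new(
        "page_index_pages_skipped_by_fully_matched",
        Arc::clone(&registry),
    );
    // A scan that never skips pages leaves the registry empty.
    assert!(registry.names().is_empty());
    skipped.add(2);
    // Only after the first increment does the metric become visible.
    assert_eq!(registry.names(), vec!["page_index_pages_skipped_by_fully_matched"]);
    assert_eq!(skipped.value(), 2);
}
```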
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4390663385
> @alamb thanks for the review. Before getting the PR in, I think it's better to have you look at the comment [#21637 (comment)](https://github.com/apache/datafusion/pull/21637#discussion_r3156327107) and its fix commit: [da7db27](https://github.com/apache/datafusion/commit/da7db27a6b51345991d67907b9985a0d67224153) (this is the lowest-cost way I found to fix the metric. Let me know if you have other thoughts)
Maybe we should just add a new metric on ParquetScanMetrics
https://github.com/apache/datafusion/blob/4c909bafc5c50749884fdd80a06235d7bd72dbde/datafusion/datasource-parquet/src/metrics.rs#L30
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385445616
Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_extended.json
| Query     | HEAD                                      | datafusion_issue-19028-benchmark          | Change        |
|-----------|-------------------------------------------|-------------------------------------------|---------------|
| QQuery 0  |         811.88 / 821.08 ±7.70 / 834.76 ms |        800.65 / 816.09 ±10.06 / 831.13 ms |     no change |
| QQuery 1  |         196.31 / 198.30 ±3.50 / 205.30 ms |         191.05 / 194.30 ±5.52 / 205.30 ms |     no change |
| QQuery 2  |         483.55 / 487.48 ±3.15 / 492.28 ms |         469.04 / 470.24 ±1.37 / 472.90 ms |     no change |
| QQuery 3  |         310.29 / 311.17 ±0.67 / 312.01 ms |         308.38 / 310.78 ±2.99 / 316.41 ms |     no change |
| QQuery 4  |         661.10 / 671.06 ±5.58 / 677.71 ms |         663.47 / 677.74 ±9.50 / 693.45 ms |     no change |
| QQuery 5  | 10480.76 / 10749.26 ±135.39 / 10838.42 ms | 10381.10 / 10707.66 ±229.67 / 11069.32 ms |     no change |
| QQuery 6  |           29.83 / 41.34 ±15.31 / 69.33 ms |            28.00 / 32.99 ±8.92 / 50.81 ms | +1.25x faster |
| QQuery 7  |        771.70 / 787.65 ±13.90 / 803.77 ms |        750.14 / 768.13 ±16.43 / 787.55 ms |     no change |
| QQuery 8  |        378.07 / 403.59 ±35.92 / 474.57 ms |        380.54 / 399.98 ±32.19 / 464.04 ms |     no change |
| QQuery 9  |     2872.22 / 2922.21 ±31.65 / 2960.94 ms |     2795.45 / 2887.03 ±72.17 / 2963.15 ms |     no change |
| QQuery 10 |         641.11 / 647.49 ±4.55 / 653.68 ms |        643.03 / 671.47 ±43.17 / 757.36 ms |     no change |
| QQuery 11 |     2185.14 / 2209.10 ±19.59 / 2233.01 ms |     2165.30 / 2213.30 ±42.41 / 2275.07 ms |     no change |
| QQuery 12 |        197.95 / 215.56 ±29.02 / 273.46 ms |        193.75 / 212.64 ±25.73 / 262.96 ms |     no change |
| QQuery 13 |        540.86 / 560.53 ±11.38 / 571.47 ms | 14.03 / 14.18 ±
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385353090
Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4385338380-2036-5z6mw 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Comparing datafusion/issue-19028-benchmark (da7db27a6b51345991d67907b9985a0d67224153) to ba038e9 (merge-base) [diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..da7db27a6b51345991d67907b9985a0d67224153) using: clickbench_extended
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380
run benchmark clickbench_extended
```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385334800
@alamb thanks for the review. Before getting the PR in, I think it's better to have you look at the comment https://github.com/apache/datafusion/pull/21637#discussion_r3156327107 and its fix commit: https://github.com/apache/datafusion/pull/21637/commits/da7db27a6b51345991d67907b9985a0d67224153 (this is the lowest-cost way I found to fix the metric. Let me know if you have other thoughts)
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192608315
##
datafusion/sqllogictest/test_files/limit_pruning.slt:
##
@@ -63,7 +63,55 @@ set datafusion.explain.analyze_level = summary;
query TT
explain analyze select * from tracking_data where species > 'M' AND s >= 50
limit 3;
-Plan with Metrics DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit_pruning/data.parquet]]},
projection=[species, s], limit=3, file_type=parquet, predicate=species@0 > M
AND s@1 >= 50, pruning_predicate=species_null_count@1 != row_count@2 AND
species_max@0 > M AND s_null_count@4 != row_count@2 AND s_max@3 >= 50,
required_guarantees=[], metrics=[output_rows=3, elapsed_compute=,
output_bytes=, files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=2
total → 2 matched, limit_pruned_row_groups=2 total → 0 matched,
bytes_scanned=, metadata_load_time=,
scan_efficiency_ratio= (171/2.35 K)]
+Plan with Metrics DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit_pruning/data.parquet]]},
projection=[species, s], limit=3, file_type=parquet, predicate=species@0 > M
AND s@1 >= 50, pruning_predicate=species_null_count@1 != row_count@2 AND
species_max@0 > M AND s_null_count@4 != row_count@2 AND s_max@3 >= 50,
required_guarantees=[], metrics=[output_rows=3, elapsed_compute=,
output_bytes=, files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=0
total → 0 matched, limit_pruned_row_groups=2 total → 0 matched,
bytes_scanned=, metadata_load_time=,
scan_efficiency_ratio= (171/2.35 K)]
+
+statement ok
+CREATE TABLE fully_matched_limit_source AS VALUES
+ (1),
+ (2),
+ (3),
+ (4),
+ (5),
+ (6),
+ (7),
+ (1),
+ (2);
+
+query I
+COPY (SELECT column1 as a FROM fully_matched_limit_source)
+TO 'test_files/scratch/limit_pruning/fully_matched_limit.parquet'
+STORED AS PARQUET
+OPTIONS (
+ 'format.max_row_group_size' '3'
+);
+
+9
+
+statement ok
+drop table fully_matched_limit_source;
+
+statement ok
+CREATE EXTERNAL TABLE fully_matched_limit
+STORED AS PARQUET
+LOCATION 'test_files/scratch/limit_pruning/fully_matched_limit.parquet';
+
+# One fully matched row group sits between two filtered row groups.
+# LIMIT must apply across the entire scan, not once per decoder run.
+query TT
+explain analyze select a from fully_matched_limit where a >= 3 limit 4;
+
+Plan with Metrics DataSourceExec: metrics=[output_rows=4,
row_groups_pruned_statistics=3 total → 3 matched -> 1 fully matched]
+
+query I
+select a from fully_matched_limit where a >= 3 limit 4;
Review Comment:
done
https://github.com/apache/datafusion/pull/21637/changes/3aa4a4700e7a876f3db4686ae906f0db05ddfc99
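The test above checks that `LIMIT` is enforced once across the whole scan. A minimal sketch of that behaviour follows, assuming each decoder run can be modelled as a plain vector of values; the `Run` type and `scan_with_global_limit` function are hypothetical and only mirror the test data (three row groups of three rows, predicate `a >= 3`, `limit 4`), not the opener's real code.

```rust
/// One decoder run: the rows it would yield and whether the row filter
/// still has to be evaluated for them. Hypothetical stand-in type.
struct Run {
    rows: Vec<i64>,
    needs_filter: bool,
}

/// Scan the runs in order, evaluating `predicate` only where it is needed,
/// and stop as soon as `limit` rows have been emitted across all runs.
fn scan_with_global_limit(
    runs: &[Run],
    predicate: impl Fn(i64) -> bool,
    limit: usize,
) -> Vec<i64> {
    let mut out = Vec::new();
    for run in runs {
        for &value in &run.rows {
            if out.len() == limit {
                return out;
            }
            // Fully matched runs skip predicate evaluation entirely.
            if !run.needs_filter || predicate(value) {
                out.push(value);
            }
        }
    }
    out
}

fn main() {
    // Same layout as the test file: the middle row group is fully matched.
    let runs = vec![
        Run { rows: vec![1, 2, 3], needs_filter: true },
        Run { rows: vec![4, 5, 6], needs_filter: false },
        Run { rows: vec![7, 1, 2], needs_filter: true },
    ];
    let rows = scan_with_global_limit(&runs, |a| a >= 3, 4);
    assert_eq!(rows, vec![3, 4, 5, 6]);
    println!("{rows:?}");
}
```

If the limit were instead applied separately to each run, this data would yield five rows (`3`, then `4, 5, 6`, then `7`), which is the failure mode the test guards against.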
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192595733
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
let file_metadata = Arc::clone(reader_metadata.metadata());
let rg_metadata = file_metadata.row_groups();
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
.pushdown_filters
.then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
}
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first
Review Comment:
Thanks, that makes sense. I agree both would make the code easier to read.
Since this PR is already focused on the fully matched row group behavior,
I'll keep this as-is here and follow up with a small cleanup PR to introduce
helper(s) for RowFilter generation / decoder building.
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192575652
##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
use clap::{Parser, Subcommand};
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
#[cfg(feature = "snmalloc")]
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer
Review Comment:
reverted the changes in the PR
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192576459
##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {
Review Comment:
reverted the changes in the PR
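The benchmark's doc comment relies on the rule that a row group is fully matched for `x < 200` when its statistics prove every row satisfies the predicate. A minimal sketch of that check, assuming integer min/max statistics and a null count per row group; `ColumnStats` and both functions are hypothetical names for illustration, not DataFusion's pruning API.

```rust
/// Min/max/null-count statistics for one column in one row group.
/// Hypothetical stand-in for Parquet column-chunk statistics.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// True when the statistics alone prove that every row satisfies `x < bound`.
/// Nulls must be absent, because `NULL < bound` does not evaluate to true.
fn fully_matches_less_than(stats: ColumnStats, bound: i64) -> bool {
    stats.null_count == 0 && stats.max < bound
}

/// True when the statistics prove that no row can satisfy `x < bound`,
/// so the row group can be pruned without reading it.
fn fully_pruned_less_than(stats: ColumnStats, bound: i64) -> bool {
    stats.min >= bound
}

fn main() {
    // Mirrors the benchmark layout: x is in [0, 100) with no nulls.
    let stats = ColumnStats { min: 0, max: 99, null_count: 0 };

    // `x < 200`: every row matches, so the RowFilter can be skipped.
    assert!(fully_matches_less_than(stats, 200));
    // `x < 50`: some rows match and some do not, so the filter must run.
    assert!(!fully_matches_less_than(stats, 50));
    assert!(!fully_pruned_less_than(stats, 50));
    // `x < 0`: no row can match, so the row group is pruned.
    assert!(fully_pruned_less_than(stats, 0));
}
```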
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373502775
Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_extended.json
| Query     | HEAD                                      | datafusion_issue-19028-benchmark          | Change        |
|-----------|-------------------------------------------|-------------------------------------------|---------------|
| QQuery 0  |        788.73 / 861.44 ±60.61 / 949.57 ms |        913.95 / 949.43 ±23.09 / 978.21 ms |  1.10x slower |
| QQuery 1  |         197.33 / 199.01 ±2.10 / 203.10 ms |         190.50 / 191.22 ±0.71 / 192.37 ms |     no change |
| QQuery 2  |         485.39 / 487.27 ±1.98 / 490.63 ms |         465.41 / 468.42 ±1.57 / 469.92 ms |     no change |
| QQuery 3  |         310.24 / 311.90 ±1.68 / 315.13 ms |         310.82 / 313.37 ±1.68 / 314.97 ms |     no change |
| QQuery 4  |        658.30 / 674.94 ±10.52 / 686.00 ms |        658.33 / 680.01 ±12.81 / 691.99 ms |     no change |
| QQuery 5  |  10695.14 / 10778.26 ±92.00 / 10942.80 ms | 10258.39 / 10553.98 ±180.12 / 10788.80 ms |     no change |
| QQuery 6  |          29.76 / 65.65 ±70.26 / 206.15 ms |            27.83 / 28.07 ±0.23 / 28.42 ms | +2.34x faster |
| QQuery 7  |        811.80 / 824.14 ±13.28 / 846.22 ms |        753.09 / 802.01 ±30.46 / 849.06 ms |     no change |
| QQuery 8  |        379.77 / 396.29 ±13.78 / 414.81 ms |        372.72 / 399.00 ±41.02 / 480.10 ms |     no change |
| QQuery 9  |     2790.95 / 2929.75 ±77.42 / 3001.87 ms |     3147.97 / 3190.25 ±24.98 / 3218.54 ms |  1.09x slower |
| QQuery 10 |         654.35 / 664.27 ±5.41 / 670.71 ms |        651.01 / 677.90 ±38.17 / 753.66 ms |     no change |
| QQuery 11 |     2157.94 / 2270.58 ±59.79 / 2335.75 ms |     2321.41 / 2383.76 ±33.72 / 2422.34 ms |     no change |
| QQuery 12 |        193.89 / 212.85 ±29.31 / 271.05 ms |        190.86 / 208.33 ±20.46 / 246.75 ms |     no change |
| QQuery 13 |        559.07 / 575.62 ±13.58 / 594.20 ms |            13.31 / 13.58 ±0.23 / 13.97 ms |
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373370993
Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4373338359-2015-4xqfq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Comparing datafusion/issue-19028-benchmark (87f71e95fc864a3dcd53ad8d943891b5465f1520) to ba038e9 (merge-base) [diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..87f71e95fc864a3dcd53ad8d943891b5465f1520) using: clickbench_extended
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373343682
(I do think you'll have to merge up from main and likely resolve comments once this one is merged: https://github.com/apache/datafusion/pull/21907)
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3183489922
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
let file_metadata = Arc::clone(reader_metadata.metadata());
let rg_metadata = file_metadata.row_groups();
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
.pushdown_filters
.then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
}
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first
Review Comment:
This is needed because the RowFilter must be owned, right? I think it might
make this code easier to understand if you pulled the RowFilter generator logic
into its own structure, rather than using a closure and manually tracking the
Option, like
```rust
let row_filter_generator = RowFilterGenerator::new(predicate, &prepared.physical_file_schema, ...);
...
```
Perhaps as a follow-on PR.
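One way the suggested structure could look, as a rough sketch only. `RowFilterGenerator` exists in this thread only as a proposed name; the fields, the `generate` method, and the `Predicate` and `RowFilter` stand-in types below are assumptions made so the example is self-contained, and they are not the types used in `opener.rs`.

```rust
use std::sync::Arc;

/// Stand-in for a physical predicate expression (hypothetical).
type Predicate = Arc<dyn Fn(i64) -> bool + Send + Sync>;

/// Stand-in for an owned row filter that a decoder consumes (hypothetical).
struct RowFilter {
    predicate: Predicate,
}

impl RowFilter {
    fn keep(&self, value: i64) -> bool {
        (self.predicate)(value)
    }
}

/// Holds everything needed to build a fresh, owned filter for each decoder,
/// replacing a closure plus a manually tracked `Option`.
struct RowFilterGenerator {
    predicate: Option<Predicate>,
    built: usize,
}

impl RowFilterGenerator {
    /// `predicate` is `None` when filter pushdown is disabled.
    fn new(predicate: Option<Predicate>) -> Self {
        Self { predicate, built: 0 }
    }

    /// Whether any decoder will need a row filter at all.
    fn has_row_filter(&self) -> bool {
        self.predicate.is_some()
    }

    /// Build a new owned filter, or `None` when there is no predicate.
    fn generate(&mut self) -> Option<RowFilter> {
        let predicate = self.predicate.as_ref()?;
        self.built += 1;
        Some(RowFilter { predicate: Arc::clone(predicate) })
    }
}

fn main() {
    let predicate: Predicate = Arc::new(|a: i64| a >= 3);
    let mut generator = RowFilterGenerator::new(Some(predicate));
    assert!(generator.has_row_filter());

    // Each decoder run that needs filtering gets its own owned filter.
    let first = generator.generate().expect("predicate is set");
    let second = generator.generate().expect("predicate is set");
    assert!(first.keep(3));
    assert!(!second.keep(2));
    assert_eq!(generator.built, 2);

    // With pushdown disabled, no filter is ever produced.
    let mut disabled = RowFilterGenerator::new(None);
    assert!(disabled.generate().is_none());
}
```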
##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1139,33 +1155,69 @@ impl RowGroupsPrunedParquetOpen {
reader_metadata.parquet_schema(),
);
-let mut decoder_builder =
-ParquetPushDecoderBuilder::new_with_metadata(reader_metadata)
-.with_projection(read_plan.projection_mask)
+// Split into consecutive runs of row groups that share the same filter
+// requirement. Fully matched row groups skip the RowFilter; others need it.
+// Reverse the run order for reverse scans so the combined decoder stream
+// preserves the requested global row group order.
+let mut runs = access_plan.split_runs(has_row_filter);
+if prepared.reverse_row_groups {
+runs.reverse();
+}
+let run_count = runs.len();
+let decoder_limit = prepared.limit.filter(|_| run_count == 1);
+let remaining_limit = prepared.limit.filter(|_| run_count > 1);
+
+// Helper: configure a decoder builder with shared options from
+// the prepared plan.
+let build_decoder = |prepared_access_plan: PreparedAccessPlan,
Review Comment:
Likewise here: it would be nice to see this as its own function rather than a
closure, but I think we can do that as a follow-on PR.
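The run-splitting idea described in the quoted comment can be sketched in isolation. This is only an illustration under assumptions: the free function `split_runs`, the `Run` struct, and `plan_limits` are hypothetical and do not reproduce the actual `access_plan.split_runs` signature. The limit handling mirrors the two `filter` calls in the diff: the limit is handed to the decoder only when there is a single run, and is otherwise kept aside, presumably to be tracked across runs by the caller.

```rust
/// A consecutive run of row groups that share the same filter requirement.
#[derive(Debug, PartialEq)]
struct Run {
    /// Row group indexes in file order.
    row_groups: Vec<usize>,
    /// Whether the decoder for this run needs a RowFilter.
    needs_filter: bool,
}

/// Groups adjacent row groups by whether they still need the row filter.
/// `fully_matched[i]` is true when statistics prove every row of group `i`
/// satisfies the predicate.
fn split_runs(fully_matched: &[bool], has_row_filter: bool) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (idx, &matched) in fully_matched.iter().enumerate() {
        let needs_filter = has_row_filter && !matched;
        let extends_last =
            matches!(runs.last(), Some(run) if run.needs_filter == needs_filter);
        if extends_last {
            if let Some(run) = runs.last_mut() {
                run.row_groups.push(idx);
            }
        } else {
            runs.push(Run {
                row_groups: vec![idx],
                needs_filter,
            });
        }
    }
    runs
}

/// Returns `(decoder_limit, remaining_limit)` as in the diff: a single run
/// can push the limit into its decoder, multiple runs cannot.
fn plan_limits(limit: Option<usize>, run_count: usize) -> (Option<usize>, Option<usize>) {
    let decoder_limit = limit.filter(|_| run_count == 1);
    let remaining_limit = limit.filter(|_| run_count > 1);
    (decoder_limit, remaining_limit)
}
```

For a reverse scan, the caller would reverse the returned `Vec<Run>`, so the runs are visited last-to-first.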
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359
run benchmark clickbench_extended
```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362878551
lol
QQuery 13 │ 546.46 / 549.37 ±2.87 / 554.08 ms │ 13.56 / 13.72 ±0.11 / 13.86 ms │ +40.03x faster
The newly added query exactly matches the optimization in this PR.
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362788303
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
CPU Details (lscpu)
```
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
```
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark clickbench_extended.json
│ Query     │ HEAD │ datafusion_issue-19028-benchmark │ Change │
│ QQuery 0  │ 791.81 / 810.06 ±13.41 / 832.22 ms │ 778.49 / 818.79 ±24.50 / 847.08 ms │ no change │
│ QQuery 1  │ 195.40 / 197.42 ±3.09 / 203.52 ms │ 190.40 / 193.71 ±5.64 / 204.97 ms │ no change │
│ QQuery 2  │ 482.37 / 484.71 ±2.19 / 487.63 ms │ 465.28 / 469.19 ±4.10 / 476.66 ms │ no change │
│ QQuery 3  │ 312.99 / 313.96 ±1.43 / 316.78 ms │ 309.21 / 310.75 ±1.19 / 312.48 ms │ no change │
│ QQuery 4  │ 645.31 / 664.75 ±13.51 / 682.62 ms │ 667.47 / 678.91 ±7.49 / 687.36 ms │ no change │
│ QQuery 5  │ 10276.29 / 10583.20 ±257.34 / 11008.54 ms │ 10075.40 / 10380.57 ±231.93 / 10733.22 ms │ no change │
│ QQuery 6  │ 29.45 / 56.17 ±34.05 / 112.92 ms │ 27.70 / 28.44 ±1.22 / 30.87 ms │ +1.98x faster │
│ QQuery 7  │ 763.69 / 816.99 ±43.76 / 885.36 ms │ 742.30 / 766.54 ±19.62 / 801.60 ms │ +1.07x faster │
│ QQuery 8  │ 376.57 / 388.00 ±10.73 / 406.05 ms │ 373.37 / 392.63 ±22.96 / 437.19 ms │ no change │
│ QQuery 9  │ 2936.58 / 2986.44 ±35.95 / 3041.68 ms │ 3124.33 / 3154.76 ±15.64 / 3166.38 ms │ 1.06x slower │
│ QQuery 10 │ 648.39 / 662.47 ±12.01 / 681.58 ms │ 647.31 / 686.85 ±62.33 / 810.96 ms │ no change │
│ QQuery 11 │ 2205.27 / 2244.62 ±46.70 / 2335.37 ms │ 2354.25 / 2434.02 ±55.64 / 2510.80 ms │ 1.08x slower │
│ QQuery 12 │ 189.47 / 219.42 ±58.62 / 336.65 ms │ 191.74 / 205.18 ±15.90 / 233.24 ms │ +1.07x faster │
│ QQuery 13 │ 546.46 / 549.37 ±2.87 / 554.08 ms │ 13.56 / 13.72 ±
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362744021
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4362734827-1969-fgdr4 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
```
Comparing datafusion/issue-19028-benchmark (87f71e95fc864a3dcd53ad8d943891b5465f1520) to ba038e9 (merge-base) [diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..87f71e95fc864a3dcd53ad8d943891b5465f1520) using: clickbench_extended
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
github-actions[bot] commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362741427
Thank you for opening this pull request!
Reviewer note: [cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
Details
```
Cloning origin/main
Building datafusion-datasource-parquet v53.1.0 (current)
Built [ 45.034s] (current)
Parsing datafusion-datasource-parquet v53.1.0 (current)
Parsed [ 0.025s] (current)
Building datafusion-datasource-parquet v53.1.0 (baseline)
Built [ 43.069s] (baseline)
Parsing datafusion-datasource-parquet v53.1.0 (baseline)
Parsed [ 0.026s] (baseline)
Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; assume patch)
Checked [ 0.154s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip

--- failure inherent_method_missing: pub method removed or renamed ---

Description:
A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/inherent_method_missing.ron

Failed in:
RowGroupAccessPlanFilter::is_fully_matched, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-origin_main/1b4aa23fc54dabface2da814e74fe26e0b84c6a8/datafusion/datasource-parquet/src/row_group_filter.rs:83

Summary semver requires new major version: 1 major and 0 minor checks failed
Finished [ 89.849s] datafusion-datasource-parquet
Building datafusion-sqllogictest v53.1.0 (current)
Built [ 135.401s] (current)
Parsing datafusion-sqllogictest v53.1.0 (current)
Parsed [ 0.022s] (current)
Building datafusion-sqllogictest v53.1.0 (baseline)
Built [ 134.160s] (baseline)
Parsing datafusion-sqllogictest v53.1.0 (baseline)
Parsed [ 0.023s] (baseline)
Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
Checked [ 0.092s] 222 checks: 222 pass, 30 skip
Summary no semver update required
Finished [ 273.003s] datafusion-sqllogictest
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827
run benchmark clickbench_extended
```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165239136
##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as name], metrics=[output_rows=10, ]
04)--FilterExec: value@1 > 3, metrics=[output_rows=10, , selectivity=100% (10/10)]
05)RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[output_rows=10, ]
-06)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]},
projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND
DynamicFilter [ value@1 IS NULL OR value@1 > 800 ],
pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND
(value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 >
800), required_guarantees=[], metrics=[output_rows=10,
elapsed_compute=, output_bytes=80.0 B,
files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=1
total → 1 matched, limit_pruned_row_groups=0 total → 0 matched,
bytes_scanned=210, metadata_load_time=,
scan_efficiency_ratio=18.31% (210/1.15 K)]
+06)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]},
projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND
DynamicFilter [ value@1 IS NULL OR value@1 > 800 ],
pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND
(value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 >
800), required_guarantees=[], metrics=[output_rows=10,
elapsed_compute=, output_bytes=80.0 B,
files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=0
total → 0 matched, limit_pruned_row_groups=0 total → 0 matched,
bytes_scanned=210, metadata_load_time=,
scan_efficiency_ratio=18.31% (210/1.15 K)]
Review Comment:
~yes, fixed it~
I found it's hard to fix without extra cost, investigating
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165248469
##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
return;
};
+// Collect unique column names referenced by the predicate so we can
+// check for NULLs. Rows with NULL predicate columns evaluate to NULL
+// (not true), so a row group with NULLs cannot be "fully matched."
+let predicate_columns =
+datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+let null_count_converters: Vec = predicate_columns
+.iter()
+.filter_map(|col| {
+StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)
Review Comment:
The PR https://github.com/apache/datafusion/pull/21907 takes a different
approach: it adds IS NULL checks for nullable columns referenced by the
predicate before evaluating the inverted pruning predicate.
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165244817
##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
return;
};
+// Collect unique column names referenced by the predicate so we can
Review Comment:
Yes, this PR makes the bug surface. I opened a separate PR:
https://github.com/apache/datafusion/pull/21907
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165239136
##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as name], metrics=[output_rows=10, ]
04)--FilterExec: value@1 > 3, metrics=[output_rows=10, , selectivity=100% (10/10)]
05)RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[output_rows=10, ]
-06)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]},
projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND
DynamicFilter [ value@1 IS NULL OR value@1 > 800 ],
pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND
(value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 >
800), required_guarantees=[], metrics=[output_rows=10,
elapsed_compute=, output_bytes=80.0 B,
files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=1
total → 1 matched, limit_pruned_row_groups=0 total → 0 matched,
bytes_scanned=210, metadata_load_time=,
scan_efficiency_ratio=18.31% (210/1.15 K)]
+06)--DataSourceExec: file_groups={1 group:
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]},
projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND
DynamicFilter [ value@1 IS NULL OR value@1 > 800 ],
pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND
(value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 >
800), required_guarantees=[], metrics=[output_rows=10,
elapsed_compute=, output_bytes=80.0 B,
files_ranges_pruned_statistics=1 total → 1 matched,
row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched,
row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=0
total → 0 matched, limit_pruned_row_groups=0 total → 0 matched,
bytes_scanned=210, metadata_load_time=,
scan_efficiency_ratio=18.31% (210/1.15 K)]
Review Comment:
yes, fixed it
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165227615
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -3773,11 +3773,11 @@ impl PartialOrd for Aggregate {
/// Returns 0 when no grouping set is duplicated.
fn max_grouping_set_duplicate_ordinal(group_expr: &[Expr]) -> usize {
if let Some(Expr::GroupingSet(GroupingSet::GroupingSets(sets))) = group_expr.first() {
-let mut counts: HashMap<&[Expr], usize> = HashMap::new();
-for set in sets {
-*counts.entry(set).or_insert(0) += 1;
-}
-counts.into_values().max().unwrap_or(0).saturating_sub(1)
+sets.iter()
Review Comment:
yes, I reverted the changes
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165224353
##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
use clap::{Parser, Subcommand};
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
#[cfg(feature = "snmalloc")]
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer
Review Comment:
Yes, opened a separate PR: https://github.com/apache/datafusion/pull/21905
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165226806
##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {
Review Comment:
yes, this one https://github.com/apache/datafusion/pull/21945
Do you think we should remove the current bench code in the PR?
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4340475327
@alamb Thanks for the review. I marked the PR as a draft; after I resolve all
the comments and the prerequisite PRs are merged, I'll mark it ready for review again.
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3156498869
##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
return;
};
+// Collect unique column names referenced by the predicate so we can
+// check for NULLs. Rows with NULL predicate columns evaluate to NULL
+// (not true), so a row group with NULLs cannot be "fully matched."
+let predicate_columns =
+datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+let null_count_converters: Vec = predicate_columns
+.iter()
+.filter_map(|col| {
+StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)
Review Comment:
We should probably set this option to `false` (it defaults to true) to be
super safe:
```
pub fn with_missing_null_counts_as_zero(mut self,
missing_null_counts_as_zero: bool) -> Self
```
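To illustrate why that default matters, here is a small sketch under simplifying assumptions. `ColumnStats` and `fully_matches_less_than` are hypothetical stand-ins, not the `StatisticsConverter` API. Min/max statistics cover only non-null values, so a row group can be treated as fully matched only when the null count is known to be zero; the sketch treats a missing count as unknown rather than as zero.

```rust
/// Simplified per-row-group statistics for one column (hypothetical).
struct ColumnStats {
    /// Maximum non-null value, if the writer recorded it.
    max: Option<i64>,
    /// Number of NULLs; `None` means the writer did not record a count.
    null_count: Option<u64>,
}

/// Returns true only if the statistics prove that every row in the row
/// group satisfies `x < bound`.
fn fully_matches_less_than(stats: &ColumnStats, bound: i64) -> bool {
    // All non-null values are below the bound.
    let max_below = matches!(stats.max, Some(max) if max < bound);
    // A NULL row evaluates `x < bound` to NULL, which a filter drops, so
    // the row group qualifies only when it provably has no NULLs.
    let no_nulls = stats.null_count == Some(0);
    max_below && no_nulls
}
```

If a missing count were read as zero instead, a file written without null counts could be classified as fully matched even though some of its rows would have been filtered out.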
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3156285145
##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
use clap::{Parser, Subcommand};
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
#[cfg(feature = "snmalloc")]
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer
Review Comment:
this seems unrelated to the rest of this PR -- perhaps we can pull it into
its own PR for easier review and consideration
##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {
Review Comment:
Rather than a targeted benchmark like this one, which will likely not be run
very often, I recommend adding a new query to the "clickbench_extended" benchmark:
https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries
I bet you could write a pretty good one with a substring match where this
optimization would help a lot.
I recommend making a separate PR to add such a query so we can show off this
PR's performance improvement.
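For readers skimming the thread, the "fully matched" condition described in the benchmark's doc comment can be sketched as follows. This is a standalone illustration, not the PR's implementation (which works through DataFusion's pruning machinery); `ColumnStats`, `RowGroupMatch`, and `classify_less_than` are hypothetical names, and the sketch assumes the min/max statistics are exact.

```rust
/// Min/max statistics for one column of one row group (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: i64,
    max: i64,
    null_count: u64,
}

/// What the statistics prove about the predicate `x < bound`.
#[derive(Debug, PartialEq)]
enum RowGroupMatch {
    /// No row can satisfy the predicate: the row group can be pruned.
    NoRows,
    /// Undecided: the filter must be evaluated row by row.
    SomeRows,
    /// Every row satisfies the predicate: the filter can be skipped.
    AllRows,
}

/// Classify a row group for `x < bound` using only its statistics.
fn classify_less_than(stats: ColumnStats, bound: i64) -> RowGroupMatch {
    if stats.min >= bound {
        // Even the smallest value fails `x < bound`.
        RowGroupMatch::NoRows
    } else if stats.max < bound && stats.null_count == 0 {
        // Even the largest value passes, and there are no NULLs
        // (a NULL would make the predicate evaluate to NULL, not true).
        RowGroupMatch::AllRows
    } else {
        RowGroupMatch::SomeRows
    }
}
```

With the benchmark's layout (`x` in `[0, 100)`, predicate `x < 200`), every row group has `max = 99 < 200`, so each one classifies as `AllRows`.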
##
benchmarks/src/bin/imdb.rs:
##
@@ -21,16 +21,13 @@ use clap::{Parser, Subcommand};
use datafusion::error::Result;
use datafusion_benchmarks::imdb;
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
#[cfg(feature = "snmalloc")]
#[global_allocator]
static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer
Review Comment:
likewise here
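The diff comment above is cut off in this excerpt, so which allocator is preferred when both features are enabled is not stated. As a rough sketch of the general pattern (replacing a `compile_error!` with a precedence order), the following assumes `snmalloc` wins, which the unchanged `#[cfg(feature = "snmalloc")]` line suggests but does not confirm. `selected_allocator` is a hypothetical helper used only for illustration.

```rust
/// Report which allocator a build would select (hypothetical helper).
///
/// Assumption: `snmalloc` takes precedence when both features are on,
/// so that `--all-features` builds still compile.
fn selected_allocator() -> &'static str {
    if cfg!(feature = "snmalloc") {
        "snmalloc"
    } else if cfg!(feature = "mimalloc") {
        // Reached only when `snmalloc` is disabled. The attribute form of
        // the same condition would be:
        // #[cfg(all(feature = "mimalloc", not(feature = "snmalloc")))]
        "mimalloc"
    } else {
        "system"
    }
}
```

The point of the pattern is that every feature combination maps to exactly one allocator, so no combination needs to be rejected at compile time.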
##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as name], met
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4337946630
Checking this one out
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4316119761
Benchmark for [this request](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410) hit the 7200s job deadline before finishing.
Benchmarks requested: `tpch`
Kubernetes message
```
Job was active longer than specified deadline
```
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315558192
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark tpcds_sf1.json
| Query | HEAD | datafusion_issue-19028-benchmark | Change |
|---|---|---|---|
| QQuery 1 | 6.36 / 6.79 ±0.77 / 8.33 ms | 6.34 / 6.82 ±0.72 / 8.27 ms | no change |
| QQuery 2 | 111.62 / 112.52 ±0.84 / 114.02 ms | 112.17 / 113.85 ±2.07 / 117.48 ms | no change |
| QQuery 3 | 109.40 / 111.11 ±1.03 / 112.42 ms | 109.52 / 110.65 ±1.11 / 112.08 ms | no change |
| QQuery 4 | 1063.85 / 1079.77 ±11.64 / 1096.79 ms | 1069.66 / 1081.30 ±6.57 / 1087.59 ms | no change |
| QQuery 5 | 190.68 / 196.08 ±2.96 / 199.40 ms | 197.50 / 199.03 ±2.11 / 203.21 ms | no change |
| QQuery 6 | 256.22 / 263.97 ±5.03 / 271.37 ms | 265.28 / 269.52 ±3.66 / 275.88 ms | no change |
| QQuery 7 | 331.83 / 336.36 ±3.29 / 341.13 ms | 331.63 / 337.42 ±4.01 / 342.43 ms | no change |
| QQuery 8 | 158.03 / 163.27 ±2.94 / 166.58 ms | 160.05 / 163.64 ±3.55 / 168.95 ms | no change |
| QQuery 9 | 223.97 / 240.95 ±18.05 / 275.63 ms | 225.92 / 246.17 ±15.49 / 264.82 ms | no change |
| QQuery 10 | 168.84 / 171.94 ±2.22 / 174.46 ms | 164.65 / 173.05 ±6.51 / 181.03 ms | no change |
| QQuery 11 | 708.27 / 717.29 ±8.19 / 730.30 ms | 699.93 / 713.92 ±7.71 / 721.53 ms | no change |
| QQuery 12 | 37.36 / 40.07 ±2.19 / 43.94 ms | 37.58 / 39.99 ±1.40 / 41.78 ms | no change |
| QQuery 13 | 569.29 / 577.80 ±7.20 / 589.16 ms | 561.17 / 575.21 ±9.89 / 591.97 ms | no change |
| QQuery 14 | 904.33 / 918.83 ±10.33 / 929.94 ms | 899.59 / 912.64 ±9.64 / 928.42 ms | no c
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315511766
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark tpch_sf10.json
| Query | HEAD | datafusion_issue-19028-benchmark | Change |
|---|---|---|---|
| QQuery 1 | 341.86 / 343.87 ±1.57 / 345.89 ms | 341.55 / 342.83 ±1.50 / 345.60 ms | no change |
| QQuery 2 | 179.02 / 182.05 ±1.67 / 184.13 ms | 178.22 / 180.43 ±1.81 / 182.75 ms | no change |
| QQuery 3 | 376.68 / 377.93 ±0.92 / 379.08 ms | 372.42 / 376.12 ±2.91 / 379.53 ms | no change |
| QQuery 4 | 330.86 / 333.01 ±1.95 / 335.80 ms | 323.24 / 327.00 ±3.56 / 333.50 ms | no change |
| QQuery 5 | 631.16 / 646.36 ±9.78 / 659.99 ms | 618.86 / 630.04 ±7.42 / 640.48 ms | no change |
| QQuery 6 | 295.54 / 296.50 ±0.87 / 298.02 ms | 292.67 / 294.98 ±1.30 / 296.27 ms | no change |
| QQuery 7 | 435.42 / 438.78 ±2.54 / 442.85 ms | 432.02 / 434.27 ±1.86 / 437.54 ms | no change |
| QQuery 8 | 620.16 / 623.49 ±3.23 / 627.85 ms | 614.57 / 618.60 ±2.10 / 620.58 ms | no change |
| QQuery 9 | 1427.74 / 1444.82 ±11.23 / 1459.99 ms | 1420.76 / 1429.41 ±5.97 / 1437.76 ms | no change |
| QQuery 10 | 414.45 / 420.02 ±3.60 / 424.13 ms | 409.96 / 412.91 ±2.35 / 415.56 ms | no change |
| QQuery 11 | 143.61 / 145.47 ±3.02 / 151.49 ms | 142.75 / 145.97 ±3.38 / 151.37 ms | no change |
| QQuery 12 | 351.88 / 356.48 ±3.60 / 362.79 ms | 350.62 / 352.53 ±2.15 / 356.43 ms | no change |
| QQuery 13 | 395.48 / 399.86 ±2.41 / 402.69 ms | 402.20 / 410.77 ±9.07 / 423.63 ms | no change |
| QQuery 14 | 217.73 / 218.84 ±0.93 / 220.14 ms | 215.12 / 217.76 ±2.21 / 220.83 ms | no change |
| QQuery 15 | 426.15 / 433.18 ±4.47 / 438.50 ms | 424.73 / 430.62 ±3.77 / 435.54 ms | no change |
| QQuery 16 | 111.62 / 114.07 ±1.71 / 116.46 ms | 111.56 / 115.66 ±2.74 / 118.63 ms | no chan
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315431482
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4315401410-1829-s9qw7 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233) using: tpch10
Results will be posted here when complete
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315418648
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4315401410-1827-q7smj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233) using: tpcds
Results will be posted here when complete
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315415411
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4315401410-1828-kdbct 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233) using: tpch
Results will be posted here when complete
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410
run benchmarks tpcds tpch tpch10
```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
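The `env:` block in the trigger comment enables filter pushdown and filter reordering through environment variables. As a small sketch of the naming convention, assuming (as DataFusion's environment-based configuration appears to do) that the variable name is the dotted config key upper-cased with `.` replaced by `_`; `env_var_for_key` is a hypothetical helper, not a DataFusion API:

```rust
/// Derive the environment variable name for a dotted config key.
///
/// Assumption: the name is the key upper-cased, with '.' replaced by '_'.
fn env_var_for_key(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}
```

Under that assumption, `datafusion.execution.parquet.pushdown_filters` and `datafusion.execution.parquet.reorder_filters` correspond to the two variables set in the benchmark request.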
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258571734
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark tpcds_sf1.json
| Query | HEAD | datafusion_issue-19028-benchmark | Change |
|---|---|---|---|
| QQuery 1 | 6.78 / 7.24 ±0.79 / 8.83 ms | 6.26 / 6.73 ±0.77 / 8.26 ms | +1.08x faster |
| QQuery 2 | 113.71 / 124.29 ±18.52 / 161.25 ms | 112.31 / 123.13 ±18.74 / 160.53 ms | no change |
| QQuery 3 | 112.25 / 113.70 ±0.77 / 114.45 ms | 111.22 / 112.97 ±1.16 / 114.69 ms | no change |
| QQuery 4 | 1132.74 / 1143.41 ±5.89 / 1148.64 ms | 1107.26 / 1146.56 ±21.33 / 1168.38 ms | no change |
| QQuery 5 | 193.18 / 198.73 ±3.48 / 202.75 ms | 192.26 / 198.74 ±4.87 / 204.14 ms | no change |
| QQuery 6 | 268.47 / 283.49 ±14.07 / 305.59 ms | 289.24 / 296.08 ±7.41 / 306.80 ms | no change |
| QQuery 7 | 329.27 / 334.93 ±4.12 / 341.05 ms | 333.72 / 345.13 ±8.97 / 358.71 ms | no change |
| QQuery 8 | 164.06 / 167.16 ±2.20 / 169.86 ms | 161.48 / 164.40 ±1.76 / 166.98 ms | no change |
| QQuery 9 | 249.71 / 257.13 ±10.05 / 277.05 ms | 243.52 / 255.37 ±6.89 / 263.77 ms | no change |
| QQuery 10 | 173.92 / 179.44 ±3.67 / 183.86 ms | 170.42 / 176.28 ±3.60 / 181.23 ms | no change |
| QQuery 11 | 731.39 / 743.51 ±10.05 / 760.89 ms | 739.67 / 750.42 ±13.72 / 777.02 ms | no change |
| QQuery 12 | 39.06 / 41.90 ±2.83 / 47.25 ms | 40.35 / 42.34 ±1.88 / 45.21 ms | no change |
| QQuery 13 | 564.43 / 571.92 ±7.85 / 586.81 ms | 572.14 / 580.99 ±5.22 / 586.01 ms | no change |
| QQuery 14 | 920.93 / 931.52 ±8.71 / 941.30 ms | 917.34 / 937.33 ±16.00 / 955.10 ms | no c
```
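The "Change" column in these reports is consistent with comparing the mean times (the second number in each cell) against a noise band of about 5%. The sketch below reproduces that reading; the threshold, the use of the mean, and the name `change_label` are all assumptions inferred from the rows shown in this thread, not taken from the benchmark runner's source.

```rust
/// Assumed noise band: ratios within ±5% are reported as "no change".
const NOISE_THRESHOLD: f64 = 0.05;

/// Label the change between a baseline mean and a new mean (both in ms).
fn change_label(base_mean_ms: f64, new_mean_ms: f64) -> String {
    // Ratio < 1.0 means the new build is faster than the baseline.
    let ratio = new_mean_ms / base_mean_ms;
    if (1.0 - NOISE_THRESHOLD..=1.0 + NOISE_THRESHOLD).contains(&ratio) {
        "no change".to_string()
    } else if ratio < 1.0 {
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        format!("{:.2}x slower", ratio)
    }
}
```

For example, the means 7.24 ms and 6.73 ms give a ratio of about 0.93, which falls outside the band and is labeled as a speedup, while 283.49 ms versus 296.08 ms differs by about 4.4% and stays within it.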
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258444222
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4258430739-1347-rltwq 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233) using: tpcds
Results will be posted here when complete
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258443837
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4258430739-1348-rbszv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing datafusion/issue-19028-benchmark (0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233) using: tpch
Results will be posted here when complete
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739
run benchmarks tpcds tpch
```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258411232
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark tpcds_sf1.json
| Query | HEAD | datafusion_issue-19028-benchmark | Change |
|---|---|---|---|
| QQuery 1 | 6.56 / 7.07 ±0.72 / 8.50 ms | 6.39 / 6.83 ±0.75 / 8.32 ms | no change |
| QQuery 2 | 111.08 / 111.90 ±0.60 / 112.85 ms | 112.56 / 123.17 ±18.94 / 160.92 ms | 1.10x slower |
| QQuery 3 | 109.73 / 110.86 ±1.14 / 112.61 ms | 109.83 / 110.94 ±0.63 / 111.46 ms | no change |
| QQuery 4 | 1107.32 / 1128.24 ±17.71 / 1149.81 ms | 1094.31 / 1122.25 ±16.85 / 1138.95 ms | no change |
| QQuery 5 | 195.12 / 197.75 ±2.90 / 203.16 ms | 191.09 / 195.59 ±2.92 / 199.56 ms | no change |
| QQuery 6 | 270.75 / 290.38 ±12.06 / 304.92 ms | 264.99 / 271.22 ±5.06 / 279.74 ms | +1.07x faster |
| QQuery 7 | 333.39 / 341.33 ±6.33 / 352.06 ms | 336.01 / 341.39 ±5.41 / 348.64 ms | no change |
| QQuery 8 | 163.92 / 165.53 ±1.33 / 167.48 ms | 157.48 / 162.72 ±4.33 / 169.19 ms | no change |
| QQuery 9 | 210.53 / 243.02 ±26.88 / 271.53 ms | 200.29 / 247.70 ±26.70 / 273.25 ms | no change |
| QQuery 10 | 174.16 / 179.08 ±4.06 / 184.11 ms | 172.96 / 182.10 ±8.80 / 196.09 ms | no change |
| QQuery 11 | 749.49 / 756.26 ±4.16 / 762.43 ms | 732.29 / 740.02 ±6.31 / 751.01 ms | no change |
| QQuery 12 | 37.94 / 40.63 ±1.46 / 41.99 ms | 37.67 / 39.33 ±0.88 / 40.10 ms | no change |
| QQuery 13 | 564.75 / 575.46 ±6.80 / 583.56 ms | 553.62 / 557.32 ±3.11 / 561.74 ms | no change |
| QQuery 14 | 902.52 / 919.84 ±10.02 / 932.50 ms | 917.26 / 923.35 ±6.56 / 933.44 ms | no c
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258384364

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.19 / 4.48 ±6.41 / 17.31 ms │       1.21 / 4.53 ±6.45 / 17.43 ms │     no change │
│ QQuery 1  │     15.53 / 15.86 ±0.20 / 16.13 ms │     15.48 / 16.04 ±0.37 / 16.56 ms │     no change │
│ QQuery 2  │     45.06 / 45.31 ±0.28 / 45.84 ms │     44.29 / 44.47 ±0.17 / 44.74 ms │     no change │
│ QQuery 3  │     42.27 / 44.56 ±1.49 / 45.96 ms │     41.94 / 45.97 ±2.32 / 48.58 ms │     no change │
│ QQuery 4  │  283.50 / 294.28 ±7.03 / 305.72 ms │ 329.85 / 365.17 ±41.65 / 444.83 ms │  1.24x slower │
│ QQuery 5  │  342.84 / 345.35 ±3.13 / 351.35 ms │ 374.97 / 395.60 ±19.29 / 427.23 ms │  1.15x slower │
│ QQuery 6  │        5.59 / 6.12 ±0.37 / 6.55 ms │        5.43 / 6.96 ±1.13 / 8.44 ms │  1.14x slower │
│ QQuery 7  │     22.07 / 22.76 ±0.61 / 23.76 ms │     22.08 / 23.25 ±1.23 / 25.53 ms │     no change │
│ QQuery 8  │  419.28 / 426.50 ±3.99 / 429.99 ms │ 439.54 / 471.62 ±34.51 / 534.88 ms │  1.11x slower │
│ QQuery 9  │  637.80 / 646.44 ±6.72 / 658.14 ms │  669.57 / 677.05 ±6.41 / 685.61 ms │     no change │
│ QQuery 10 │  116.52 / 118.44 ±1.50 / 120.88 ms │  118.71 / 121.93 ±2.15 / 125.30 ms │     no change │
│ QQuery 11 │  133.26 / 133.89 ±0.61 / 135.01 ms │  131.75 / 133.61 ±1.99 / 137.12 ms │     no change │
│ QQuery 12 │  371.46 / 381.23 ±7.11 / 392.22 ms │  387.30 / 401.70 ±9.11 / 411.47 ms │  1.05x slower │
│ QQuery 13 │ 498.04 / 509.93 ±11.14 / 528.73 ms │ 536.76 / 577.98 ±31.48 / 619.60 ms │  1.13x slower │
│ QQuery 14 │  377.93 / 381.33 ±3.52 / 387.57 ms │  388.16 / 396.09 ±5.96 / 406.11 ms │     no change │
```
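The Change column in the table above appears to be derived from the two mean times: in the rows shown, every mean ratio below roughly 1.05 is reported as "no change", and larger ratios are reported as "N.NNx slower" or "+N.NNx faster". The Rust sketch below reproduces that labelling under those assumptions. The function name `classify_change` and the 1.05 band are illustrative, inferred from the rows in this thread rather than taken from the benchmark runner's source.

```rust
/// Label a benchmark row from the mean times (in ms) of the baseline (HEAD)
/// and the branch. Both inputs are assumed to be positive.
fn classify_change(baseline_mean_ms: f64, branch_mean_ms: f64) -> String {
    // Assumption: ratios under 1.05 are treated as noise. This value is
    // inferred from the posted rows, not read from the runner's code.
    const NO_CHANGE_BAND: f64 = 1.05;

    if branch_mean_ms >= baseline_mean_ms {
        // Branch took longer: express the slowdown as branch / baseline.
        let ratio = branch_mean_ms / baseline_mean_ms;
        if ratio < NO_CHANGE_BAND {
            "no change".to_string()
        } else {
            format!("{ratio:.2}x slower")
        }
    } else {
        // Branch was quicker: express the speedup as baseline / branch.
        let ratio = baseline_mean_ms / branch_mean_ms;
        if ratio < NO_CHANGE_BAND {
            "no change".to_string()
        } else {
            format!("+{ratio:.2}x faster")
        }
    }
}

fn main() {
    // (query, HEAD mean, branch mean) copied from the clickbench_partitioned rows above.
    let rows = [
        ("QQuery 4", 294.28, 365.17),
        ("QQuery 9", 646.44, 677.05),
        ("QQuery 12", 381.23, 401.70),
    ];
    for (query, head, branch) in rows {
        println!("{query}: {}", classify_change(head, branch));
    }
}
```

Comparing means rather than minimums matters for rows with a large spread: for QQuery 4 the branch's ±41.65 ms standard deviation suggests a few slow iterations contribute to the reported slowdown.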
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258264372

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4258253391-1344-qgksw 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae) using: clickbench_partitioned

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258266017

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4258253391-1346-gz9gj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae) using: tpch

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258264405

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4258253391-1345-h2qr4 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) [diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae) using: tpcds

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391

run benchmarks

```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
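The `env:` block in the trigger comment above enables Parquet filter pushdown and filter reordering through environment variables. A minimal sketch of the naming convention, assuming the variable name is the dotted config key upper-cased with `.` replaced by `_` (which is how DataFusion's `ConfigOptions::from_env` is understood to map them). `env_var_for_key` is a hypothetical helper for illustration, not a DataFusion API:

```rust
/// Hypothetical helper: derive the environment variable name for a dotted
/// DataFusion config key, assuming the upper-case / underscore convention.
fn env_var_for_key(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}

fn main() {
    // The two config keys that correspond to the variables in the trigger comment.
    let keys = [
        "datafusion.execution.parquet.pushdown_filters",
        "datafusion.execution.parquet.reorder_filters",
    ];
    for key in keys {
        println!("{} -> {}", key, env_var_for_key(key));
    }
}
```

The mapping is one-way: because config key segments may themselves contain underscores, the dotted key cannot be recovered unambiguously from the variable name alone.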
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4252030873

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark tpcds_sf1.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.48 / 6.94 ±0.78 / 8.50 ms │           6.28 / 6.77 ±0.85 / 8.46 ms │     no change │
│ QQuery 2  │     112.44 / 113.18 ±0.62 / 114.19 ms │     111.54 / 112.77 ±0.99 / 114.35 ms │     no change │
│ QQuery 3  │     111.25 / 112.13 ±0.75 / 113.22 ms │     109.03 / 110.61 ±1.34 / 113.07 ms │     no change │
│ QQuery 4  │  1077.60 / 1091.75 ±8.46 / 1102.27 ms │ 1082.02 / 1100.65 ±14.36 / 1119.07 ms │     no change │
│ QQuery 5  │     191.56 / 194.21 ±1.72 / 196.11 ms │     194.80 / 197.60 ±1.92 / 199.84 ms │     no change │
│ QQuery 6  │     248.92 / 260.54 ±7.02 / 268.19 ms │     259.99 / 266.94 ±3.97 / 270.83 ms │     no change │
│ QQuery 7  │     331.88 / 338.40 ±3.84 / 343.08 ms │     328.92 / 336.27 ±3.99 / 339.90 ms │     no change │
│ QQuery 8  │     160.55 / 164.47 ±2.55 / 166.74 ms │     158.41 / 165.70 ±3.79 / 169.06 ms │     no change │
│ QQuery 9  │    226.33 / 244.34 ±11.59 / 262.42 ms │    232.22 / 247.89 ±12.22 / 264.39 ms │     no change │
│ QQuery 10 │     169.16 / 177.09 ±6.14 / 184.86 ms │     176.28 / 179.23 ±2.30 / 182.14 ms │     no change │
│ QQuery 11 │    699.75 / 718.20 ±10.41 / 729.03 ms │     715.56 / 720.36 ±6.36 / 732.87 ms │     no change │
│ QQuery 12 │        37.91 / 39.73 ±1.69 / 42.63 ms │        37.92 / 39.43 ±1.18 / 41.54 ms │     no change │
│ QQuery 13 │    547.59 / 570.55 ±16.21 / 589.20 ms │     566.78 / 574.16 ±5.48 / 582.10 ms │     no change │
│ QQuery 14 │     898.55 / 913.18 ±8.63 / 921.11 ms │ 893.15 / 908.07 ±11.95 / 924
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251925070

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4251909707-1287-bzczn 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpch

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251912293

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4251909707-1286-42w8w 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpcds

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707

run benchmarks tpcds tpch

```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250189791

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark tpcds_sf1.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.43 / 6.89 ±0.82 / 8.53 ms │           6.42 / 6.86 ±0.75 / 8.35 ms │     no change │
│ QQuery 2  │    112.26 / 133.08 ±24.59 / 164.16 ms │    111.97 / 134.28 ±24.27 / 164.01 ms │     no change │
│ QQuery 3  │     109.72 / 111.10 ±1.35 / 113.17 ms │     108.94 / 109.47 ±0.52 / 110.44 ms │     no change │
│ QQuery 4  │ 1101.37 / 1120.63 ±21.62 / 1154.49 ms │ 1083.91 / 1117.06 ±20.26 / 1145.73 ms │     no change │
│ QQuery 5  │     194.04 / 196.44 ±1.93 / 199.35 ms │     193.48 / 198.82 ±3.05 / 202.47 ms │     no change │
│ QQuery 6  │     276.05 / 283.55 ±6.72 / 295.55 ms │     259.52 / 264.28 ±4.27 / 270.83 ms │ +1.07x faster │
│ QQuery 7  │     339.13 / 344.97 ±7.00 / 357.81 ms │     334.97 / 338.71 ±4.50 / 346.55 ms │     no change │
│ QQuery 8  │     154.97 / 162.83 ±5.43 / 169.26 ms │     157.10 / 163.80 ±3.57 / 167.09 ms │     no change │
│ QQuery 9  │    212.58 / 243.40 ±16.47 / 258.99 ms │    220.92 / 234.46 ±11.91 / 249.86 ms │     no change │
│ QQuery 10 │     174.79 / 180.39 ±3.95 / 186.71 ms │     178.04 / 181.39 ±2.50 / 185.13 ms │     no change │
│ QQuery 11 │    721.40 / 736.38 ±11.06 / 749.75 ms │     713.05 / 723.96 ±5.92 / 729.55 ms │     no change │
│ QQuery 12 │        38.04 / 40.22 ±1.85 / 42.98 ms │        36.84 / 39.78 ±1.87 / 42.32 ms │     no change │
│ QQuery 13 │     560.99 / 573.09 ±6.18 / 577.57 ms │     556.10 / 564.05 ±7.09 / 576.52 ms │     no change │
│ QQuery 14 │     900.42 / 910.47 ±8.89 / 926.69 ms │    895.98 / 908.53 ±10.03 / 920.02 ms │ no c
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250163361

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark

Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.22 / 4.56 ±6.51 / 17.57 ms │       1.20 / 4.41 ±6.37 / 17.15 ms │     no change │
│ QQuery 1  │     15.66 / 16.27 ±0.37 / 16.61 ms │     15.56 / 15.84 ±0.17 / 16.07 ms │     no change │
│ QQuery 2  │     44.40 / 44.97 ±0.48 / 45.76 ms │     41.67 / 42.03 ±0.20 / 42.24 ms │ +1.07x faster │
│ QQuery 3  │     43.26 / 45.25 ±1.10 / 46.30 ms │     39.90 / 42.02 ±2.10 / 44.89 ms │ +1.08x faster │
│ QQuery 4  │  292.95 / 297.72 ±3.64 / 301.93 ms │  343.96 / 351.63 ±5.86 / 358.86 ms │  1.18x slower │
│ QQuery 5  │ 342.48 / 355.45 ±11.55 / 373.20 ms │ 358.60 / 377.58 ±12.48 / 393.81 ms │  1.06x slower │
│ QQuery 6  │        5.79 / 6.67 ±0.78 / 8.12 ms │        5.35 / 6.56 ±0.84 / 7.59 ms │     no change │
│ QQuery 7  │     23.57 / 24.78 ±1.71 / 28.16 ms │     21.17 / 21.85 ±0.46 / 22.59 ms │ +1.13x faster │
│ QQuery 8  │ 433.73 / 449.23 ±10.95 / 466.52 ms │  431.86 / 443.96 ±8.36 / 455.92 ms │     no change │
│ QQuery 9  │  696.57 / 705.90 ±6.71 / 716.77 ms │ 647.40 / 662.69 ±15.12 / 684.72 ms │ +1.07x faster │
│ QQuery 10 │  124.78 / 129.26 ±4.24 / 136.50 ms │  120.73 / 123.99 ±2.90 / 128.90 ms │     no change │
│ QQuery 11 │  134.73 / 136.85 ±1.62 / 139.55 ms │  134.13 / 135.40 ±1.09 / 136.71 ms │     no change │
│ QQuery 12 │  392.49 / 396.53 ±4.92 / 406.13 ms │ 375.90 / 401.28 ±13.81 / 415.70 ms │     no change │
│ QQuery 13 │ 486.25 / 505.93 ±18.83 / 531.11 ms │ 495.55 / 516.01 ±14.69 / 534.80 ms │     no change │
│ QQuery 14 │  379.48 / 388.55 ±9.95 / 401.10 ms │  392.04 / 405.22 ±7.95 / 415.55 ms │     no change │
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078989

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1275-p6dsj 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: clickbench_partitioned

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078401

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1276-485dd 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpcds
Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078226

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1277-bn57j 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpch
Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780

run benchmarks

```
env:
  DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
  DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
```
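The variable names in this request look like DataFusion configuration keys written as environment variables: the key is upper-cased and its dots are replaced by underscores. A minimal sketch of that mapping, assuming this naming convention; `config_key_to_env_var` is a hypothetical helper for illustration, not a DataFusion API:

```rust
/// Hypothetical helper (not part of DataFusion): derive the environment
/// variable name for a dotted configuration key by upper-casing it and
/// replacing every '.' with '_'.
fn config_key_to_env_var(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}

fn main() {
    // The two Parquet options this benchmark request turns on.
    let keys = [
        "datafusion.execution.parquet.pushdown_filters",
        "datafusion.execution.parquet.reorder_filters",
    ];
    for key in keys {
        println!("{key} -> {}", config_key_to_env_var(key));
    }
}
```

Under that assumption, the two keys map to exactly the names used in the `env:` block of this request.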
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249657574

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark clickbench_partitioned.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD                               ┃ datafusion_issue-19028-benchmark   ┃ Change        ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │       1.21 / 4.57 ±6.54 / 17.64 ms │       1.19 / 4.53 ±6.50 / 17.53 ms │     no change │
│ QQuery 1  │     14.60 / 15.09 ±0.28 / 15.35 ms │     14.18 / 14.54 ±0.21 / 14.72 ms │     no change │
│ QQuery 2  │     44.67 / 45.36 ±0.41 / 45.95 ms │     42.47 / 42.68 ±0.16 / 42.95 ms │ +1.06x faster │
│ QQuery 3  │     45.03 / 45.92 ±0.68 / 47.13 ms │     38.79 / 39.49 ±0.36 / 39.76 ms │ +1.16x faster │
│ QQuery 4  │  289.32 / 295.98 ±3.58 / 300.05 ms │  292.13 / 297.11 ±2.76 / 299.59 ms │     no change │
│ QQuery 5  │  345.56 / 349.97 ±3.42 / 355.25 ms │  345.69 / 351.59 ±4.12 / 358.14 ms │     no change │
│ QQuery 6  │        5.94 / 6.96 ±1.12 / 8.92 ms │        5.64 / 7.14 ±0.86 / 8.03 ms │     no change │
│ QQuery 7  │     16.99 / 17.17 ±0.13 / 17.37 ms │     17.01 / 17.15 ±0.10 / 17.28 ms │     no change │
│ QQuery 8  │  416.42 / 420.65 ±4.94 / 429.53 ms │  423.96 / 432.43 ±5.73 / 439.81 ms │     no change │
│ QQuery 9  │  670.51 / 675.82 ±2.96 / 678.97 ms │ 664.02 / 680.86 ±13.11 / 699.96 ms │     no change │
│ QQuery 10 │    93.78 / 96.79 ±3.92 / 104.07 ms │    90.09 / 94.33 ±4.02 / 101.56 ms │     no change │
│ QQuery 11 │  107.42 / 108.60 ±0.83 / 110.01 ms │  103.27 / 104.63 ±1.20 / 106.15 ms │     no change │
│ QQuery 12 │  340.86 / 348.17 ±4.58 / 354.86 ms │  347.42 / 353.78 ±6.55 / 365.94 ms │     no change │
│ QQuery 13 │ 458.87 / 479.04 ±15.10 / 503.79 ms │ 471.89 / 494.71 ±15.40 / 513.63 ms │     no change │
│ QQuery 14 │  344.75 / 352.07 ±3.77 / 355.18 ms │  353.25 / 362.30 ±5.53 / 368.55 ms │     no change │
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249652693

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Details

```
Comparing HEAD and datafusion_issue-19028-benchmark
Benchmark tpcds_sf1.json
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD                                  ┃ datafusion_issue-19028-benchmark      ┃ Change        ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.82 / 7.29 ±0.79 / 8.85 ms │           6.74 / 7.14 ±0.70 / 8.54 ms │     no change │
│ QQuery 2  │     146.98 / 147.42 ±0.35 / 147.79 ms │     145.96 / 147.02 ±0.81 / 147.93 ms │     no change │
│ QQuery 3  │     113.51 / 114.64 ±1.05 / 116.56 ms │     113.47 / 114.28 ±0.44 / 114.73 ms │     no change │
│ QQuery 4  │ 1386.09 / 1399.21 ±10.38 / 1411.77 ms │ 1352.19 / 1407.26 ±35.55 / 1444.22 ms │     no change │
│ QQuery 5  │     172.46 / 174.50 ±1.35 / 175.76 ms │     173.06 / 175.55 ±2.07 / 177.88 ms │     no change │
│ QQuery 6  │    860.53 / 877.99 ±13.21 / 895.55 ms │    849.00 / 884.78 ±31.41 / 930.63 ms │     no change │
│ QQuery 7  │     350.78 / 352.55 ±1.44 / 354.25 ms │     341.24 / 343.02 ±2.99 / 348.99 ms │     no change │
│ QQuery 8  │     116.16 / 117.82 ±1.05 / 118.93 ms │     118.75 / 120.05 ±1.25 / 122.19 ms │     no change │
│ QQuery 9  │    103.17 / 111.09 ±10.38 / 131.32 ms │     103.31 / 106.20 ±1.68 / 108.15 ms │     no change │
│ QQuery 10 │     109.43 / 111.50 ±1.39 / 113.23 ms │     102.24 / 103.69 ±0.87 / 104.62 ms │ +1.08x faster │
│ QQuery 11 │   992.86 / 1000.18 ±5.18 / 1006.68 ms │  1003.62 / 1020.01 ±9.58 / 1033.23 ms │     no change │
│ QQuery 12 │        45.38 / 48.41 ±1.77 / 50.72 ms │        45.25 / 47.00 ±1.02 / 48.22 ms │     no change │
│ QQuery 13 │     402.64 / 405.27 ±1.45 / 407.07 ms │     401.73 / 406.37 ±3.14 / 410.21 ms │     no change │
│ QQuery 14 │   999.85 / 1012.43 ±6.34 / 1016.71 ms │  1016.34 / 1025.63 ±5.29 / 1031.40 ms │     no change │
│ QQuery 15 │
```
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548180

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1270-d8c7t 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: tpcds
Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548216

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1269-r4t4b 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: clickbench_partitioned
Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548411

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1271-fbkqb 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: tpch
Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]
Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952

run benchmarks

```
env:
  PUSHDOWN_FILTERS: true
  REORDER_FILTERS: true
```
