Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245736923


##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
 let file_metadata = Arc::clone(reader_metadata.metadata());
 let rg_metadata = file_metadata.row_groups();
 
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
 .pushdown_filters
 .then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
 
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
 }
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   The follow-up PR is https://github.com/apache/datafusion/pull/22191/.
   
   @adriangb I opened that PR first; once your PR on the arrow side lands in DF, we can do more refactoring!
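
The closure in the hunk above collapses a `Result<Option<_>>` into an `Option<_>`, so a failed build degrades to "no pushdown" instead of failing the scan, and it can be called again for each decoder run. A minimal std-only sketch of that shape follows. `Filter`, `build_filter`, `try_build`, and `filters_for_runs` are hypothetical stand-ins for illustration, not DataFusion or parquet APIs, and the sketch assumes a filter cannot be shared between decoders.

```rust
/// Hypothetical stand-in for a compiled row filter.
#[derive(Debug, PartialEq)]
struct Filter(String);

/// Hypothetical builder: `Ok(None)` means "nothing to push down",
/// `Err` means the predicate could not be compiled.
fn build_filter(predicate: &str) -> Result<Option<Filter>, String> {
    if predicate.is_empty() {
        Ok(None)
    } else if predicate.contains('?') {
        Err(format!("unsupported predicate: {predicate}"))
    } else {
        Ok(Some(Filter(predicate.to_string())))
    }
}

/// Collapse `Result<Option<Filter>>` into `Option<Filter>`: an error is
/// reported and then treated as "no filter" rather than aborting the scan.
fn try_build(predicate: &str) -> Option<Filter> {
    match build_filter(predicate) {
        Ok(filter) => filter,
        Err(e) => {
            eprintln!("Ignoring error building row filter for '{predicate}': {e}");
            None
        }
    }
}

/// Build one fresh filter per decoder run (assumption: each decoder
/// takes ownership of the filter it is given).
fn filters_for_runs(predicate: &str, runs: usize) -> Vec<Option<Filter>> {
    (0..runs).map(|_| try_build(predicate)).collect()
}
```

Calling `filters_for_runs("a > 1", 2)` yields two independent filters, while an empty or unsupported predicate yields `None` for every run.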



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


xudong963 merged PR #21637:
URL: https://github.com/apache/datafusion/pull/21637





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4456332067

   @alamb @adriangb thanks for the review, let's move forward





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245411663


##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
 let file_metadata = Arc::clone(reader_metadata.metadata());
 let rg_metadata = file_metadata.row_groups();
 
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
 .pushdown_filters
 .then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
 
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
 }
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   I'll merge the PR, then start looking into the refactor
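
The comment in the hunk above mentions that fully matched row groups split the scan into multiple decoder runs. One way to picture that split is sketched below, assuming consecutive row groups with the same status are grouped together. `Run` and `split_into_runs` are hypothetical names, not the PR's actual types.

```rust
/// A maximal run of consecutive row groups that share the same
/// "fully matched" status. `start..end` indexes into the input slice.
#[derive(Debug, PartialEq)]
struct Run {
    start: usize,
    end: usize,
    fully_matched: bool,
}

/// Split row groups into runs. Under the assumption stated above, each run
/// would get its own decoder, and only runs that are not fully matched
/// would need a row filter.
fn split_into_runs(fully_matched: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for (i, &matched) in fully_matched.iter().enumerate() {
        // Extend the current run when the status is unchanged.
        if let Some(run) = runs.last_mut() {
            if run.fully_matched == matched {
                run.end = i + 1;
                continue;
            }
        }
        // Otherwise start a new run at this row group.
        runs.push(Run { start: i, end: i + 1, fully_matched: matched });
    }
    runs
}
```

For the statuses `[true, true, false, true]` this produces three runs: `0..2` (fully matched), `2..3` (needs the filter), and `3..4` (fully matched).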






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3245404958


##
datafusion/datasource-parquet/src/metrics.rs:
##
@@ -213,4 +213,28 @@ impl ParquetFileMetrics {
 predicate_cache_records,
 }
 }
+
+/// Record pages whose page-index pruning was skipped because the containing
+/// row group was fully matched by row-group statistics.
+///
+/// The counter is only registered when there is a non-zero value. This keeps

Review Comment:
   This is a nice follow-up exploration; I created https://github.com/apache/datafusion/issues/22189 to track it.






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454204384

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture: aarch64
   CPU op-mode(s): 64-bit
   Byte Order: Little Endian
   CPU(s): 16
   On-line CPU(s) list: 0-15
   Vendor ID: ARM
   Model name: Neoverse-V2
   Model: 1
   Thread(s) per core: 1
   Core(s) per cluster: 16
   Socket(s): -
   Cluster(s): 1
   Stepping: r0p1
   BogoMIPS: 2000.00
   Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
   L1d cache: 1 MiB (16 instances)
   L1i cache: 1 MiB (16 instances)
   L2 cache: 32 MiB (16 instances)
   L3 cache: 80 MiB (1 instance)
   NUMA node(s): 1
   NUMA node0 CPU(s): 0-15
   Vulnerability Gather data sampling: Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf: Not affected
   Vulnerability Mds: Not affected
   Vulnerability Meltdown: Not affected
   Vulnerability Mmio stale data: Not affected
   Vulnerability Reg file data sampling: Not affected
   Vulnerability Retbleed: Not affected
   Vulnerability Spec rstack overflow: Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
   Vulnerability Spectre v1: Mitigation; __user pointer sanitization
   Vulnerability Spectre v2: Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa: Not affected
   Vulnerability Tsx async abort: Not affected
   Vulnerability Vmscape: Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │       1.19 / 4.62 ±6.74 / 18.09 ms │       1.20 / 4.70 ±6.83 / 18.36 ms │     no change │
   │ QQuery 1  │     12.70 / 13.12 ±0.33 / 13.61 ms │     12.61 / 12.96 ±0.20 / 13.21 ms │     no change │
   │ QQuery 2  │     35.85 / 36.11 ±0.20 / 36.35 ms │     36.14 / 36.54 ±0.31 / 37.01 ms │     no change │
   │ QQuery 3  │     30.33 / 30.98 ±0.65 / 32.24 ms │     30.38 / 30.76 ±0.24 / 31.00 ms │     no change │
   │ QQuery 4  │  233.62 / 236.73 ±3.24 / 242.46 ms │  233.83 / 237.42 ±3.29 / 242.76 ms │     no change │
   │ QQuery 5  │  277.34 / 279.76 ±1.77 / 281.96 ms │  277.78 / 279.58 ±2.09 / 283.24 ms │     no change │
   │ QQuery 6  │        6.01 / 7.09 ±0.61 / 7.72 ms │        6.36 / 7.00 ±0.52 / 7.69 ms │     no change │
   │ QQuery 7  │     13.83 / 13.89 ±0.07 / 14.01 ms │     13.83 / 13.92 ±0.06 / 14.02 ms │     no change │
   │ QQuery 8  │  315.41 / 317.51 ±1.42 / 319.56 ms │  314.85 / 320.12 ±3.16 / 323.89 ms │     no change │
   │ QQuery 9  │  445.80 / 459.64 ±8.67 / 469.08 ms │ 442.29 / 460.28 ±12.42 / 477.36 ms │     no change │
   │ QQuery 10 │     68.62 / 69.58 ±0.87 / 70.83 ms │     68.78 / 69.47 ±0.54 / 70.26 ms │     no change │
   │ QQuery 11 │     78.49 / 80.02 ±1.08 / 81.49 ms │     79.36 / 81.07 ±0.99 / 82.43 ms │     no change │
   │ QQuery 12 │  271.85 / 276.18 ±5.79 / 286.96 ms │  273.22 / 278.69 ±3.39 / 283.51 ms │     no change │
   │ QQuery 13 │  381.80 / 390.49 ±7.31 / 401.25 ms │  385.81 / 391.10 ±3.41 / 394.63 ms │     no change │
   │ QQuery 14 │  278.91 / 282.63 ±3.30 / 288.78 ms │  280.20 / 282.51 ±3.57 / 289.60 ms │     no change │
   ```

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3243789052


##
datafusion/datasource-parquet/src/metrics.rs:
##
@@ -213,4 +213,28 @@ impl ParquetFileMetrics {
 predicate_cache_records,
 }
 }
+
+/// Record pages whose page-index pruning was skipped because the containing
+/// row group was fully matched by row-group statistics.
+///
+/// The counter is only registered when there is a non-zero value. This keeps

Review Comment:
   I wonder if we should apply the same pattern to the other metrics (lazily 
initialize them) -- if you can get a few percent in this query maybe it would 
get us a few in the others 
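
The lazy-initialization pattern discussed here can be sketched with only the standard library. `Registry` and `LazyCounter` below are hypothetical types, not DataFusion's metrics API. The sketch assumes the costly step is registration (allocating the name and the shared counter), and it keeps construction of the lazy wrapper cheap by borrowing the registry and using a `&'static str` name.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical registry standing in for a per-plan metrics set.
#[derive(Default)]
struct Registry {
    counters: Mutex<HashMap<String, Arc<AtomicUsize>>>,
}

impl Registry {
    /// Register (or fetch) a counter. This is the step that allocates.
    fn counter(&self, name: &str) -> Arc<AtomicUsize> {
        let mut counters = self.counters.lock().unwrap();
        counters.entry(name.to_string()).or_default().clone()
    }

    /// Number of counters registered so far.
    fn len(&self) -> usize {
        self.counters.lock().unwrap().len()
    }
}

/// A counter that is registered only the first time it is incremented.
/// Constructing it copies a reference and a static string slice, nothing more.
struct LazyCounter<'a> {
    registry: &'a Registry,
    name: &'static str,
    inner: OnceLock<Arc<AtomicUsize>>,
}

impl<'a> LazyCounter<'a> {
    fn new(registry: &'a Registry, name: &'static str) -> Self {
        Self { registry, name, inner: OnceLock::new() }
    }

    /// Register on first use, then add to the shared counter.
    fn add(&self, n: usize) {
        self.inner
            .get_or_init(|| self.registry.counter(self.name))
            .fetch_add(n, Ordering::Relaxed);
    }
}
```

A `LazyCounter` that is never incremented leaves the registry empty, which is the property the comment is after. Whether this pays off for other metrics depends on how often they stay at zero.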






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454072344

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4454053108-107-wf2vv 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 GNU/Linux`
   
   Comparing datafusion/issue-19028-benchmark (426154ea1ddc5d57f909d948735229c6f40398d6) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..426154ea1ddc5d57f909d948735229c6f40398d6) using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454064121

   > The issue was that ParquetFileMetrics::new created a LazyParquetSummaryCount for page_index_pages_skipped_by_fully_matched for every opened file.
   
   Wild -- that seems like non-trivial overhead
   
   Looking at the code and what you changed, maybe it is because the metric builder is expensive (it is copying strings)
   
   ```
   let count = MetricBuilder::new(metrics)
       .with_new_label("filename", filename.to_string())
       .with_type(MetricType::Summary)
       .with_category(MetricCategory::Rows)
       .counter("page_index_pages_skipped_by_fully_matched", partition);
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-14 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4454053108

   
   run benchmark clickbench_partitioned
   
   
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4448087201

   > 🤔 the benchmarks look slower -- maybe we can profile some of those queries and find space to get the performance back
   
   @alamb Good catch, it keeps the PR from introducing a regression!
   
   I profiled the repeated ClickBench partitioned slow queries (`q6`, `q29`, 
and `q41`) on the PR build.
   
   `q29` was dominated by normal parquet decode / aggregation work 
(`snap::decompress`, RLE decoding, `SumAccumulator`), so I did not see a 
PR-specific hot spot there.
   
   `q6` was more useful: it was dominated by parquet 
open/planning/statistics/metrics setup rather than decode work. In particular, 
`ParquetFileMetrics::new`, `MetricBuilder::build`, and 
`LazyParquetSummaryCount` construction/destruction showed up in the sample 
profile. Since `q6` has no filters, this suggested the regression was from 
fixed per-file setup overhead rather than the fully-matched pruning path itself.
   
   The issue was that `ParquetFileMetrics::new` created a 
`LazyParquetSummaryCount` for `page_index_pages_skipped_by_fully_matched` for 
every opened file. Even though the counter was only registered on first use, 
constructing the lazy wrapper still cloned the filename, cloned the metrics 
set, and allocated an `Arc>` for every file, including queries that 
never used this metric.
   
   I fixed this by removing the per-file `LazyParquetSummaryCount` field 
entirely. Page pruning now returns the `pages_skipped_by_fully_matched` count, 
and the opener registers `page_index_pages_skipped_by_fully_matched` only when 
that count is non-zero, using the already available `PreparedParquetOpen` 
filename / partition / metrics context. This keeps `ParquetFileMetrics::new` 
off the extra allocation/clone path for the common case.
   
   The benchmark now looks good: 
https://github.com/apache/datafusion/pull/21637#issuecomment-4447945184
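
The fix described above moves the counter out of the per-file metrics struct: pruning returns a plain count, and the caller registers the metric only when that count is non-zero. A minimal std-only sketch of that flow follows. `Metrics`, `PruneOutcome`, `prune_pages`, and `record` are hypothetical names, and the pruning logic is a placeholder that simply skips every page of a fully matched row group.

```rust
use std::collections::BTreeMap;

/// Hypothetical metrics sink; stands in for a per-file metrics context.
#[derive(Default)]
struct Metrics {
    counters: BTreeMap<&'static str, usize>,
}

/// Hypothetical result of page pruning: the pruner reports how many pages
/// it skipped instead of owning a counter itself.
struct PruneOutcome {
    pages_skipped_by_fully_matched: usize,
}

/// Placeholder pruning pass: every page of a fully matched row group is
/// counted as skipped.
fn prune_pages(pages_per_row_group: &[usize], fully_matched: &[bool]) -> PruneOutcome {
    let skipped: usize = pages_per_row_group
        .iter()
        .zip(fully_matched)
        .filter(|(_, matched)| **matched)
        .map(|(pages, _)| *pages)
        .sum();
    PruneOutcome { pages_skipped_by_fully_matched: skipped }
}

/// Register the counter only when there is something to report, so files
/// that never hit this path pay no registration cost.
fn record(metrics: &mut Metrics, outcome: &PruneOutcome) {
    if outcome.pages_skipped_by_fully_matched > 0 {
        *metrics
            .counters
            .entry("page_index_pages_skipped_by_fully_matched")
            .or_insert(0) += outcome.pages_skipped_by_fully_matched;
    }
}
```

With no fully matched row groups the metrics map stays empty; with row groups of 4, 6, and 5 pages where the first and last are fully matched, the counter is registered with a value of 9.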





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3239348425


##
datafusion/physical-expr-common/src/metrics/value.rs:
##
@@ -1010,7 +1010,10 @@ impl MetricValue {
 Self::SpilledBytes(_) => 11,
 Self::SpilledRows(_) => 12,
 Self::CurrentMemoryUsage(_) => 13,
-Self::Count { .. } => 14,
+Self::Count { name, .. } => match name.as_ref() {
+"page_index_pages_skipped_by_fully_matched" => 8,

Review Comment:
   Added a comment explaining why this Count is ordered with the Parquet 
page-index pruning metrics in EXPLAIN output.
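
The ordering trick in the hunk above can be illustrated with a small standalone enum. `Metric`, `display_sort_key`, and `sorted_for_display` below are hypothetical and far smaller than the real `MetricValue`. The keys 8 and 14 are taken from the diff, and the sketch assumes 8 is also the key of the page-pruning group that the named `Count` should sit next to.

```rust
/// Hypothetical, trimmed-down metric value.
#[derive(Debug, Clone, PartialEq)]
enum Metric {
    OutputRows(usize),
    PagePruned(usize),
    Count { name: String, value: usize },
}

impl Metric {
    /// Smaller keys are displayed first. A named `Count` can borrow the key
    /// of a related group so that it is shown alongside that group.
    fn display_sort_key(&self) -> u8 {
        match self {
            Self::OutputRows(_) => 0,
            Self::PagePruned(_) => 8,
            Self::Count { name, .. } => match name.as_str() {
                "page_index_pages_skipped_by_fully_matched" => 8,
                _ => 14,
            },
        }
    }
}

/// `sort_by_key` is stable, so metrics that share a key keep their
/// original relative order.
fn sorted_for_display(mut metrics: Vec<Metric>) -> Vec<Metric> {
    metrics.sort_by_key(Metric::display_sort_key);
    metrics
}
```

Any other `Count` keeps the default key of 14 and therefore sorts after the pruning metrics.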






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447945184

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │       1.23 / 4.77 ±6.88 / 18.54 ms │       1.21 / 4.73 ±6.90 / 18.54 ms │     no change │
   │ QQuery 1  │     13.14 / 13.82 ±0.36 / 14.20 ms │     13.10 / 13.57 ±0.24 / 13.73 ms │     no change │
   │ QQuery 2  │     36.41 / 36.89 ±0.33 / 37.37 ms │     35.91 / 36.47 ±0.60 / 37.49 ms │     no change │
   │ QQuery 3  │     31.18 / 32.19 ±1.68 / 35.53 ms │     30.96 / 31.25 ±0.19 / 31.51 ms │     no change │
   │ QQuery 4  │  245.47 / 248.10 ±1.97 / 251.52 ms │  240.31 / 243.94 ±1.94 / 245.61 ms │     no change │
   │ QQuery 5  │  289.69 / 292.04 ±2.09 / 295.15 ms │  283.16 / 286.21 ±2.18 / 289.31 ms │     no change │
   │ QQuery 6  │       7.35 / 8.23 ±1.25 / 10.70 ms │        7.14 / 7.55 ±0.30 / 7.93 ms │ +1.09x faster │
   │ QQuery 7  │     14.99 / 15.06 ±0.05 / 15.14 ms │     14.63 / 15.60 ±1.68 / 18.95 ms │     no change │
   │ QQuery 8  │  329.84 / 332.35 ±2.10 / 335.81 ms │  326.69 / 329.96 ±2.83 / 334.21 ms │     no change │
   │ QQuery 9  │  473.34 / 479.60 ±6.87 / 491.53 ms │ 446.50 / 465.35 ±12.46 / 484.80 ms │     no change │
   │ QQuery 10 │     72.19 / 75.45 ±3.71 / 81.99 ms │     71.45 / 76.92 ±9.79 / 96.48 ms │     no change │
   │ QQuery 11 │     82.98 / 83.75 ±0.43 / 84.24 ms │     83.61 / 85.38 ±2.84 / 91.04 ms │     no change │
   │ QQuery 12 │  282.10 / 285.48 ±3.94 / 291.32 ms │  281.81 / 286.78 ±4.82 / 295.21 ms │     no change │
   │ QQuery 13 │  398.45 / 409.27 ±9.41 / 422.68 ms │  401.41 / 412.17 ±7.60 / 424.18 ms │     no change │
   │ QQuery 14 │ 287.61 / 296.17 ±12.69 / 321.41 ms │  288.95 / 294.25 ±5.60 / 301.78 ms │     no change │
   ```

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447801562

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4447774292-60-bc7n5 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 GNU/Linux`
   
   Comparing datafusion/issue-19028-benchmark (d0b4c309746bc29830ed398daf55d775e08e5b83) to 937dfda (merge-base) [diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..d0b4c309746bc29830ed398daf55d775e08e5b83) using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4447774292

   run benchmark clickbench_partitioned





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442878086

   🤔 the benchmarks look slower -- maybe we can profile some of those queries and find space to get the performance back





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3235585653


##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
 let file_metadata = Arc::clone(reader_metadata.metadata());
 let rg_metadata = file_metadata.row_groups();
 
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
 .pushdown_filters
 .then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
 
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
 }
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   This is the arrow-rs PR, in case its APIs are of interest for informing the direction here: https://github.com/apache/arrow-rs/pull/9968






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442571294

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │       1.20 / 4.71 ±6.87 / 18.46 ms │       1.19 / 4.62 ±6.74 / 18.09 ms │     no change │
   │ QQuery 1  │     12.84 / 13.04 ±0.13 / 13.23 ms │     12.79 / 12.99 ±0.15 / 13.23 ms │     no change │
   │ QQuery 2  │     35.68 / 36.12 ±0.41 / 36.79 ms │     35.68 / 35.95 ±0.33 / 36.59 ms │     no change │
   │ QQuery 3  │     30.70 / 31.50 ±0.99 / 33.40 ms │     30.63 / 30.94 ±0.41 / 31.69 ms │     no change │
   │ QQuery 4  │  233.60 / 235.45 ±2.63 / 240.67 ms │  230.21 / 235.18 ±3.84 / 239.83 ms │     no change │
   │ QQuery 5  │  275.77 / 277.86 ±1.64 / 280.62 ms │  279.33 / 280.44 ±0.75 / 281.28 ms │     no change │
   │ QQuery 6  │        6.26 / 7.05 ±0.51 / 7.84 ms │        6.97 / 7.56 ±0.56 / 8.34 ms │  1.07x slower │
   │ QQuery 7  │     13.87 / 14.08 ±0.14 / 14.25 ms │     13.88 / 14.09 ±0.11 / 14.18 ms │     no change │
   │ QQuery 8  │  310.54 / 314.08 ±3.36 / 319.63 ms │  314.54 / 318.09 ±3.44 / 324.00 ms │     no change │
   │ QQuery 9  │ 446.17 / 467.23 ±17.98 / 494.44 ms │  451.61 / 461.65 ±8.16 / 472.23 ms │     no change │
   │ QQuery 10 │     71.15 / 71.45 ±0.27 / 71.87 ms │     69.47 / 70.26 ±0.56 / 71.12 ms │     no change │
   │ QQuery 11 │     81.97 / 82.78 ±0.56 / 83.59 ms │     81.58 / 82.28 ±0.49 / 82.86 ms │     no change │
   │ QQuery 12 │  284.00 / 289.68 ±3.83 / 295.31 ms │  273.06 / 275.18 ±2.78 / 280.33 ms │ +1.05x faster │
   │ QQuery 13 │  383.15 / 400.48 ±9.19 / 410.48 ms │  377.97 / 388.01 ±6.64 / 395.28 ms │     no change │
   │ QQuery 14 │  275.41 / 284.01 ±6.85 / 293.85 ms │  279.69 / 282.28 ±2.52 / 287.02 ms │     no change │
 

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442442500

   🤖 Benchmark running (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4442420069-34-94mwb 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 
GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) 
[diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae)
 using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442437610

   I want to make sure the slowdowns on clickbench_partitioned in 
https://github.com/apache/datafusion/pull/21637#issuecomment-4440926604 are not 
reproducible
   
   > ```
   > Comparing HEAD and datafusion_issue-19028-benchmark
   > 
   > Benchmark clickbench_partitioned.json
   > 
   > ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   > ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   > ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   ...
   > │ QQuery 6  │        6.87 / 7.54 ±0.43 / 8.23 ms │        7.39 / 8.23 ±0.50 / 8.93 ms │  1.09x slower │
   > │ QQuery 7  │     14.68 / 14.85 ±0.14 / 15.01 ms │     14.56 / 15.87 ±2.19 / 20.23 ms │  1.07x slower │
   > │ QQuery 8  │  321.11 / 330.67 ±8.00 / 340.69 ms │  353.25 / 355.63 ±1.68 / 357.75 ms │  1.08x slower │
   > │ QQuery 9  │  511.37 / 522.69 ±8.03 / 534.54 ms │ 463.77 / 486.85 ±19.75 / 519.75 ms │ +1.07x faster │
   > │ QQuery 10 │     73.68 / 74.86 ±0.75 / 75.89 ms │     70.46 / 71.76 ±0.73 / 72.63 ms │     no change │
   > │ QQuery 11 │     82.81 / 84.92 ±1.32 / 86.03 ms │     81.43 / 82.40 ±0.98 / 84.28 ms │     no change │
   > │ QQuery 12 │ 288.31 / 303.84 ±12.04 / 323.63 ms │  302.90 / 308.51 ±4.72 / 316.56 ms │     no change │
   > │ QQuery 13 │  388.65 / 402.24 ±9.41 / 416.80 ms │ 415.00 / 427.92 ±10.87 / 447.70 ms │  1.06x slower │
   > │ QQuery 14 │  282.25 / 285.05 ±2.06 / 287.60 ms │  304.01 / 309.51 ±5.09 / 317.91 ms │  1.09x slower │
   > │ QQuery 15 │  286.19 / 290.24 ±2.51 / 293.56 ms │  308.11 / 317.91 ±6.69 / 327.52 ms │  1.10x slower │
   > │ QQuery 16 │ 617.93 / 661.36 ±26.32 / 686.84 ms │  638.82 / 648.64 ±6.42 / 654.82 ms │     no change │
   > ...
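
To make the "Change" column in the quoted rows easier to sanity-check, here is a minimal Rust sketch that classifies a query from its two mean timings. The 5% noise threshold is an assumption inferred from the rows above (ratios under about 1.05 are reported as "no change"); it is not taken from the benchmark runner's source, and `Change` and `classify` are hypothetical names.

```rust
/// Outcome of comparing a baseline mean against a candidate mean.
#[derive(Debug, PartialEq)]
enum Change {
    NoChange,
    /// Candidate is faster by this factor (baseline / candidate).
    Faster(f64),
    /// Candidate is slower by this factor (candidate / baseline).
    Slower(f64),
}

/// Classify a change given mean timings in milliseconds.
/// `noise` is the relative threshold below which a difference is ignored
/// (ASSUMPTION: 0.05 matches the table; the real runner may differ).
fn classify(baseline_ms: f64, candidate_ms: f64, noise: f64) -> Change {
    let slower_by = candidate_ms / baseline_ms;
    let faster_by = baseline_ms / candidate_ms;
    if slower_by > 1.0 + noise {
        Change::Slower(slower_by)
    } else if faster_by > 1.0 + noise {
        Change::Faster(faster_by)
    } else {
        Change::NoChange
    }
}

fn main() {
    // (query, HEAD mean, branch mean), copied from the quoted table.
    let rows = [
        ("QQuery 6", 7.54, 8.23),
        ("QQuery 9", 522.69, 486.85),
        ("QQuery 10", 74.86, 71.76),
    ];
    for (query, head, branch) in rows {
        println!("{query}: {:?}", classify(head, branch, 0.05));
    }
}
```

Note that the standard deviations in the table (for example ±19.75 ms on QQuery 9) are comparable in size to some of the reported differences, which is why a second run is a reasonable way to check reproducibility.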
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4442420069

   run benchmark clickbench_partitioned





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440926604

   🤖 Benchmark completed (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │       1.28 / 4.86 ±6.96 / 18.77 ms │       1.30 / 4.93 ±7.06 / 19.04 ms │     no change │
   │ QQuery 1  │     13.03 / 13.41 ±0.27 / 13.76 ms │     13.02 / 13.55 ±0.29 / 13.86 ms │     no change │
   │ QQuery 2  │     36.21 / 36.67 ±0.35 / 37.23 ms │     36.84 / 37.21 ±0.39 / 37.93 ms │     no change │
   │ QQuery 3  │     31.95 / 32.94 ±0.78 / 34.32 ms │     32.69 / 33.24 ±0.31 / 33.55 ms │     no change │
   │ QQuery 4  │  260.15 / 261.68 ±1.86 / 265.30 ms │  270.03 / 273.33 ±2.88 / 277.72 ms │     no change │
   │ QQuery 5  │  292.78 / 298.05 ±3.16 / 302.12 ms │  303.23 / 306.71 ±3.04 / 310.80 ms │     no change │
   │ QQuery 6  │        6.87 / 7.54 ±0.43 / 8.23 ms │        7.39 / 8.23 ±0.50 / 8.93 ms │  1.09x slower │
   │ QQuery 7  │     14.68 / 14.85 ±0.14 / 15.01 ms │     14.56 / 15.87 ±2.19 / 20.23 ms │  1.07x slower │
   │ QQuery 8  │  321.11 / 330.67 ±8.00 / 340.69 ms │  353.25 / 355.63 ±1.68 / 357.75 ms │  1.08x slower │
   │ QQuery 9  │  511.37 / 522.69 ±8.03 / 534.54 ms │ 463.77 / 486.85 ±19.75 / 519.75 ms │ +1.07x faster │
   │ QQuery 10 │     73.68 / 74.86 ±0.75 / 75.89 ms │     70.46 / 71.76 ±0.73 / 72.63 ms │     no change │
   │ QQuery 11 │     82.81 / 84.92 ±1.32 / 86.03 ms │     81.43 / 82.40 ±0.98 / 84.28 ms │     no change │
   │ QQuery 12 │ 288.31 / 303.84 ±12.04 / 323.63 ms │  302.90 / 308.51 ±4.72 / 316.56 ms │     no change │
   │ QQuery 13 │  388.65 / 402.24 ±9.41 / 416.80 ms │ 415.00 / 427.92 ±10.87 / 447.70 ms │  1.06x slower │
   │ QQuery 14 │  282.25 / 285.05 ±2.06 / 287.60 ms │  304.01 / 309.51 ±5.09 / 317.91 ms │  1.09x slower │
 

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440889301

   🤖 Benchmark completed (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │        6.33 / 6.81 ±0.86 / 8.52 ms │        6.25 / 6.81 ±0.88 / 8.56 ms │     no change │
   │ QQuery 2  │     81.34 / 81.89 ±0.29 / 82.17 ms │     81.65 / 81.95 ±0.25 / 82.35 ms │     no change │
   │ QQuery 3  │     29.06 / 29.29 ±0.19 / 29.57 ms │     29.19 / 29.48 ±0.23 / 29.90 ms │     no change │
   │ QQuery 4  │  506.78 / 513.30 ±5.56 / 522.50 ms │  508.53 / 511.81 ±2.20 / 515.01 ms │     no change │
   │ QQuery 5  │     53.10 / 53.36 ±0.24 / 53.81 ms │     52.90 / 53.18 ±0.41 / 54.00 ms │     no change │
   │ QQuery 6  │     35.62 / 35.84 ±0.29 / 36.41 ms │     35.31 / 35.81 ±0.35 / 36.19 ms │     no change │
   │ QQuery 7  │  110.34 / 111.13 ±1.03 / 113.13 ms │  109.80 / 110.44 ±0.92 / 112.26 ms │     no change │
   │ QQuery 8  │     38.83 / 39.14 ±0.39 / 39.89 ms │     38.86 / 39.16 ±0.21 / 39.45 ms │     no change │
   │ QQuery 9  │     53.43 / 55.60 ±1.94 / 58.99 ms │     55.54 / 57.53 ±1.41 / 59.59 ms │     no change │
   │ QQuery 10 │     80.81 / 82.01 ±1.90 / 85.80 ms │     81.42 / 81.74 ±0.20 / 81.96 ms │     no change │
   │ QQuery 11 │  315.07 / 318.95 ±2.12 / 321.23 ms │  313.21 / 316.45 ±2.94 / 321.17 ms │     no change │
   │ QQuery 12 │     28.90 / 29.35 ±0.33 / 29.76 ms │     28.69 / 29.01 ±0.35 / 29.60 ms │     no change │
   │ QQuery 13 │  128.82 / 129.14 ±0.36 / 129.84 ms │  129.01 / 129.37 ±0.24 / 129.65 ms │     no change │
   │ QQuery 14 │  513.80 / 516.32 ±2.70 / 520.36 ms │ 517.32 / 524.34 ±11.68 / 547.54 ms │     no change │
   │ QQuery 15 │     61.20 / 62.38 ±0.70 / 63.19 ms │     60.60 / 61.03 ±0.37 / 61.55 ms │     no change │
   │ QQuery 16

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440872892

   🤖 Benchmark completed (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpch_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                               HEAD ┃   datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │     39.06 / 40.96 ±2.11 / 44.35 ms │     38.85 / 39.75 ±0.94 / 41.56 ms │     no change │
   │ QQuery 2  │     20.61 / 21.19 ±0.73 / 22.62 ms │     20.49 / 20.96 ±0.37 / 21.54 ms │     no change │
   │ QQuery 3  │     36.53 / 37.31 ±0.93 / 38.78 ms │     35.50 / 36.73 ±0.66 / 37.41 ms │     no change │
   │ QQuery 4  │     17.93 / 18.52 ±0.76 / 20.01 ms │     18.26 / 18.33 ±0.08 / 18.47 ms │     no change │
   │ QQuery 5  │     43.89 / 45.24 ±1.08 / 46.84 ms │     43.88 / 45.68 ±1.47 / 47.26 ms │     no change │
   │ QQuery 6  │     16.81 / 17.94 ±1.11 / 19.65 ms │     16.76 / 17.16 ±0.36 / 17.78 ms │     no change │
   │ QQuery 7  │     48.95 / 50.65 ±1.18 / 52.08 ms │     50.24 / 51.92 ±1.83 / 55.43 ms │     no change │
   │ QQuery 8  │     45.74 / 46.53 ±0.63 / 47.49 ms │     45.99 / 46.18 ±0.15 / 46.44 ms │     no change │
   │ QQuery 9  │     50.89 / 52.03 ±0.84 / 53.39 ms │     51.32 / 52.45 ±1.06 / 54.18 ms │     no change │
   │ QQuery 10 │     64.67 / 65.51 ±1.07 / 67.50 ms │     65.23 / 65.62 ±0.63 / 66.88 ms │     no change │
   │ QQuery 11 │     13.84 / 14.19 ±0.48 / 15.12 ms │     13.95 / 14.43 ±0.53 / 15.43 ms │     no change │
   │ QQuery 12 │     25.62 / 26.03 ±0.32 / 26.46 ms │     25.38 / 26.05 ±0.47 / 26.74 ms │     no change │
   │ QQuery 13 │     35.94 / 36.88 ±0.68 / 37.72 ms │     35.54 / 36.22 ±0.56 / 37.09 ms │     no change │
   │ QQuery 14 │     26.09 / 26.28 ±0.16 / 26.53 ms │     26.11 / 26.34 ±0.18 / 26.63 ms │     no change │
   │ QQuery 15 │     32.08 / 32.32 ±0.20 / 32.68 ms │     32.26 / 33.49 ±1.22 / 35.44 ms │     no change │
   │ QQuery 16 │     14.94 / 15.14 ±0.11 / 15.22 ms │     15.17 / 15.57 ±0.37 / 16.08 ms │     no change │
   │ QQuery 17 │     76.74 / 78.32 ±2.21 / 82.70 ms │     76.21 / 78.27 ±1.55 / 80.27 ms │     no change │
   │ QQuery 18 │     68.30 / 69.77 ±0.85 / 70.79 ms │     68.97 / 69.89 ±0.64 / 70.58 ms │     no change │
   │ QQue

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440729122

   🤖 Benchmark running (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4440697825-31-88b5h 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 
GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) 
[diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae)
 using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440726209

   🤖 Benchmark running (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4440697825-29-hmbw2 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 
GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) 
[diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae)
 using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440721425

   🤖 Benchmark running (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4440697825-30-mbkj8 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 
GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(67a0526e7d928b701c50c2d0100ddce375a328ae) to 937dfda (merge-base) 
[diff](https://github.com/apache/datafusion/compare/937dfdad748589aa7372848bb2a57ef04109b931..67a0526e7d928b701c50c2d0100ddce375a328ae)
 using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440697825

   run benchmarks





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3233928827


##
datafusion/physical-expr-common/src/metrics/value.rs:
##
@@ -1010,7 +1010,10 @@ impl MetricValue {
 Self::SpilledBytes(_) => 11,
 Self::SpilledRows(_) => 12,
 Self::CurrentMemoryUsage(_) => 13,
-Self::Count { .. } => 14,
+Self::Count { name, .. } => match name.as_ref() {
+"page_index_pages_skipped_by_fully_matched" => 8,

Review Comment:
   This may be worth a comment explaining why it special-cases 
`page_index_pages_skipped_by_fully_matched`.
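
As an illustration of the kind of comment being asked for, here is a minimal, self-contained Rust sketch. `Metric` is a simplified, hypothetical stand-in and not DataFusion's `MetricValue`; the fallback key of 14 mirrors the value removed in the hunk above, and the rationale written in the code comment is only a guess that the PR author would need to replace with the real reason.

```rust
use std::borrow::Cow;

/// Simplified stand-in for a metrics enum (hypothetical type for this sketch).
enum Metric {
    SpilledBytes(u64),
    Count { name: Cow<'static, str>, value: u64 },
}

impl Metric {
    /// Key used to order metrics for display; smaller keys come first.
    fn display_sort_key(&self) -> u8 {
        match self {
            Self::SpilledBytes(_) => 11,
            Self::Count { name, .. } => match name.as_ref() {
                // ASSUMED rationale, to be confirmed by the author: this
                // counter gets its own slot so it is displayed ahead of the
                // generic named counters rather than mixed in with them.
                "page_index_pages_skipped_by_fully_matched" => 8,
                // Every other named counter keeps the generic slot.
                _ => 14,
            },
        }
    }

    fn label(&self) -> &str {
        match self {
            Self::SpilledBytes(_) => "spilled_bytes",
            Self::Count { name, .. } => name.as_ref(),
        }
    }

    fn value(&self) -> u64 {
        match self {
            Self::SpilledBytes(v) => *v,
            Self::Count { value, .. } => *value,
        }
    }
}

fn main() {
    let mut metrics = vec![
        Metric::Count { name: Cow::Borrowed("some_other_counter"), value: 3 },
        Metric::SpilledBytes(0),
        Metric::Count {
            name: Cow::Borrowed("page_index_pages_skipped_by_fully_matched"),
            value: 7,
        },
    ];
    // Stable sort by the display key, then print in display order.
    metrics.sort_by_key(|m| m.display_sort_key());
    for m in &metrics {
        println!("{}={}", m.label(), m.value());
    }
}
```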



##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
 let file_metadata = Arc::clone(reader_metadata.metadata());
 let rg_metadata = file_metadata.row_groups();
 
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
 .pushdown_filters
 .then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
 
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc| -> Option {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
 }
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   Sounds good.
   
   I think @adriangb was also talking recently about restructuring the Parquet 
opener so it could decide more dynamically how to evaluate predicates 
(in this case, for example, it decides not to evaluate a predicate at all). He 
was also thinking we could dynamically choose whether or not to push the 
predicate down into the scan.
   
   No action required for this PR; I am just commenting here that we seem to be 
trending in this direction.






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-13 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4440631526

   @adriangb and I were talking about this PR last night. I am checking it out





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-06 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4394407082

   > > @alamb thanks for the review, before getting the PR in, I think it's 
better to have your look for the comment [#21637 
(comment)](https://github.com/apache/datafusion/pull/21637#discussion_r3156327107),
 and it's fix commit: 
[da7db27](https://github.com/apache/datafusion/commit/da7db27a6b51345991d67907b9985a0d67224153)
 (this is the lowest cost way I found to fix the metric. Let me know if you 
have other thoughts)
   > 
   > Maybe we should just add a new metric on ParquetScanMetrics 🤔
   > 
   > 
https://github.com/apache/datafusion/blob/4c909bafc5c50749884fdd80a06235d7bd72dbde/datafusion/datasource-parquet/src/metrics.rs#L30
   
   Thanks @alamb, I agree that adding a separate metric is cleaner.
   
   I changed the PR 
https://github.com/apache/datafusion/pull/21637/commits/3f2401e0b422e2ddb590660626fc1716c84a22ae
 to keep `page_index_pages_pruned` reporting only pages that were actually 
evaluated by page-index pruning, and added 
`page_index_pages_skipped_by_fully_matched` for pages where page-index pruning 
was skipped because row-group statistics already proved the row group was fully 
matched.
   
   For example, the metrics can now look like:
   
   ```text
   row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched,
   page_index_pages_pruned=2 total → 2 matched,
   page_index_pages_skipped_by_fully_matched=1
   ```
   
   I would read this as:
   1. row-group statistics evaluated 4 row groups, 3 matched, and 1 of those 
was fully matched;
   2. page-index pruning actually evaluated 2 pages, and both matched;
   3. 1 additional page belonged to the fully matched row group, so page-index 
pruning was skipped for that page. The page is still scanned; only page-index 
predicate evaluation was skipped.
   
   This avoids counting statistics-derived fully matched pages as page-index 
matched pages.
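   
   The accounting described above can be pictured with a small sketch. This is 
not the code in the PR: `RowGroup`, `PageMetrics`, and `account_pages` are 
hypothetical names, and it assumes each of the three matched row groups has 
exactly one page.
   
   ```rust
   /// Hypothetical, simplified view of one row group that survived
   /// row-group statistics pruning.
   struct RowGroup {
       /// Number of data pages in the row group.
       pages: usize,
       /// True when row-group statistics prove every row satisfies the predicate.
       fully_matched: bool,
       /// Pages the page index keeps when it is actually evaluated.
       pages_matched_by_index: usize,
   }

   /// Hypothetical counters mirroring the two metrics discussed above.
   #[derive(Debug, Default, PartialEq)]
   struct PageMetrics {
       evaluated_total: usize,
       evaluated_matched: usize,
       skipped_by_fully_matched: usize,
   }

   fn account_pages(row_groups: &[RowGroup]) -> PageMetrics {
       let mut m = PageMetrics::default();
       for rg in row_groups {
           if rg.fully_matched {
               // Page-index evaluation is skipped; the pages are still scanned.
               m.skipped_by_fully_matched += rg.pages;
           } else {
               // Only these pages count toward page_index_pages_pruned.
               m.evaluated_total += rg.pages;
               m.evaluated_matched += rg.pages_matched_by_index;
           }
       }
       m
   }
   ```
   
   With two partially matched row groups and one fully matched row group, this 
yields 2 evaluated pages, 2 matched pages, and 1 skipped page, which lines up 
with the example metrics.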
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-06 Thread via GitHub


github-actions[bot] commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4394375026

   
   Thank you for opening this pull request!
   
   Reviewer note: 
[cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) 
reported the current version number is not SemVer-compatible with the changes 
in this pull request (compared against the base branch).
   
   
   Details
   
   ```
Cloning apache/main
   Building datafusion-datasource-parquet v53.1.0 (current)
  Built [  43.029s] (current)
Parsing datafusion-datasource-parquet v53.1.0 (current)
 Parsed [   0.026s] (current)
   Building datafusion-datasource-parquet v53.1.0 (baseline)
  Built [  42.813s] (baseline)
Parsing datafusion-datasource-parquet v53.1.0 (baseline)
 Parsed [   0.025s] (baseline)
   Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; 
assume patch)
Checked [   0.142s] 222 checks: 220 pass, 2 fail, 0 warn, 30 skip
   
   --- failure auto_trait_impl_removed: auto trait no longer implemented ---
   
   Description:
   A public type has stopped implementing one or more auto traits. This can 
break downstream code that depends on the traits being implemented.
   ref: 
https://doc.rust-lang.org/reference/special-types-and-traits.html#auto-traits
  impl: 
https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/auto_trait_impl_removed.ron
   
   Failed in:
 type ParquetFileMetrics is no longer UnwindSafe, in 
/home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31
 type ParquetFileMetrics is no longer RefUnwindSafe, in 
/home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:31
   
   --- failure constructible_struct_adds_private_field: struct no longer 
constructible due to new private field ---
   
   Description:
   A struct constructible with a struct literal has a new non-public field. It 
can no longer be constructed using a struct literal outside of its crate.
   ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
  impl: 
https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_private_field.ron
   
   Failed in:
 field ParquetFileMetrics.page_index_pages_skipped_by_fully_matched in 
/home/runner/work/datafusion/datafusion/datafusion/datasource-parquet/src/metrics.rs:75
   
Summary semver requires new major version: 2 major and 0 minor checks 
failed
   Finished [  88.643s] datafusion-datasource-parquet
   Building datafusion-physical-expr-common v53.1.0 (current)
  Built [  20.482s] (current)
Parsing datafusion-physical-expr-common v53.1.0 (current)
 Parsed [   0.020s] (current)
   Building datafusion-physical-expr-common v53.1.0 (baseline)
  Built [  20.210s] (baseline)
Parsing datafusion-physical-expr-common v53.1.0 (baseline)
 Parsed [   0.020s] (baseline)
   Checking datafusion-physical-expr-common v53.1.0 -> v53.1.0 (no change; 
assume patch)
Checked [   0.196s] 222 checks: 222 pass, 30 skip
Summary no semver update required
   Finished [  42.486s] datafusion-physical-expr-common
   Building datafusion-sqllogictest v53.1.0 (current)
  Built [ 136.503s] (current)
Parsing datafusion-sqllogictest v53.1.0 (current)
 Parsed [   0.022s] (current)
   Building datafusion-sqllogictest v53.1.0 (baseline)
  Built [ 135.436s] (baseline)
Parsing datafusion-sqllogictest v53.1.0 (baseline)
 Parsed [   0.023s] (baseline)
   Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume 
patch)
Checked [   0.085s] 222 checks: 222 pass, 30 skip
Summary no semver update required
   Finished [ 277.236s] datafusion-sqllogictest
   ```
   
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-06 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3199127552


##
datafusion/datasource-parquet/src/metrics.rs:
##
@@ -67,6 +68,11 @@ pub struct ParquetFileMetrics {
 pub page_index_rows_pruned: PruningMetrics,
 /// Total pages filtered or matched by parquet page index
 pub page_index_pages_pruned: PruningMetrics,
+/// Lazily registered counter for pages whose page-index pruning was skipped
+/// because the containing row group was fully matched by row-group statistics.
+///
+/// These pages are still scanned; only page-index predicate evaluation is skipped.
+page_index_pages_skipped_by_fully_matched: LazyParquetSummaryCount,

Review Comment:
   It is registered lazily so normal Parquet scans do not show an extra 
`page_index_pages_skipped_by_fully_matched=0` metric.
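   
   As a rough illustration of the lazy-registration idea (not the actual 
`LazyParquetSummaryCount` implementation, which is not shown in this thread), 
a counter can defer registration until its first update. `Registry` and 
`LazyCount` below are hypothetical stand-ins built only on the standard library.
   
   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::{Arc, Mutex, OnceLock};

   /// Hypothetical stand-in for a metrics set: the names that would be displayed.
   #[derive(Default)]
   struct Registry {
       names: Mutex<Vec<&'static str>>,
   }

   /// Hypothetical counter that only registers itself on first use.
   struct LazyCount {
       name: &'static str,
       registry: Arc<Registry>,
       counter: OnceLock<AtomicUsize>,
   }

   impl LazyCount {
       fn new(name: &'static str, registry: Arc<Registry>) -> Self {
           Self { name, registry, counter: OnceLock::new() }
       }

       /// Registers the metric the first time it is updated, then adds `n`.
       fn add(&self, n: usize) {
           let counter = self.counter.get_or_init(|| {
               self.registry.names.lock().unwrap().push(self.name);
               AtomicUsize::new(0)
           });
           counter.fetch_add(n, Ordering::Relaxed);
       }

       /// `None` means the metric was never registered, so nothing is displayed.
       fn value(&self) -> Option<usize> {
           self.counter.get().map(|c| c.load(Ordering::Relaxed))
       }
   }
   ```
   
   A scan that never hits the fully matched path never calls `add`, so the 
registry stays empty and no `=0` entry appears.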






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-06 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4390663385

   > @alamb thanks for the review, before getting the PR in, I think it's 
better to have your look for the comment [#21637 
(comment)](https://github.com/apache/datafusion/pull/21637#discussion_r3156327107),
 and it's fix commit: 
[da7db27](https://github.com/apache/datafusion/commit/da7db27a6b51345991d67907b9985a0d67224153)
 (this is the lowest cost way I found to fix the metric. Let me know if you 
have other thoughts)
   
   Maybe we should just add a new metric on ParquetScanMetrics 🤔 
https://github.com/apache/datafusion/blob/4c909bafc5c50749884fdd80a06235d7bd72dbde/datafusion/datasource-parquet/src/metrics.rs#L30
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385445616

   🤖 Benchmark completed (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_extended.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                      HEAD ┃          datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │         811.88 / 821.08 ±7.70 / 834.76 ms │        800.65 / 816.09 ±10.06 / 831.13 ms │     no change │
   │ QQuery 1  │         196.31 / 198.30 ±3.50 / 205.30 ms │         191.05 / 194.30 ±5.52 / 205.30 ms │     no change │
   │ QQuery 2  │         483.55 / 487.48 ±3.15 / 492.28 ms │         469.04 / 470.24 ±1.37 / 472.90 ms │     no change │
   │ QQuery 3  │         310.29 / 311.17 ±0.67 / 312.01 ms │         308.38 / 310.78 ±2.99 / 316.41 ms │     no change │
   │ QQuery 4  │         661.10 / 671.06 ±5.58 / 677.71 ms │         663.47 / 677.74 ±9.50 / 693.45 ms │     no change │
   │ QQuery 5  │ 10480.76 / 10749.26 ±135.39 / 10838.42 ms │ 10381.10 / 10707.66 ±229.67 / 11069.32 ms │     no change │
   │ QQuery 6  │           29.83 / 41.34 ±15.31 / 69.33 ms │            28.00 / 32.99 ±8.92 / 50.81 ms │ +1.25x faster │
   │ QQuery 7  │        771.70 / 787.65 ±13.90 / 803.77 ms │        750.14 / 768.13 ±16.43 / 787.55 ms │     no change │
   │ QQuery 8  │        378.07 / 403.59 ±35.92 / 474.57 ms │        380.54 / 399.98 ±32.19 / 464.04 ms │     no change │
   │ QQuery 9  │     2872.22 / 2922.21 ±31.65 / 2960.94 ms │     2795.45 / 2887.03 ±72.17 / 2963.15 ms │     no change │
   │ QQuery 10 │         641.11 / 647.49 ±4.55 / 653.68 ms │        643.03 / 671.47 ±43.17 / 757.36 ms │     no change │
   │ QQuery 11 │     2185.14 / 2209.10 ±19.59 / 2233.01 ms │     2165.30 / 2213.30 ±42.41 / 2275.07 ms │     no change │
   │ QQuery 12 │        197.95 / 215.56 ±29.02 / 273.46 ms │        193.75 / 212.64 ±25.73 / 262.96 ms │     no change │
   │ QQuery 13 │        540.86 / 560.53 ±11.38 / 571.47 ms │ 14.03 / 14.18 ±

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385353090

   🤖 Benchmark running (GKE) | 
[trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4385338380-2036-5z6mw 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 
aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(da7db27a6b51345991d67907b9985a0d67224153) to ba038e9 (merge-base) 
[diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..da7db27a6b51345991d67907b9985a0d67224153)
 using: clickbench_extended
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385338380

   run benchmark clickbench_extended
   
   ```
   env:
 DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
 DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4385334800

   @alamb thanks for the review. Before merging the PR, I think it would be 
good for you to take a look at the comment 
https://github.com/apache/datafusion/pull/21637#discussion_r3156327107, and 
its fix commit: 
https://github.com/apache/datafusion/pull/21637/commits/da7db27a6b51345991d67907b9985a0d67224153
 (this is the lowest-cost way I found to fix the metric. Let me know if you 
have other thoughts)





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192608315


##
datafusion/sqllogictest/test_files/limit_pruning.slt:
##
@@ -63,7 +63,55 @@ set datafusion.explain.analyze_level = summary;
 query TT
 explain analyze select * from tracking_data where species > 'M' AND s >= 50 limit 3;
 
-Plan with Metrics DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit_pruning/data.parquet]]},
 projection=[species, s], limit=3, file_type=parquet, predicate=species@0 > M 
AND s@1 >= 50, pruning_predicate=species_null_count@1 != row_count@2 AND 
species_max@0 > M AND s_null_count@4 != row_count@2 AND s_max@3 >= 50, 
required_guarantees=[], metrics=[output_rows=3, elapsed_compute=, 
output_bytes=, files_ranges_pruned_statistics=1 total → 1 matched, 
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched, 
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=2 
total → 2 matched, limit_pruned_row_groups=2 total → 0 matched, 
bytes_scanned=, metadata_load_time=, 
scan_efficiency_ratio= (171/2.35 K)]
+Plan with Metrics DataSourceExec: file_groups={1 group: 
[[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/limit_pruning/data.parquet]]},
 projection=[species, s], limit=3, file_type=parquet, predicate=species@0 > M 
AND s@1 >= 50, pruning_predicate=species_null_count@1 != row_count@2 AND 
species_max@0 > M AND s_null_count@4 != row_count@2 AND s_max@3 >= 50, 
required_guarantees=[], metrics=[output_rows=3, elapsed_compute=, 
output_bytes=, files_ranges_pruned_statistics=1 total → 1 matched, 
row_groups_pruned_statistics=4 total → 3 matched -> 1 fully matched, 
row_groups_pruned_bloom_filter=3 total → 3 matched, page_index_pages_pruned=0 
total → 0 matched, limit_pruned_row_groups=2 total → 0 matched, 
bytes_scanned=, metadata_load_time=, 
scan_efficiency_ratio= (171/2.35 K)]
+
+statement ok
+CREATE TABLE fully_matched_limit_source AS VALUES
+  (1),
+  (2),
+  (3),
+  (4),
+  (5),
+  (6),
+  (7),
+  (1),
+  (2);
+
+query I
+COPY (SELECT column1 as a FROM fully_matched_limit_source)
+TO 'test_files/scratch/limit_pruning/fully_matched_limit.parquet'
+STORED AS PARQUET
+OPTIONS (
+  'format.max_row_group_size' '3'
+);
+
+9
+
+statement ok
+drop table fully_matched_limit_source;
+
+statement ok
+CREATE EXTERNAL TABLE fully_matched_limit
+STORED AS PARQUET
+LOCATION 'test_files/scratch/limit_pruning/fully_matched_limit.parquet';
+
+# One fully matched row group sits between two filtered row groups.
+# LIMIT must apply across the entire scan, not once per decoder run.
+query TT
+explain analyze select a from fully_matched_limit where a >= 3 limit 4;
+
+Plan with Metrics DataSourceExec: metrics=[output_rows=4, row_groups_pruned_statistics=3 total → 3 matched -> 1 fully matched]
+
+query I
+select a from fully_matched_limit where a >= 3 limit 4;

Review Comment:
   done 
https://github.com/apache/datafusion/pull/21637/changes/3aa4a4700e7a876f3db4686ae906f0db05ddfc99
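   
   For readers following the test comment in the hunk above ("LIMIT must apply 
across the entire scan, not once per decoder run"), here is a minimal sketch of 
a limit budget shared across runs. `Run` and `scan_with_limit` are hypothetical 
names, not the opener's real types; the values mirror the nine rows written 
with `max_row_group_size` of 3.
   
   ```rust
   /// Hypothetical decoder run: consecutive row groups decoded together.
   struct Run {
       values: Vec<i64>,
       /// False for a fully matched run, where the predicate is not evaluated.
       needs_filter: bool,
   }

   /// Applies one global LIMIT across all runs instead of restarting per run.
   fn scan_with_limit(
       runs: &[Run],
       predicate: impl Fn(i64) -> bool,
       limit: usize,
   ) -> Vec<i64> {
       let mut out = Vec::new();
       for run in runs {
           // The remaining budget is shared by every run.
           let remaining = limit - out.len();
           if remaining == 0 {
               break;
           }
           out.extend(
               run.values
                   .iter()
                   .copied()
                   .filter(|v| !run.needs_filter || predicate(*v))
                   .take(remaining),
           );
       }
       out
   }
   ```
   
   With runs `[1, 2, 3]` (filtered), `[4, 5, 6]` (fully matched), and 
`[7, 1, 2]` (filtered), predicate `a >= 3`, and limit 4, the sketch stops 
after the second run. A per-run limit would instead have let the third run 
contribute more rows.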






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192595733


##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
 let file_metadata = Arc::clone(reader_metadata.metadata());
 let rg_metadata = file_metadata.row_groups();
 
-// Filter pushdown: evaluate predicates during scan
-let row_filter = if let Some(predicate) = prepared
+// Filter pushdown: evaluate predicates during scan.
+// Keep the predicate around so we can rebuild RowFilter per decoder run
+// when fully matched row groups split the scan into multiple decoders.
+let pushdown_predicate = prepared
 .pushdown_filters
 .then_some(prepared.predicate.clone())
-.flatten()
-{
-let row_filter = row_filter::build_row_filter(
-&predicate,
-&prepared.physical_file_schema,
-file_metadata.as_ref(),
-prepared.reorder_predicates,
-&prepared.file_metrics,
-);
+.flatten();
 
-match row_filter {
-Ok(Some(filter)) => Some(filter),
-Ok(None) => None,
-Err(e) => {
-debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-None
+let try_build_row_filter =
+|predicate: &Arc| -> Option {
+match row_filter::build_row_filter(
+predicate,
+&prepared.physical_file_schema,
+file_metadata.as_ref(),
+prepared.reorder_predicates,
+&prepared.file_metrics,
+) {
+Ok(Some(filter)) => Some(filter),
+Ok(None) => None,
+Err(e) => {
+debug!(
+"Ignoring error building row filter for '{predicate:?}': {e}"
+);
+None
+}
 }
-}
-} else {
-None
-};
+};
+
+// Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   Thanks, that makes sense. I agree both would make the code easier to read.
   Since this PR is already focused on the fully matched row group behavior, 
I’ll keep this as-is here and follow up with a small cleanup PR to introduce 
helper(s) for RowFilter generation / decoder building.
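   
   To make the "decoder run" idea from the code comment concrete, here is one 
possible way to split row groups into runs. It is only a sketch under the 
assumption that a new run starts whenever the fully matched status flips; 
`split_into_runs` is a hypothetical helper, and the PR's actual grouping may 
differ.
   
   ```rust
   /// Groups consecutive row-group indices by whether they still need a
   /// RowFilter. Each entry is `(needs_filter, row_group_indices)`.
   fn split_into_runs(fully_matched: &[bool]) -> Vec<(bool, Vec<usize>)> {
       let mut runs: Vec<(bool, Vec<usize>)> = Vec::new();
       for (idx, &full) in fully_matched.iter().enumerate() {
           let needs_filter = !full;
           // Extend the current run if the filter requirement is unchanged.
           let extend_last =
               matches!(runs.last(), Some((flag, _)) if *flag == needs_filter);
           if extend_last {
               runs.last_mut().unwrap().1.push(idx);
           } else {
               runs.push((needs_filter, vec![idx]));
           }
       }
       runs
   }
   ```
   
   In this picture, each run with `needs_filter == true` would get its own 
RowFilter built from the retained predicate, while runs with 
`needs_filter == false` would be decoded without one.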






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192575652


##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
 
 use clap::{Parser, Subcommand};
 
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
 #[cfg(feature = "snmalloc")]
 #[global_allocator]
 static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
 
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer

Review Comment:
   reverted the changes in the PR






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-05 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3192576459


##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {

Review Comment:
   reverted the changes in the PR
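   
   For context, the "fully matched" condition described in the benchmark header 
above (`max(x) < 200` for every row group) can be sketched as a three-way 
classification for a predicate of the form `x < bound`. `Stats`, `Match`, and 
`classify_lt` are hypothetical names. The null-count check is an assumption, 
motivated by the `*_null_count` terms visible in the pruning predicates 
elsewhere in this thread.
   
   ```rust
   #[derive(Debug, PartialEq)]
   enum Match {
       /// Statistics prove no row can satisfy the predicate.
       Pruned,
       /// Some rows may match; the predicate must still be evaluated.
       Partial,
       /// Statistics prove every row satisfies the predicate.
       Full,
   }

   /// Hypothetical min/max/null-count statistics for an i64 column chunk.
   struct Stats {
       min: i64,
       max: i64,
       null_count: u64,
   }

   /// Classifies a row group for the predicate `x < bound`.
   fn classify_lt(stats: &Stats, bound: i64) -> Match {
       if stats.min >= bound {
           Match::Pruned
       } else if stats.max < bound && stats.null_count == 0 {
           // Every value is below the bound and there are no NULLs.
           Match::Full
       } else {
           Match::Partial
       }
   }
   ```
   
   For the benchmark's layout, where `x` lies in `[0, 100)` and the bound is 
200, every row group would classify as `Full` under these assumptions.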






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-04 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373502775

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:                            aarch64
   CPU op-mode(s):                          64-bit
   Byte Order:                              Little Endian
   CPU(s):                                  16
   On-line CPU(s) list:                     0-15
   Vendor ID:                               ARM
   Model name:                              Neoverse-V2
   Model:                                   1
   Thread(s) per core:                      1
   Core(s) per cluster:                     16
   Socket(s):                               -
   Cluster(s):                              1
   Stepping:                                r0p1
   BogoMIPS:                                2000.00
   Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
   L1d cache:                               1 MiB (16 instances)
   L1i cache:                               1 MiB (16 instances)
   L2 cache:                                32 MiB (16 instances)
   L3 cache:                                80 MiB (1 instance)
   NUMA node(s):                            1
   NUMA node0 CPU(s):                       0-15
   Vulnerability Gather data sampling:      Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit:             Not affected
   Vulnerability L1tf:                      Not affected
   Vulnerability Mds:                       Not affected
   Vulnerability Meltdown:                  Not affected
   Vulnerability Mmio stale data:           Not affected
   Vulnerability Reg file data sampling:    Not affected
   Vulnerability Retbleed:                  Not affected
   Vulnerability Spec rstack overflow:      Not affected
   Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
   Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
   Vulnerability Spectre v2:                Mitigation; CSV2, BHB
   Vulnerability Srbds:                     Not affected
   Vulnerability Tsa:                       Not affected
   Vulnerability Tsx async abort:           Not affected
   Vulnerability Vmscape:                   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_extended.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                     HEAD ┃          datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │       788.73 / 861.44 ±60.61 / 949.57 ms │        913.95 / 949.43 ±23.09 / 978.21 ms │  1.10x slower │
   │ QQuery 1  │        197.33 / 199.01 ±2.10 / 203.10 ms │         190.50 / 191.22 ±0.71 / 192.37 ms │     no change │
   │ QQuery 2  │        485.39 / 487.27 ±1.98 / 490.63 ms │         465.41 / 468.42 ±1.57 / 469.92 ms │     no change │
   │ QQuery 3  │        310.24 / 311.90 ±1.68 / 315.13 ms │         310.82 / 313.37 ±1.68 / 314.97 ms │     no change │
   │ QQuery 4  │       658.30 / 674.94 ±10.52 / 686.00 ms │        658.33 / 680.01 ±12.81 / 691.99 ms │     no change │
   │ QQuery 5  │ 10695.14 / 10778.26 ±92.00 / 10942.80 ms │ 10258.39 / 10553.98 ±180.12 / 10788.80 ms │     no change │
   │ QQuery 6  │         29.76 / 65.65 ±70.26 / 206.15 ms │            27.83 / 28.07 ±0.23 / 28.42 ms │ +2.34x faster │
   │ QQuery 7  │       811.80 / 824.14 ±13.28 / 846.22 ms │        753.09 / 802.01 ±30.46 / 849.06 ms │     no change │
   │ QQuery 8  │       379.77 / 396.29 ±13.78 / 414.81 ms │        372.72 / 399.00 ±41.02 / 480.10 ms │     no change │
   │ QQuery 9  │    2790.95 / 2929.75 ±77.42 / 3001.87 ms │     3147.97 / 3190.25 ±24.98 / 3218.54 ms │  1.09x slower │
   │ QQuery 10 │        654.35 / 664.27 ±5.41 / 670.71 ms │        651.01 / 677.90 ±38.17 / 753.66 ms │     no change │
   │ QQuery 11 │    2157.94 / 2270.58 ±59.79 / 2335.75 ms │     2321.41 / 2383.76 ±33.72 / 2422.34 ms │     no change │
   │ QQuery 12 │       193.89 / 212.85 ±29.31 / 271.05 ms │        190.86 / 208.33 ±20.46 / 246.75 ms │     no change │
   │ QQuery 13 │       559.07 / 575.62 ±13.58 / 594.20 ms │            13.31 / 13.58 ±0.23 / 13.97 ms │

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-04 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373370993

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4373338359-2015-4xqfq 6.12.68+ #1 SMP Wed Apr  1 02:23:28 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (87f71e95fc864a3dcd53ad8d943891b5465f1520) to ba038e9 (merge-base) [diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..87f71e95fc864a3dcd53ad8d943891b5465f1520) using: clickbench_extended
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-04 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373343682

   (I do think you'll have to merge up from main and likely resolve comments once this one is merged: https://github.com/apache/datafusion/pull/21907)





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-04 Thread via GitHub


alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3183489922


##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1075,41 +1076,54 @@ impl RowGroupsPrunedParquetOpen {
         let file_metadata = Arc::clone(reader_metadata.metadata());
         let rg_metadata = file_metadata.row_groups();
 
-        // Filter pushdown: evaluate predicates during scan
-        let row_filter = if let Some(predicate) = prepared
+        // Filter pushdown: evaluate predicates during scan.
+        // Keep the predicate around so we can rebuild RowFilter per decoder run
+        // when fully matched row groups split the scan into multiple decoders.
+        let pushdown_predicate = prepared
             .pushdown_filters
             .then_some(prepared.predicate.clone())
-            .flatten()
-        {
-            let row_filter = row_filter::build_row_filter(
-                &predicate,
-                &prepared.physical_file_schema,
-                file_metadata.as_ref(),
-                prepared.reorder_predicates,
-                &prepared.file_metrics,
-            );
+            .flatten();
 
-            match row_filter {
-                Ok(Some(filter)) => Some(filter),
-                Ok(None) => None,
-                Err(e) => {
-                    debug!("Ignoring error building row filter for '{predicate:?}': {e}");
-                    None
+        let try_build_row_filter =
+            |predicate: &Arc<dyn PhysicalExpr>| -> Option<RowFilter> {
+                match row_filter::build_row_filter(
+                    predicate,
+                    &prepared.physical_file_schema,
+                    file_metadata.as_ref(),
+                    prepared.reorder_predicates,
+                    &prepared.file_metrics,
+                ) {
+                    Ok(Some(filter)) => Some(filter),
+                    Ok(None) => None,
+                    Err(e) => {
+                        debug!(
+                            "Ignoring error building row filter for '{predicate:?}': {e}"
+                        );
+                        None
+                    }
                 }
-            }
-        } else {
-            None
-        };
+            };
+
+        // Build the first RowFilter eagerly; it will be reused for the first

Review Comment:
   This is needed because the RowFilter must be owned, right? I think it might make this code easier to understand if you pulled the RowFilter generator logic into its own structure rather than using a closure and manually tracking the Option,
   
   like
   ```rust
   let row_filter_generator = RowFilterGenerator::new(predicate, &prepared.physical_file_schema, ...);
   ...
   ```
   
   Perhaps as a follow-on PR.
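
   A minimal sketch of what such a structure might look like, for readers following the thread. Everything here is an assumption made for illustration: `Predicate`, `RowFilter`, and `build_row_filter` are simplified stand-ins for the real DataFusion and parquet types, and the method names `has_row_filter` and `next_filter` are hypothetical. The point is only that the generator owns the predicate, builds the first filter eagerly, and hands out a freshly built, owned filter for each later decoder run.

   ```rust
   use std::sync::Arc;

   /// Stand-in for `Arc<dyn PhysicalExpr>` (hypothetical, illustration only).
   #[derive(Debug)]
   struct Predicate {
       expr: String,
       /// When false, building a filter from this predicate fails.
       buildable: bool,
   }

   /// Stand-in for parquet's `RowFilter`, which a decoder takes by value.
   #[derive(Debug)]
   struct RowFilter {
       expr: String,
   }

   /// Stand-in for `row_filter::build_row_filter`.
   fn build_row_filter(predicate: &Predicate) -> Result<Option<RowFilter>, String> {
       if predicate.buildable {
           Ok(Some(RowFilter { expr: predicate.expr.clone() }))
       } else {
           Err(format!("cannot push down '{}'", predicate.expr))
       }
   }

   /// Hypothetical generator: owns the predicate and yields one owned
   /// `RowFilter` per decoder run.
   struct RowFilterGenerator {
       predicate: Option<Arc<Predicate>>,
       /// Filter built eagerly in `new`; handed to the first caller.
       first: Option<RowFilter>,
       /// Whether the eager build succeeded.
       enabled: bool,
   }

   impl RowFilterGenerator {
       fn new(predicate: Option<Arc<Predicate>>) -> Self {
           let first = predicate.as_deref().and_then(Self::try_build);
           let enabled = first.is_some();
           Self { predicate, first, enabled }
       }

       /// Build a filter, logging and swallowing errors as the closure in the diff does.
       fn try_build(predicate: &Predicate) -> Option<RowFilter> {
           match build_row_filter(predicate) {
               Ok(filter) => filter,
               Err(e) => {
                   eprintln!("Ignoring error building row filter for '{predicate:?}': {e}");
                   None
               }
           }
       }

       fn has_row_filter(&self) -> bool {
           self.enabled
       }

       /// Return the eagerly built filter first, then rebuild on later calls.
       fn next_filter(&mut self) -> Option<RowFilter> {
           if !self.enabled {
               return None;
           }
           self.first
               .take()
               .or_else(|| self.predicate.as_deref().and_then(Self::try_build))
       }
   }

   fn main() {
       let predicate = Arc::new(Predicate { expr: "x > 5".to_string(), buildable: true });
       let mut generator = RowFilterGenerator::new(Some(predicate));
       // Two decoder runs, each receiving its own owned filter.
       for run in 0..2 {
           let filter = generator.next_filter();
           println!("run {run}: {:?}", filter.map(|f| f.expr));
       }
   }
   ```

   Compared with a closure plus a manually tracked `Option`, the state (eager filter, whether pushdown is active) lives in one place and the call site reduces to `generator.next_filter()`.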



##
datafusion/datasource-parquet/src/opener.rs:
##
@@ -1139,33 +1155,69 @@ impl RowGroupsPrunedParquetOpen {
             reader_metadata.parquet_schema(),
         );
 
-        let mut decoder_builder =
-            ParquetPushDecoderBuilder::new_with_metadata(reader_metadata)
-                .with_projection(read_plan.projection_mask)
+        // Split into consecutive runs of row groups that share the same filter
+        // requirement. Fully matched row groups skip the RowFilter; others need it.
+        // Reverse the run order for reverse scans so the combined decoder stream
+        // preserves the requested global row group order.
+        let mut runs = access_plan.split_runs(has_row_filter);
+        if prepared.reverse_row_groups {
+            runs.reverse();
+        }
+        let run_count = runs.len();
+        let decoder_limit = prepared.limit.filter(|_| run_count == 1);
+        let remaining_limit = prepared.limit.filter(|_| run_count > 1);
+
+        // Helper: configure a decoder builder with shared options from
+        // the prepared plan.
+        let build_decoder = |prepared_access_plan: PreparedAccessPlan,

Review Comment:
   Likewise, it would be nice to see this as its own function rather than a closure, but I think we can do that in a follow-on PR.
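
   For context, the run-splitting step described in the diff comments above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `RowGroup`, `Run`, and the free functions `split_runs` and `split_limit` are hypothetical simplifications (the PR's `split_runs` is a method on the access plan, and its return type is not shown in the thread). The sketch groups consecutive row groups that share the same "needs RowFilter" requirement and mirrors the two `limit.filter(...)` expressions from the diff.

   ```rust
   /// One row group selected for scanning (hypothetical stand-in).
   #[derive(Debug, Clone, Copy)]
   struct RowGroup {
       index: usize,
       /// True when statistics prove every row matches the predicate.
       fully_matched: bool,
   }

   /// A consecutive run of row groups with the same filter requirement.
   #[derive(Debug, PartialEq)]
   struct Run {
       needs_filter: bool,
       row_groups: Vec<usize>,
   }

   /// Group consecutive row groups by whether they still need the RowFilter.
   fn split_runs(row_groups: &[RowGroup], has_row_filter: bool) -> Vec<Run> {
       let mut runs: Vec<Run> = Vec::new();
       for rg in row_groups {
           // Fully matched row groups skip the filter; without a filter nobody needs it.
           let needs_filter = has_row_filter && !rg.fully_matched;
           let extends_last =
               matches!(runs.last(), Some(run) if run.needs_filter == needs_filter);
           if extends_last {
               if let Some(run) = runs.last_mut() {
                   run.row_groups.push(rg.index);
               }
           } else {
               runs.push(Run { needs_filter, row_groups: vec![rg.index] });
           }
       }
       runs
   }

   /// Mirrors the diff: the limit goes to the decoder only when there is a
   /// single run; otherwise it is kept aside as a remaining limit.
   fn split_limit(limit: Option<usize>, run_count: usize) -> (Option<usize>, Option<usize>) {
       let decoder_limit = limit.filter(|_| run_count == 1);
       let remaining_limit = limit.filter(|_| run_count > 1);
       (decoder_limit, remaining_limit)
   }

   fn main() {
       let row_groups = [
           RowGroup { index: 0, fully_matched: false },
           RowGroup { index: 1, fully_matched: true },
           RowGroup { index: 2, fully_matched: true },
           RowGroup { index: 3, fully_matched: false },
       ];
       let mut runs = split_runs(&row_groups, true);
       let reverse_row_groups = false;
       if reverse_row_groups {
           // Only the run order is reversed here, as in the diff.
           runs.reverse();
       }
       let (decoder_limit, remaining_limit) = split_limit(Some(100), runs.len());
       println!("{runs:?}");
       println!("decoder_limit={decoder_limit:?} remaining_limit={remaining_limit:?}");
   }
   ```

   How the remaining limit is consumed across decoders is not visible in the quoted hunk, so the sketch stops at computing it.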
   






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-04 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4373338359

   run benchmark clickbench_extended
   
   ```
   env:
 DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
 DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-01 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362878551

   lol
   
   QQuery 13 │ 546.46 / 549.37 ±2.87 / 554.08 ms │ 13.56 / 13.72 ±0.11 / 13.86 ms │ +40.03x faster │
   
   The newly added query exactly matches the optimization in this PR.





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-01 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362788303

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_extended.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                      HEAD ┃          datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │        791.81 / 810.06 ±13.41 / 832.22 ms │        778.49 / 818.79 ±24.50 / 847.08 ms │     no change │
   │ QQuery 1  │         195.40 / 197.42 ±3.09 / 203.52 ms │         190.40 / 193.71 ±5.64 / 204.97 ms │     no change │
   │ QQuery 2  │         482.37 / 484.71 ±2.19 / 487.63 ms │         465.28 / 469.19 ±4.10 / 476.66 ms │     no change │
   │ QQuery 3  │         312.99 / 313.96 ±1.43 / 316.78 ms │         309.21 / 310.75 ±1.19 / 312.48 ms │     no change │
   │ QQuery 4  │        645.31 / 664.75 ±13.51 / 682.62 ms │         667.47 / 678.91 ±7.49 / 687.36 ms │     no change │
   │ QQuery 5  │ 10276.29 / 10583.20 ±257.34 / 11008.54 ms │ 10075.40 / 10380.57 ±231.93 / 10733.22 ms │     no change │
   │ QQuery 6  │          29.45 / 56.17 ±34.05 / 112.92 ms │            27.70 / 28.44 ±1.22 / 30.87 ms │ +1.98x faster │
   │ QQuery 7  │        763.69 / 816.99 ±43.76 / 885.36 ms │        742.30 / 766.54 ±19.62 / 801.60 ms │ +1.07x faster │
   │ QQuery 8  │        376.57 / 388.00 ±10.73 / 406.05 ms │        373.37 / 392.63 ±22.96 / 437.19 ms │     no change │
   │ QQuery 9  │     2936.58 / 2986.44 ±35.95 / 3041.68 ms │     3124.33 / 3154.76 ±15.64 / 3166.38 ms │  1.06x slower │
   │ QQuery 10 │        648.39 / 662.47 ±12.01 / 681.58 ms │        647.31 / 686.85 ±62.33 / 810.96 ms │     no change │
   │ QQuery 11 │     2205.27 / 2244.62 ±46.70 / 2335.37 ms │     2354.25 / 2434.02 ±55.64 / 2510.80 ms │  1.08x slower │
   │ QQuery 12 │        189.47 / 219.42 ±58.62 / 336.65 ms │        191.74 / 205.18 ±15.90 / 233.24 ms │ +1.07x faster │
   │ QQuery 13 │         546.46 / 549.37 ±2.87 / 554.08 ms │            13.56 / 13.72 ±

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-01 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362744021

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4362734827-1969-fgdr4 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (87f71e95fc864a3dcd53ad8d943891b5465f1520) to ba038e9 (merge-base) [diff](https://github.com/apache/datafusion/compare/ba038e99c861ac4cb034c7159167c3a91a8ea740..87f71e95fc864a3dcd53ad8d943891b5465f1520) using: clickbench_extended
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-01 Thread via GitHub


github-actions[bot] commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362741427

   
   Thank you for opening this pull request!
   
   Reviewer note: 
[cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) 
reported the current version number is not SemVer-compatible with the changes 
in this pull request (compared against the base branch).
   
   
   Details
   
   ```
        Cloning origin/main
       Building datafusion-datasource-parquet v53.1.0 (current)
          Built [  45.034s] (current)
        Parsing datafusion-datasource-parquet v53.1.0 (current)
         Parsed [   0.025s] (current)
       Building datafusion-datasource-parquet v53.1.0 (baseline)
          Built [  43.069s] (baseline)
        Parsing datafusion-datasource-parquet v53.1.0 (baseline)
         Parsed [   0.026s] (baseline)
       Checking datafusion-datasource-parquet v53.1.0 -> v53.1.0 (no change; assume patch)
        Checked [   0.154s] 222 checks: 221 pass, 1 fail, 0 warn, 30 skip
   
   --- failure inherent_method_missing: pub method removed or renamed ---
   
   Description:
   A publicly-visible method or associated fn is no longer available under its prior name. It may have been renamed or removed entirely.
           ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
          impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/inherent_method_missing.ron
   
   Failed in:
     RowGroupAccessPlanFilter::is_fully_matched, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-origin_main/1b4aa23fc54dabface2da814e74fe26e0b84c6a8/datafusion/datasource-parquet/src/row_group_filter.rs:83
   
        Summary semver requires new major version: 1 major and 0 minor checks failed
       Finished [  89.849s] datafusion-datasource-parquet
       Building datafusion-sqllogictest v53.1.0 (current)
          Built [ 135.401s] (current)
        Parsing datafusion-sqllogictest v53.1.0 (current)
         Parsed [   0.022s] (current)
       Building datafusion-sqllogictest v53.1.0 (baseline)
          Built [ 134.160s] (baseline)
        Parsing datafusion-sqllogictest v53.1.0 (baseline)
         Parsed [   0.023s] (baseline)
       Checking datafusion-sqllogictest v53.1.0 -> v53.1.0 (no change; assume patch)
        Checked [   0.092s] 222 checks: 222 pass, 30 skip
        Summary no semver update required
       Finished [ 273.003s] datafusion-sqllogictest
   ```
   
   
   





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-05-01 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4362734827

   run benchmark clickbench_extended
   
   ```
   env:
 DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
 DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165239136


##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
 03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as name], metrics=[output_rows=10, ]
 04)--FilterExec: value@1 > 3, metrics=[output_rows=10, , selectivity=100% (10/10)]
 05)RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[output_rows=10, ]
-06)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]}, projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND DynamicFilter [ value@1 IS NULL OR value@1 > 800 ], pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND (value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 > 800), required_guarantees=[], metrics=[output_rows=10, elapsed_compute=, output_bytes=80.0 B, files_ranges_pruned_statistics=1 total → 1 matched, row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=1 total → 1 matched, limit_pruned_row_groups=0 total → 0 matched, bytes_scanned=210, metadata_load_time=, scan_efficiency_ratio=18.31% (210/1.15 K)]
+06)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]}, projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND DynamicFilter [ value@1 IS NULL OR value@1 > 800 ], pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND (value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 > 800), required_guarantees=[], metrics=[output_rows=10, elapsed_compute=, output_bytes=80.0 B, files_ranges_pruned_statistics=1 total → 1 matched, row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=0 total → 0 matched, limit_pruned_row_groups=0 total → 0 matched, bytes_scanned=210, metadata_load_time=, scan_efficiency_ratio=18.31% (210/1.15 K)]

Review Comment:
   ~yes, fixed it~
   
   I found it's hard to fix without extra cost, investigating






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165248469


##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
             return;
         };
 
+        // Collect unique column names referenced by the predicate so we can
+        // check for NULLs. Rows with NULL predicate columns evaluate to NULL
+        // (not true), so a row group with NULLs cannot be "fully matched."
+        let predicate_columns =
+            datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+        let null_count_converters: Vec = predicate_columns
+            .iter()
+            .filter_map(|col| {
+                StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)

Review Comment:
   The PR https://github.com/apache/datafusion/pull/21907 takes a different approach: it adds IS NULL checks for nullable columns referenced by the predicate before evaluating the inverted pruning predicate.
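
   To make the NULL issue concrete, here is a small self-contained sketch of why a row group containing NULLs in a predicate column cannot be treated as "fully matched". It is a simplification under stated assumptions: `ColumnStats` and the helper functions are hypothetical, cover only a single `value > threshold` predicate on an integer column, and do not reproduce either PR's implementation.

   ```rust
   /// Simplified per-row-group statistics for one Int64 column.
   /// Hypothetical stand-in; min/max cover non-null values only.
   #[derive(Debug, Clone, Copy)]
   struct ColumnStats {
       min: i64,
       max: i64,
       null_count: u64,
   }

   /// SQL-style `value > threshold`: a NULL input yields NULL (`None`), not false.
   fn gt(value: Option<i64>, threshold: i64) -> Option<bool> {
       value.map(|v| v > threshold)
   }

   /// A filter keeps a row only when the predicate is exactly TRUE.
   fn passes(value: Option<i64>, threshold: i64) -> bool {
       gt(value, threshold) == Some(true)
   }

   /// Some row might match, so the row group cannot be pruned.
   fn may_match_gt(stats: &ColumnStats, threshold: i64) -> bool {
       stats.max > threshold
   }

   /// Every row is guaranteed to match: all non-null values exceed the
   /// threshold AND there are no NULLs (which would evaluate to NULL).
   fn fully_matches_gt(stats: &ColumnStats, threshold: i64) -> bool {
       stats.null_count == 0 && stats.min > threshold
   }

   fn main() {
       // min=10 > 3, yet one row is NULL and would be dropped by the filter.
       let rows = [Some(10), None, Some(20)];
       let stats = ColumnStats { min: 10, max: 20, null_count: 1 };

       let kept = rows.iter().filter(|v| passes(**v, 3)).count();
       println!("filter keeps {kept} of {} rows", rows.len());
       println!("may match: {}", may_match_gt(&stats, 3));
       println!("fully matched: {}", fully_matches_gt(&stats, 3));
   }
   ```

   Checking only `min > threshold` would mark this row group fully matched and skip the RowFilter, returning the NULL row that the filter would have removed.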






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165244817


##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
             return;
         };
 
+        // Collect unique column names referenced by the predicate so we can

Review Comment:
   yes, the PR makes the bug surface. I opened a separate PR: 
https://github.com/apache/datafusion/pull/21907






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165239136


##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
 03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as name], metrics=[output_rows=10, ]
 04)--FilterExec: value@1 > 3, metrics=[output_rows=10, , selectivity=100% (10/10)]
 05)RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[output_rows=10, ]
-06)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]}, projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND DynamicFilter [ value@1 IS NULL OR value@1 > 800 ], pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND (value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 > 800), required_guarantees=[], metrics=[output_rows=10, elapsed_compute=, output_bytes=80.0 B, files_ranges_pruned_statistics=1 total → 1 matched, row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=1 total → 1 matched, limit_pruned_row_groups=0 total → 0 matched, bytes_scanned=210, metadata_load_time=, scan_efficiency_ratio=18.31% (210/1.15 K)]
+06)--DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/dynamic_filter_pushdown_config/test_data.parquet]]}, projection=[id, value], file_type=parquet, predicate=value@1 > 3 AND DynamicFilter [ value@1 IS NULL OR value@1 > 800 ], pruning_predicate=value_null_count@1 != row_count@2 AND value_max@0 > 3 AND (value_null_count@1 > 0 OR value_null_count@1 != row_count@2 AND value_max@0 > 800), required_guarantees=[], metrics=[output_rows=10, elapsed_compute=, output_bytes=80.0 B, files_ranges_pruned_statistics=1 total → 1 matched, row_groups_pruned_statistics=1 total → 1 matched -> 1 fully matched, row_groups_pruned_bloom_filter=1 total → 1 matched, page_index_pages_pruned=0 total → 0 matched, limit_pruned_row_groups=0 total → 0 matched, bytes_scanned=210, metadata_load_time=, scan_efficiency_ratio=18.31% (210/1.15 K)]

Review Comment:
   yes, fixed it






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165227615


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -3773,11 +3773,11 @@ impl PartialOrd for Aggregate {
 /// Returns 0 when no grouping set is duplicated.
 fn max_grouping_set_duplicate_ordinal(group_expr: &[Expr]) -> usize {
 if let Some(Expr::GroupingSet(GroupingSet::GroupingSets(sets))) = group_expr.first() {
-let mut counts: HashMap<&[Expr], usize> = HashMap::new();
-for set in sets {
-*counts.entry(set).or_insert(0) += 1;
-}
-counts.into_values().max().unwrap_or(0).saturating_sub(1)
+sets.iter()

Review Comment:
   yes, I reverted the changes
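
For readers following the quoted hunk: the `HashMap` version being restored counts how often each grouping set occurs and reports the highest count minus one. A minimal standalone sketch of that logic, assuming plain `Vec<T>` values in place of DataFusion's `Expr` slices (the name `max_duplicate_ordinal` and its generic signature are illustrative, not the PR's code):

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Returns the highest duplicate ordinal among `sets`: 0 when every set is
/// unique, 1 when some set appears twice, and so on.
/// Hypothetical stand-in for the function in the quoted hunk.
fn max_duplicate_ordinal<T: Hash + Eq>(sets: &[Vec<T>]) -> usize {
    // Count occurrences of each distinct set, keyed by its slice contents.
    let mut counts: HashMap<&[T], usize> = HashMap::new();
    for set in sets {
        *counts.entry(set.as_slice()).or_insert(0) += 1;
    }
    // The largest count minus one; an empty input yields 0.
    counts.into_values().max().unwrap_or(0).saturating_sub(1)
}
```

With three copies of `["a", "b"]` in the input the result is 2, and an empty or duplicate-free input yields 0, which matches the doc comment in the hunk.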






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165224353


##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
 
 use clap::{Parser, Subcommand};
 
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
 #[cfg(feature = "snmalloc")]
 #[global_allocator]
 static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
 
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer

Review Comment:
   yes, opened a separate PR: https://github.com/apache/datafusion/pull/21905






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-29 Thread via GitHub


xudong963 commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3165226806


##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {

Review Comment:
   yes, this one https://github.com/apache/datafusion/pull/21945
   
   Do you think we should remove the current bench code in the PR?






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-28 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4340475327

   @alamb Thanks for the review. I marked the PR as draft; after I resolve all 
of the comments and the prerequisite PRs are merged, I'll mark it ready for 
review again.





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-28 Thread via GitHub


alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3156498869


##
datafusion/datasource-parquet/src/row_group_filter.rs:
##
@@ -357,13 +366,38 @@ impl RowGroupAccessPlanFilter {
 return;
 };
 
+// Collect unique column names referenced by the predicate so we can
+// check for NULLs. Rows with NULL predicate columns evaluate to NULL
+// (not true), so a row group with NULLs cannot be "fully matched."
+let predicate_columns =
+datafusion_physical_expr::utils::collect_columns(predicate.orig_expr());
+
+let null_count_converters: Vec = predicate_columns
+.iter()
+.filter_map(|col| {
+StatisticsConverter::try_new(col.name(), arrow_schema, parquet_schema)

Review Comment:
   We should probably set this option to `false` (it defaults to true) to be 
super safe:
   ```
   pub fn with_missing_null_counts_as_zero(mut self, missing_null_counts_as_zero: bool) -> Self
   ```
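
To illustrate why the default matters here: if a writer did not record a null count and the reader reports it as zero, a row group containing NULLs could be wrongly classified as fully matched. The sketch below is a simplified model of that decision, not DataFusion or parquet code; `ColumnStats`, `proves_no_nulls`, and `all_predicate_columns_null_free` are hypothetical names, and `None` stands for a null count that is absent from the file metadata.

```rust
/// Simplified stand-in for one column's row group statistics.
/// `null_count` is `None` when the writer did not record it.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    null_count: Option<u64>,
}

/// True only when the statistics prove the column holds no NULLs.
/// A missing null count is treated as unknown, never as zero.
fn proves_no_nulls(stats: &ColumnStats) -> bool {
    matches!(stats.null_count, Some(0))
}

/// A row group may be considered for "fully matched" only if every
/// predicate column is proven NULL-free (the min/max check is separate
/// and not modeled here).
fn all_predicate_columns_null_free(columns: &[ColumnStats]) -> bool {
    columns.iter().all(proves_no_nulls)
}
```

Under this model a column with `null_count: None` blocks the fully matched classification, which is the conservative behavior the suggestion above asks for.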






Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-28 Thread via GitHub


alamb commented on code in PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#discussion_r3156285145


##
benchmarks/src/bin/dfbench.rs:
##
@@ -20,16 +20,13 @@ use datafusion::error::Result;
 
 use clap::{Parser, Subcommand};
 
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
 #[cfg(feature = "snmalloc")]
 #[global_allocator]
 static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
 
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer

Review Comment:
   this seems unrelated to the rest of this PR -- perhaps we can pull it into 
its own PR for easier review and consideration
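
For context on the quoted hunk, one common way to let two allocator features coexist under `--all-features` is to gate the second one on the absence of the first instead of raising `compile_error!`. The sketch below shows only that `cfg` pattern and stays dependency-free by returning a name rather than installing a `#[global_allocator]`. It assumes snmalloc is the preferred allocator, which the truncated comment in the hunk does not confirm, and `selected_allocator` is a hypothetical helper.

```rust
/// Reports which allocator this build would select.
/// Assumption for illustration: snmalloc wins when both features are enabled.
#[cfg(feature = "snmalloc")]
fn selected_allocator() -> &'static str {
    "snmalloc"
}

/// mimalloc is selected only when snmalloc is not also enabled, so enabling
/// both features never produces two conflicting definitions.
#[cfg(all(feature = "mimalloc", not(feature = "snmalloc")))]
fn selected_allocator() -> &'static str {
    "mimalloc"
}

/// Neither feature enabled: fall back to the system allocator.
#[cfg(not(any(feature = "snmalloc", feature = "mimalloc")))]
fn selected_allocator() -> &'static str {
    "system"
}
```

Exactly one of the three definitions is compiled for any feature combination, which is the property that keeps `cargo clippy --all-features` building.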



##
datafusion/datasource-parquet/benches/parquet_fully_matched_filter.rs:
##
@@ -0,0 +1,292 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Benchmark for skipping filter evaluation on fully matched row groups.
+//!
+//! This benchmark measures the performance improvement from skipping
+//! RowFilter evaluation when row group statistics prove that all rows
+//! in a row group satisfy the predicate.
+//!
+//! Dataset layout:
+//! - 20 row groups, each with 50_000 rows
+//! - Column `x`: i64, values in range [0, 100) for all row groups
+//! - Column `payload`: Utf8, 1 KB string (makes filter column decoding cost visible)
+//!
+//! Predicate: `x < 200`
+//! - ALL row groups are fully matched (max(x) < 200 for every row group)
+//! - Without the optimization: RowFilter decodes `x` and evaluates predicate for every row
+//! - With the optimization: RowFilter is skipped entirely (statistics prove all rows match)
+//!
+//! Uses `ParquetPushDecoder` directly to exercise the exact code path
+//! that DataFusion's async opener uses.
+
+use std::path::PathBuf;
+use std::sync::{Arc, LazyLock};
+
+use arrow::array::{Int64Array, RecordBatch, StringBuilder};
+use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use bytes::Bytes;
+use criterion::{Criterion, Throughput, criterion_group, criterion_main};
+use datafusion_common::ScalarValue;
+use datafusion_datasource_parquet::{ParquetFileMetrics, build_row_filter};
+use datafusion_expr::{Expr, col};
+use datafusion_physical_expr::planner::logical2physical;
+use datafusion_physical_plan::metrics::ExecutionPlanMetricsSet;
+use parquet::DecodeResult;
+use parquet::arrow::arrow_reader::ArrowReaderMetadata;
+use parquet::arrow::push_decoder::ParquetPushDecoderBuilder;
+use parquet::file::properties::WriterProperties;
+use parquet::{arrow::ArrowWriter, file::metadata::ParquetMetaData};
+use tempfile::TempDir;
+
+const ROW_GROUP_SIZE: usize = 50_000;
+const NUM_ROW_GROUPS: usize = 20;
+const TOTAL_ROWS: usize = ROW_GROUP_SIZE * NUM_ROW_GROUPS;
+const PAYLOAD_LEN: usize = 1024;
+
+struct BenchmarkDataset {

Review Comment:
   Rather than a targeted benchmark like this that will likely not get run all 
that often, I recommend adding a new benchmark to the "clickbench_extended" 
queries:
   
   https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries
   
   I bet you could write a pretty good one with some substring match where this 
optimization would help a lot.
   
   I recommend making a separate PR to add such a query so we can show off this 
PR's performance improvement
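
The condition that the quoted benchmark header relies on (`max(x) < 200` for every row group) can be sketched as a three-way classification of a row group for a predicate of the form `x < bound`. This is a simplified model under stated assumptions, not the PR's implementation: statistics are plain `i64` values, and `RowGroupMatch`, `RowGroupStats`, and `classify_less_than` are hypothetical names. The NULL check reflects the comment in the `row_group_filter.rs` hunk that a row group with NULLs cannot be fully matched.

```rust
/// Outcome of checking one row group's statistics against `x < bound`.
#[derive(Debug, PartialEq)]
enum RowGroupMatch {
    /// No row can match; the row group is skipped.
    Pruned,
    /// Every row matches; filter evaluation can be skipped.
    FullyMatched,
    /// Some rows may match; the filter must still run.
    NeedsRowFilter,
}

/// Simplified row group statistics for column `x` (hypothetical type).
struct RowGroupStats {
    min: i64,
    max: i64,
    /// `None` means the null count is unknown.
    null_count: Option<u64>,
}

fn classify_less_than(stats: &RowGroupStats, bound: i64) -> RowGroupMatch {
    if stats.min >= bound {
        // Even the smallest value fails the predicate.
        RowGroupMatch::Pruned
    } else if stats.max < bound && stats.null_count == Some(0) {
        // The largest value passes and no row is NULL.
        RowGroupMatch::FullyMatched
    } else {
        RowGroupMatch::NeedsRowFilter
    }
}
```

For the benchmark's layout (values in `[0, 100)`, predicate `x < 200`, no NULLs) every row group falls into the `FullyMatched` case.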
   
   



##
benchmarks/src/bin/imdb.rs:
##
@@ -21,16 +21,13 @@ use clap::{Parser, Subcommand};
 use datafusion::error::Result;
 use datafusion_benchmarks::imdb;
 
-#[cfg(all(feature = "snmalloc", feature = "mimalloc"))]
-compile_error!(
-"feature \"snmalloc\" and feature \"mimalloc\" cannot be enabled at the same time"
-);
-
 #[cfg(feature = "snmalloc")]
 #[global_allocator]
 static ALLOC: snmalloc_rs::SnMalloc = snmalloc_rs::SnMalloc;
 
-#[cfg(feature = "mimalloc")]
+// `cargo clippy --all-features` enables both allocator features, so prefer

Review Comment:
   likewise here



##
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt:
##
@@ -104,7 +104,7 @@ Plan with Metrics
 03)ProjectionExec: expr=[id@0 as id, value@1 as v, value@1 + id@0 as 
name], met

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-28 Thread via GitHub


alamb commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4337946630

   Checking this one out





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4316119761

   Benchmark for [this request](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410) hit the 7200s job deadline before finishing.
   
   Benchmarks requested: `tpch`
   
   Kubernetes message
   
   ```
   Job was active longer than specified deadline
   ```
   
   
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315558192

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃     datafusion_issue-19028-benchmark ┃    Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ QQuery 1  │           6.36 / 6.79 ±0.77 / 8.33 ms │          6.34 / 6.82 ±0.72 / 8.27 ms │ no change │
   │ QQuery 2  │     111.62 / 112.52 ±0.84 / 114.02 ms │    112.17 / 113.85 ±2.07 / 117.48 ms │ no change │
   │ QQuery 3  │     109.40 / 111.11 ±1.03 / 112.42 ms │    109.52 / 110.65 ±1.11 / 112.08 ms │ no change │
   │ QQuery 4  │ 1063.85 / 1079.77 ±11.64 / 1096.79 ms │ 1069.66 / 1081.30 ±6.57 / 1087.59 ms │ no change │
   │ QQuery 5  │     190.68 / 196.08 ±2.96 / 199.40 ms │    197.50 / 199.03 ±2.11 / 203.21 ms │ no change │
   │ QQuery 6  │     256.22 / 263.97 ±5.03 / 271.37 ms │    265.28 / 269.52 ±3.66 / 275.88 ms │ no change │
   │ QQuery 7  │     331.83 / 336.36 ±3.29 / 341.13 ms │    331.63 / 337.42 ±4.01 / 342.43 ms │ no change │
   │ QQuery 8  │     158.03 / 163.27 ±2.94 / 166.58 ms │    160.05 / 163.64 ±3.55 / 168.95 ms │ no change │
   │ QQuery 9  │    223.97 / 240.95 ±18.05 / 275.63 ms │   225.92 / 246.17 ±15.49 / 264.82 ms │ no change │
   │ QQuery 10 │     168.84 / 171.94 ±2.22 / 174.46 ms │    164.65 / 173.05 ±6.51 / 181.03 ms │ no change │
   │ QQuery 11 │     708.27 / 717.29 ±8.19 / 730.30 ms │    699.93 / 713.92 ±7.71 / 721.53 ms │ no change │
   │ QQuery 12 │        37.36 / 40.07 ±2.19 / 43.94 ms │       37.58 / 39.99 ±1.40 / 41.78 ms │ no change │
   │ QQuery 13 │     569.29 / 577.80 ±7.20 / 589.16 ms │    561.17 / 575.21 ±9.89 / 591.97 ms │ no change │
   │ QQuery 14 │    904.33 / 918.83 ±10.33 / 929.94 ms │    899.59 / 912.64 ±9.64 / 928.42 ms │ no c

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315511766

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpch_sf10.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃     datafusion_issue-19028-benchmark ┃    Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
   │ QQuery 1  │     341.86 / 343.87 ±1.57 / 345.89 ms │    341.55 / 342.83 ±1.50 / 345.60 ms │ no change │
   │ QQuery 2  │     179.02 / 182.05 ±1.67 / 184.13 ms │    178.22 / 180.43 ±1.81 / 182.75 ms │ no change │
   │ QQuery 3  │     376.68 / 377.93 ±0.92 / 379.08 ms │    372.42 / 376.12 ±2.91 / 379.53 ms │ no change │
   │ QQuery 4  │     330.86 / 333.01 ±1.95 / 335.80 ms │    323.24 / 327.00 ±3.56 / 333.50 ms │ no change │
   │ QQuery 5  │     631.16 / 646.36 ±9.78 / 659.99 ms │    618.86 / 630.04 ±7.42 / 640.48 ms │ no change │
   │ QQuery 6  │     295.54 / 296.50 ±0.87 / 298.02 ms │    292.67 / 294.98 ±1.30 / 296.27 ms │ no change │
   │ QQuery 7  │     435.42 / 438.78 ±2.54 / 442.85 ms │    432.02 / 434.27 ±1.86 / 437.54 ms │ no change │
   │ QQuery 8  │     620.16 / 623.49 ±3.23 / 627.85 ms │    614.57 / 618.60 ±2.10 / 620.58 ms │ no change │
   │ QQuery 9  │ 1427.74 / 1444.82 ±11.23 / 1459.99 ms │ 1420.76 / 1429.41 ±5.97 / 1437.76 ms │ no change │
   │ QQuery 10 │     414.45 / 420.02 ±3.60 / 424.13 ms │    409.96 / 412.91 ±2.35 / 415.56 ms │ no change │
   │ QQuery 11 │     143.61 / 145.47 ±3.02 / 151.49 ms │    142.75 / 145.97 ±3.38 / 151.37 ms │ no change │
   │ QQuery 12 │     351.88 / 356.48 ±3.60 / 362.79 ms │    350.62 / 352.53 ±2.15 / 356.43 ms │ no change │
   │ QQuery 13 │     395.48 / 399.86 ±2.41 / 402.69 ms │    402.20 / 410.77 ±9.07 / 423.63 ms │ no change │
   │ QQuery 14 │     217.73 / 218.84 ±0.93 / 220.14 ms │    215.12 / 217.76 ±2.21 / 220.83 ms │ no change │
   │ QQuery 15 │     426.15 / 433.18 ±4.47 / 438.50 ms │    424.73 / 430.62 ±3.77 / 435.54 ms │ no change │
   │ QQuery 16 │     111.62 / 114.07 ±1.71 / 116.46 ms │    111.56 / 115.66 ±2.74 / 118.63 ms │ no chan
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315431482

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4315401410-1829-s9qw7 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233)
 using: tpch10
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315418648

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4315401410-1827-q7smj 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233)
 using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315415411

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4315401410-1828-kdbct 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233)
 using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-24 Thread via GitHub


Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4315401410

   
   run benchmarks tpcds tpch tpch10
   
   ```
   env:
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
     DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```
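
The two variables in the `env:` block appear to correspond to the DataFusion settings `datafusion.execution.parquet.pushdown_filters` and `datafusion.execution.parquet.reorder_filters`. Below is a minimal sketch of that naming convention, assuming the variable name is the dotted configuration key upper-cased with `.` replaced by `_`. `env_var_for_key` is a hypothetical helper written for illustration; it is not a DataFusion API.

```rust
/// Hypothetical helper: derive the environment-variable name for a dotted
/// configuration key by upper-casing it and replacing '.' with '_'.
/// The convention itself is an assumption, not quoted from this thread.
fn env_var_for_key(key: &str) -> String {
    key.to_uppercase().replace('.', "_")
}

fn main() {
    // The two keys that seem to match the variables set for the benchmark run.
    let keys = [
        "datafusion.execution.parquet.pushdown_filters",
        "datafusion.execution.parquet.reorder_filters",
    ];
    for key in keys {
        println!("{key} -> {}", env_var_for_key(key));
    }
}
```

Mapping from key to variable (rather than the reverse) avoids ambiguity, since key segments such as `pushdown_filters` already contain underscores.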





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258571734

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │           6.78 / 7.24 ±0.79 / 8.83 ms │           6.26 / 6.73 ±0.77 / 8.26 ms │ +1.08x faster │
   │ QQuery 2  │    113.71 / 124.29 ±18.52 / 161.25 ms │    112.31 / 123.13 ±18.74 / 160.53 ms │     no change │
   │ QQuery 3  │     112.25 / 113.70 ±0.77 / 114.45 ms │     111.22 / 112.97 ±1.16 / 114.69 ms │     no change │
   │ QQuery 4  │  1132.74 / 1143.41 ±5.89 / 1148.64 ms │ 1107.26 / 1146.56 ±21.33 / 1168.38 ms │     no change │
   │ QQuery 5  │     193.18 / 198.73 ±3.48 / 202.75 ms │     192.26 / 198.74 ±4.87 / 204.14 ms │     no change │
   │ QQuery 6  │    268.47 / 283.49 ±14.07 / 305.59 ms │     289.24 / 296.08 ±7.41 / 306.80 ms │     no change │
   │ QQuery 7  │     329.27 / 334.93 ±4.12 / 341.05 ms │     333.72 / 345.13 ±8.97 / 358.71 ms │     no change │
   │ QQuery 8  │     164.06 / 167.16 ±2.20 / 169.86 ms │     161.48 / 164.40 ±1.76 / 166.98 ms │     no change │
   │ QQuery 9  │    249.71 / 257.13 ±10.05 / 277.05 ms │     243.52 / 255.37 ±6.89 / 263.77 ms │     no change │
   │ QQuery 10 │     173.92 / 179.44 ±3.67 / 183.86 ms │     170.42 / 176.28 ±3.60 / 181.23 ms │     no change │
   │ QQuery 11 │    731.39 / 743.51 ±10.05 / 760.89 ms │    739.67 / 750.42 ±13.72 / 777.02 ms │     no change │
   │ QQuery 12 │        39.06 / 41.90 ±2.83 / 47.25 ms │        40.35 / 42.34 ±1.88 / 45.21 ms │     no change │
   │ QQuery 13 │     564.43 / 571.92 ±7.85 / 586.81 ms │     572.14 / 580.99 ±5.22 / 586.01 ms │     no change │
   │ QQuery 14 │     920.93 / 931.52 ±8.71 / 941.30 ms │    917.34 / 937.33 ±16.00 / 955.10 ms │ no c

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258444222

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4258430739-1347-rltwq 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233)
 using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258443837

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4258430739-1348-rbszv 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(0b1f441c84a3f62e783f0a5d050da478b5a64233) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..0b1f441c84a3f62e783f0a5d050da478b5a64233)
 using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258430739

   run benchmarks tpcds tpch
   
   ```
   env:
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
     DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258411232

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │           6.56 / 7.07 ±0.72 / 8.50 ms │           6.39 / 6.83 ±0.75 / 8.32 ms │     no change │
   │ QQuery 2  │     111.08 / 111.90 ±0.60 / 112.85 ms │    112.56 / 123.17 ±18.94 / 160.92 ms │  1.10x slower │
   │ QQuery 3  │     109.73 / 110.86 ±1.14 / 112.61 ms │     109.83 / 110.94 ±0.63 / 111.46 ms │     no change │
   │ QQuery 4  │ 1107.32 / 1128.24 ±17.71 / 1149.81 ms │ 1094.31 / 1122.25 ±16.85 / 1138.95 ms │     no change │
   │ QQuery 5  │     195.12 / 197.75 ±2.90 / 203.16 ms │     191.09 / 195.59 ±2.92 / 199.56 ms │     no change │
   │ QQuery 6  │    270.75 / 290.38 ±12.06 / 304.92 ms │     264.99 / 271.22 ±5.06 / 279.74 ms │ +1.07x faster │
   │ QQuery 7  │     333.39 / 341.33 ±6.33 / 352.06 ms │     336.01 / 341.39 ±5.41 / 348.64 ms │     no change │
   │ QQuery 8  │     163.92 / 165.53 ±1.33 / 167.48 ms │     157.48 / 162.72 ±4.33 / 169.19 ms │     no change │
   │ QQuery 9  │    210.53 / 243.02 ±26.88 / 271.53 ms │    200.29 / 247.70 ±26.70 / 273.25 ms │     no change │
   │ QQuery 10 │     174.16 / 179.08 ±4.06 / 184.11 ms │     172.96 / 182.10 ±8.80 / 196.09 ms │     no change │
   │ QQuery 11 │     749.49 / 756.26 ±4.16 / 762.43 ms │     732.29 / 740.02 ±6.31 / 751.01 ms │     no change │
   │ QQuery 12 │        37.94 / 40.63 ±1.46 / 41.99 ms │        37.67 / 39.33 ±0.88 / 40.10 ms │     no change │
   │ QQuery 13 │     564.75 / 575.46 ±6.80 / 583.56 ms │     553.62 / 557.32 ±3.11 / 561.74 ms │     no change │
   │ QQuery 14 │    902.52 / 919.84 ±10.02 / 932.50 ms │     917.26 / 923.35 ±6.56 / 933.44 ms │ no c

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258384364

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │          1.19 / 4.48 ±6.41 / 17.31 ms │          1.21 / 4.53 ±6.45 / 17.43 ms │     no change │
   │ QQuery 1  │        15.53 / 15.86 ±0.20 / 16.13 ms │        15.48 / 16.04 ±0.37 / 16.56 ms │     no change │
   │ QQuery 2  │        45.06 / 45.31 ±0.28 / 45.84 ms │        44.29 / 44.47 ±0.17 / 44.74 ms │     no change │
   │ QQuery 3  │        42.27 / 44.56 ±1.49 / 45.96 ms │        41.94 / 45.97 ±2.32 / 48.58 ms │     no change │
   │ QQuery 4  │     283.50 / 294.28 ±7.03 / 305.72 ms │    329.85 / 365.17 ±41.65 / 444.83 ms │  1.24x slower │
   │ QQuery 5  │     342.84 / 345.35 ±3.13 / 351.35 ms │    374.97 / 395.60 ±19.29 / 427.23 ms │  1.15x slower │
   │ QQuery 6  │           5.59 / 6.12 ±0.37 / 6.55 ms │           5.43 / 6.96 ±1.13 / 8.44 ms │  1.14x slower │
   │ QQuery 7  │        22.07 / 22.76 ±0.61 / 23.76 ms │        22.08 / 23.25 ±1.23 / 25.53 ms │     no change │
   │ QQuery 8  │     419.28 / 426.50 ±3.99 / 429.99 ms │    439.54 / 471.62 ±34.51 / 534.88 ms │  1.11x slower │
   │ QQuery 9  │     637.80 / 646.44 ±6.72 / 658.14 ms │     669.57 / 677.05 ±6.41 / 685.61 ms │     no change │
   │ QQuery 10 │     116.52 / 118.44 ±1.50 / 120.88 ms │     118.71 / 121.93 ±2.15 / 125.30 ms │     no change │
   │ QQuery 11 │     133.26 / 133.89 ±0.61 / 135.01 ms │     131.75 / 133.61 ±1.99 / 137.12 ms │     no change │
   │ QQuery 12 │     371.46 / 381.23 ±7.11 / 392.22 ms │     387.30 / 401.70 ±9.11 / 411.47 ms │  1.05x slower │
   │ QQuery 13 │    498.04 / 509.93 ±11.14 / 528.73 ms │    536.76 / 577.98 ±31.48 / 619.60 ms │  1.13x slower │
   │ QQuery 14 │     377.93 / 381.33 ±3.52 / 387.57 ms │     388.16 / 396.09 ±5.96 / 406.11 ms │     no change │
 

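The Change column in the benchmark tables looks consistent with comparing the middle value of each cell (read here as the mean, in a `min / mean ±stddev / max` layout) between HEAD and the branch. Below is a minimal sketch of that arithmetic. The 5% "no change" band is an assumption inferred from the rows shown in this thread, and `change_label` is a hypothetical helper, not the benchmark runner's code.

```rust
/// Hypothetical helper: label a mean-runtime change the way the Change column
/// appears to. `noise` is the relative band reported as "no change"; the value
/// 0.05 used below is inferred from the posted rows, not taken from the runner.
fn change_label(head_mean_ms: f64, branch_mean_ms: f64, noise: f64) -> String {
    let ratio = branch_mean_ms / head_mean_ms;
    if (1.0 - noise..=1.0 + noise).contains(&ratio) {
        "no change".to_string()
    } else if ratio < 1.0 {
        // The branch is faster: report how many times faster.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        // The branch is slower: report the slowdown factor.
        format!("{:.2}x slower", ratio)
    }
}

fn main() {
    // (query, HEAD mean, branch mean) taken from rows posted in this thread.
    let rows = [
        ("clickbench QQuery 4", 294.28, 365.17),
        ("clickbench QQuery 9", 646.44, 677.05),
        ("tpcds QQuery 1", 7.24, 6.73),
    ];
    for (query, head, branch) in rows {
        println!("{query}: {}", change_label(head, branch, 0.05));
    }
}
```

Under these assumptions the helper reproduces the labels posted for those three rows.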
Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258264372

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4258253391-1344-qgksw 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae)
 using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258266017

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4258253391-1346-gz9gj 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae)
 using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258264405

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux 
bench-c4258253391-1345-h2qr4 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 
aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark 
(d6c387974b53610cd13c03a15d7b125fbad31eae) to a0dbbab (merge-base) 
[diff](https://github.com/apache/datafusion/compare/a0dbbab5849596ecb3db48d9e168f247155209e1..d6c387974b53610cd13c03a15d7b125fbad31eae)
 using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) 
against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-16 Thread via GitHub


xudong963 commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4258253391

   run benchmarks
   
   ```
   env:
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
     DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4252030873

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   CPU Details (lscpu)
   
   ```
   Architecture:aarch64
   CPU op-mode(s):  64-bit
   Byte Order:  Little Endian
   CPU(s):  16
   On-line CPU(s) list: 0-15
   Vendor ID:   ARM
   Model name:  Neoverse-V2
   Model:   1
   Thread(s) per core:  1
   Core(s) per cluster: 16
   Socket(s):   -
   Cluster(s):  1
   Stepping:r0p1
   BogoMIPS:2000.00
   Flags:   fp asimd evtstrm aes pmull sha1 
sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 
sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 
sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm 
bf16 dgh rng bti
   L1d cache:   1 MiB (16 instances)
   L1i cache:   1 MiB (16 instances)
   L2 cache:32 MiB (16 instances)
   L3 cache:80 MiB (1 instance)
   NUMA node(s):1
   NUMA node0 CPU(s):   0-15
   Vulnerability Gather data sampling:  Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf:  Not affected
   Vulnerability Mds:   Not affected
   Vulnerability Meltdown:  Not affected
   Vulnerability Mmio stale data:   Not affected
   Vulnerability Reg file data sampling:Not affected
   Vulnerability Retbleed:  Not affected
   Vulnerability Spec rstack overflow:  Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store 
Bypass disabled via prctl
   Vulnerability Spectre v1:Mitigation; __user pointer 
sanitization
   Vulnerability Spectre v2:Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa:   Not affected
   Vulnerability Tsx async abort:   Not affected
   Vulnerability Vmscape:   Not affected
   ```
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │           6.48 / 6.94 ±0.78 / 8.50 ms │           6.28 / 6.77 ±0.85 / 8.46 ms │     no change │
   │ QQuery 2  │     112.44 / 113.18 ±0.62 / 114.19 ms │     111.54 / 112.77 ±0.99 / 114.35 ms │     no change │
   │ QQuery 3  │     111.25 / 112.13 ±0.75 / 113.22 ms │     109.03 / 110.61 ±1.34 / 113.07 ms │     no change │
   │ QQuery 4  │  1077.60 / 1091.75 ±8.46 / 1102.27 ms │ 1082.02 / 1100.65 ±14.36 / 1119.07 ms │     no change │
   │ QQuery 5  │     191.56 / 194.21 ±1.72 / 196.11 ms │     194.80 / 197.60 ±1.92 / 199.84 ms │     no change │
   │ QQuery 6  │     248.92 / 260.54 ±7.02 / 268.19 ms │     259.99 / 266.94 ±3.97 / 270.83 ms │     no change │
   │ QQuery 7  │     331.88 / 338.40 ±3.84 / 343.08 ms │     328.92 / 336.27 ±3.99 / 339.90 ms │     no change │
   │ QQuery 8  │     160.55 / 164.47 ±2.55 / 166.74 ms │     158.41 / 165.70 ±3.79 / 169.06 ms │     no change │
   │ QQuery 9  │    226.33 / 244.34 ±11.59 / 262.42 ms │    232.22 / 247.89 ±12.22 / 264.39 ms │     no change │
   │ QQuery 10 │     169.16 / 177.09 ±6.14 / 184.86 ms │     176.28 / 179.23 ±2.30 / 182.14 ms │     no change │
   │ QQuery 11 │    699.75 / 718.20 ±10.41 / 729.03 ms │     715.56 / 720.36 ±6.36 / 732.87 ms │     no change │
   │ QQuery 12 │        37.91 / 39.73 ±1.69 / 42.63 ms │        37.92 / 39.43 ±1.18 / 41.54 ms │     no change │
   │ QQuery 13 │    547.59 / 570.55 ±16.21 / 589.20 ms │     566.78 / 574.16 ±5.48 / 582.10 ms │     no change │
   │ QQuery 14 │     898.55 / 913.18 ±8.63 / 921.11 ms │ 893.15 / 908.07 ±11.95 / 924

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251925070

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4251909707-1287-bzczn 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   CPU Details (lscpu)
   
   ```
   Architecture: aarch64
   CPU op-mode(s): 64-bit
   Byte Order: Little Endian
   CPU(s): 16
   On-line CPU(s) list: 0-15
   Vendor ID: ARM
   Model name: Neoverse-V2
   Model: 1
   Thread(s) per core: 1
   Core(s) per cluster: 16
   Socket(s): -
   Cluster(s): 1
   Stepping: r0p1
   BogoMIPS: 2000.00
   Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
   L1d cache: 1 MiB (16 instances)
   L1i cache: 1 MiB (16 instances)
   L2 cache: 32 MiB (16 instances)
   L3 cache: 80 MiB (1 instance)
   NUMA node(s): 1
   NUMA node0 CPU(s): 0-15
   Vulnerability Gather data sampling: Not affected
   Vulnerability Indirect target selection: Not affected
   Vulnerability Itlb multihit: Not affected
   Vulnerability L1tf: Not affected
   Vulnerability Mds: Not affected
   Vulnerability Meltdown: Not affected
   Vulnerability Mmio stale data: Not affected
   Vulnerability Reg file data sampling: Not affected
   Vulnerability Retbleed: Not affected
   Vulnerability Spec rstack overflow: Not affected
   Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
   Vulnerability Spectre v1: Mitigation; __user pointer sanitization
   Vulnerability Spectre v2: Mitigation; CSV2, BHB
   Vulnerability Srbds: Not affected
   Vulnerability Tsa: Not affected
   Vulnerability Tsx async abort: Not affected
   Vulnerability Vmscape: Not affected
   ```
   
   
   
   Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251912293

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4251909707-1286-42w8w 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4251909707

   run benchmarks tpcds tpch
   
   ```
   env:
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
     DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```
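
The two variables in the `env:` block correspond to the DataFusion settings `datafusion.execution.parquet.pushdown_filters` and `datafusion.execution.parquet.reorder_filters`. As a rough illustration of that naming relationship, here is a minimal standard-library sketch. It assumes the environment variable name is the config key upper-cased with dots replaced by underscores; the helpers `env_var_name`, `parse_bool`, and `bool_from_env` are hypothetical names for this example and are not DataFusion APIs, and the `false` fallback is simply this sketch's choice.

```rust
use std::env;

/// Hypothetical helper: derive an environment variable name from a dotted
/// config key by upper-casing it and replacing every '.' with '_'.
fn env_var_name(config_key: &str) -> String {
    config_key.to_uppercase().replace('.', "_")
}

/// Hypothetical helper: accept "true" / "false" in any letter case,
/// ignoring surrounding whitespace. Anything else is rejected.
fn parse_bool(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: read a boolean option from the process environment,
/// falling back to `default` when the variable is unset or unparsable.
fn bool_from_env(config_key: &str, default: bool) -> bool {
    env::var(env_var_name(config_key))
        .ok()
        .and_then(|raw| parse_bool(&raw))
        .unwrap_or(default)
}

fn main() {
    // The two settings named in the benchmark request above.
    for key in [
        "datafusion.execution.parquet.pushdown_filters",
        "datafusion.execution.parquet.reorder_filters",
    ] {
        println!("{} = {}", env_var_name(key), bool_from_env(key, false));
    }
}
```

Running this with both variables exported as `true` reports both options as enabled, which is the configuration the benchmark request asks the runner to use.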





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250189791

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │           6.43 / 6.89 ±0.82 / 8.53 ms │           6.42 / 6.86 ±0.75 / 8.35 ms │     no change │
   │ QQuery 2  │    112.26 / 133.08 ±24.59 / 164.16 ms │    111.97 / 134.28 ±24.27 / 164.01 ms │     no change │
   │ QQuery 3  │     109.72 / 111.10 ±1.35 / 113.17 ms │     108.94 / 109.47 ±0.52 / 110.44 ms │     no change │
   │ QQuery 4  │ 1101.37 / 1120.63 ±21.62 / 1154.49 ms │ 1083.91 / 1117.06 ±20.26 / 1145.73 ms │     no change │
   │ QQuery 5  │     194.04 / 196.44 ±1.93 / 199.35 ms │     193.48 / 198.82 ±3.05 / 202.47 ms │     no change │
   │ QQuery 6  │     276.05 / 283.55 ±6.72 / 295.55 ms │     259.52 / 264.28 ±4.27 / 270.83 ms │ +1.07x faster │
   │ QQuery 7  │     339.13 / 344.97 ±7.00 / 357.81 ms │     334.97 / 338.71 ±4.50 / 346.55 ms │     no change │
   │ QQuery 8  │     154.97 / 162.83 ±5.43 / 169.26 ms │     157.10 / 163.80 ±3.57 / 167.09 ms │     no change │
   │ QQuery 9  │    212.58 / 243.40 ±16.47 / 258.99 ms │    220.92 / 234.46 ±11.91 / 249.86 ms │     no change │
   │ QQuery 10 │     174.79 / 180.39 ±3.95 / 186.71 ms │     178.04 / 181.39 ±2.50 / 185.13 ms │     no change │
   │ QQuery 11 │    721.40 / 736.38 ±11.06 / 749.75 ms │     713.05 / 723.96 ±5.92 / 729.55 ms │     no change │
   │ QQuery 12 │        38.04 / 40.22 ±1.85 / 42.98 ms │        36.84 / 39.78 ±1.87 / 42.32 ms │     no change │
   │ QQuery 13 │     560.99 / 573.09 ±6.18 / 577.57 ms │     556.10 / 564.05 ±7.09 / 576.52 ms │     no change │
   │ QQuery 14 │     900.42 / 910.47 ±8.89 / 926.69 ms │    895.98 / 908.53 ±10.03 / 920.02 ms │ no c

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250163361

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │          1.22 / 4.56 ±6.51 / 17.57 ms │          1.20 / 4.41 ±6.37 / 17.15 ms │     no change │
   │ QQuery 1  │        15.66 / 16.27 ±0.37 / 16.61 ms │        15.56 / 15.84 ±0.17 / 16.07 ms │     no change │
   │ QQuery 2  │        44.40 / 44.97 ±0.48 / 45.76 ms │        41.67 / 42.03 ±0.20 / 42.24 ms │ +1.07x faster │
   │ QQuery 3  │        43.26 / 45.25 ±1.10 / 46.30 ms │        39.90 / 42.02 ±2.10 / 44.89 ms │ +1.08x faster │
   │ QQuery 4  │     292.95 / 297.72 ±3.64 / 301.93 ms │     343.96 / 351.63 ±5.86 / 358.86 ms │  1.18x slower │
   │ QQuery 5  │    342.48 / 355.45 ±11.55 / 373.20 ms │    358.60 / 377.58 ±12.48 / 393.81 ms │  1.06x slower │
   │ QQuery 6  │           5.79 / 6.67 ±0.78 / 8.12 ms │           5.35 / 6.56 ±0.84 / 7.59 ms │     no change │
   │ QQuery 7  │        23.57 / 24.78 ±1.71 / 28.16 ms │        21.17 / 21.85 ±0.46 / 22.59 ms │ +1.13x faster │
   │ QQuery 8  │    433.73 / 449.23 ±10.95 / 466.52 ms │     431.86 / 443.96 ±8.36 / 455.92 ms │     no change │
   │ QQuery 9  │     696.57 / 705.90 ±6.71 / 716.77 ms │    647.40 / 662.69 ±15.12 / 684.72 ms │ +1.07x faster │
   │ QQuery 10 │     124.78 / 129.26 ±4.24 / 136.50 ms │     120.73 / 123.99 ±2.90 / 128.90 ms │     no change │
   │ QQuery 11 │     134.73 / 136.85 ±1.62 / 139.55 ms │     134.13 / 135.40 ±1.09 / 136.71 ms │     no change │
   │ QQuery 12 │     392.49 / 396.53 ±4.92 / 406.13 ms │    375.90 / 401.28 ±13.81 / 415.70 ms │     no change │
   │ QQuery 13 │    486.25 / 505.93 ±18.83 / 531.11 ms │    495.55 / 516.01 ±14.69 / 534.80 ms │     no change │
   │ QQuery 14 │     379.48 / 388.55 ±9.95 / 401.10 ms │     392.04 / 405.22 ±7.95 / 415.55 ms │     no change │
 

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078989

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1275-p6dsj 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078401

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1276-485dd 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250078226

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4250064780-1277-bn57j 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   
   
   Comparing datafusion/issue-19028-benchmark (7cff519005e62b343c8be3f58fcf95ab4efde4a8) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..7cff519005e62b343c8be3f58fcf95ab4efde4a8) using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-15 Thread via GitHub


Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4250064780

   run benchmarks
   
   ```
   env:
     DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
     DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
   ```





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249657574

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark clickbench_partitioned.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 0  │          1.21 / 4.57 ±6.54 / 17.64 ms │          1.19 / 4.53 ±6.50 / 17.53 ms │     no change │
   │ QQuery 1  │        14.60 / 15.09 ±0.28 / 15.35 ms │        14.18 / 14.54 ±0.21 / 14.72 ms │     no change │
   │ QQuery 2  │        44.67 / 45.36 ±0.41 / 45.95 ms │        42.47 / 42.68 ±0.16 / 42.95 ms │ +1.06x faster │
   │ QQuery 3  │        45.03 / 45.92 ±0.68 / 47.13 ms │        38.79 / 39.49 ±0.36 / 39.76 ms │ +1.16x faster │
   │ QQuery 4  │     289.32 / 295.98 ±3.58 / 300.05 ms │     292.13 / 297.11 ±2.76 / 299.59 ms │     no change │
   │ QQuery 5  │     345.56 / 349.97 ±3.42 / 355.25 ms │     345.69 / 351.59 ±4.12 / 358.14 ms │     no change │
   │ QQuery 6  │           5.94 / 6.96 ±1.12 / 8.92 ms │           5.64 / 7.14 ±0.86 / 8.03 ms │     no change │
   │ QQuery 7  │        16.99 / 17.17 ±0.13 / 17.37 ms │        17.01 / 17.15 ±0.10 / 17.28 ms │     no change │
   │ QQuery 8  │     416.42 / 420.65 ±4.94 / 429.53 ms │     423.96 / 432.43 ±5.73 / 439.81 ms │     no change │
   │ QQuery 9  │     670.51 / 675.82 ±2.96 / 678.97 ms │    664.02 / 680.86 ±13.11 / 699.96 ms │     no change │
   │ QQuery 10 │       93.78 / 96.79 ±3.92 / 104.07 ms │       90.09 / 94.33 ±4.02 / 101.56 ms │     no change │
   │ QQuery 11 │     107.42 / 108.60 ±0.83 / 110.01 ms │     103.27 / 104.63 ±1.20 / 106.15 ms │     no change │
   │ QQuery 12 │     340.86 / 348.17 ±4.58 / 354.86 ms │     347.42 / 353.78 ±6.55 / 365.94 ms │     no change │
   │ QQuery 13 │    458.87 / 479.04 ±15.10 / 503.79 ms │    471.89 / 494.71 ±15.40 / 513.63 ms │     no change │
   │ QQuery 14 │     344.75 / 352.07 ±3.77 / 355.18 ms │     353.25 / 362.30 ±5.53 / 368.55 ms │     no change │
 

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249652693

   🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
   
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
   
   
   
   
   Details
   
   
   ```
   Comparing HEAD and datafusion_issue-19028-benchmark
   
   Benchmark tpcds_sf1.json
   
   
   ┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
   ┃ Query     ┃                                  HEAD ┃      datafusion_issue-19028-benchmark ┃        Change ┃
   ┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
   │ QQuery 1  │           6.82 / 7.29 ±0.79 / 8.85 ms │           6.74 / 7.14 ±0.70 / 8.54 ms │     no change │
   │ QQuery 2  │     146.98 / 147.42 ±0.35 / 147.79 ms │     145.96 / 147.02 ±0.81 / 147.93 ms │     no change │
   │ QQuery 3  │     113.51 / 114.64 ±1.05 / 116.56 ms │     113.47 / 114.28 ±0.44 / 114.73 ms │     no change │
   │ QQuery 4  │ 1386.09 / 1399.21 ±10.38 / 1411.77 ms │ 1352.19 / 1407.26 ±35.55 / 1444.22 ms │     no change │
   │ QQuery 5  │     172.46 / 174.50 ±1.35 / 175.76 ms │     173.06 / 175.55 ±2.07 / 177.88 ms │     no change │
   │ QQuery 6  │    860.53 / 877.99 ±13.21 / 895.55 ms │    849.00 / 884.78 ±31.41 / 930.63 ms │     no change │
   │ QQuery 7  │     350.78 / 352.55 ±1.44 / 354.25 ms │     341.24 / 343.02 ±2.99 / 348.99 ms │     no change │
   │ QQuery 8  │     116.16 / 117.82 ±1.05 / 118.93 ms │     118.75 / 120.05 ±1.25 / 122.19 ms │     no change │
   │ QQuery 9  │    103.17 / 111.09 ±10.38 / 131.32 ms │     103.31 / 106.20 ±1.68 / 108.15 ms │     no change │
   │ QQuery 10 │     109.43 / 111.50 ±1.39 / 113.23 ms │     102.24 / 103.69 ±0.87 / 104.62 ms │ +1.08x faster │
   │ QQuery 11 │   992.86 / 1000.18 ±5.18 / 1006.68 ms │  1003.62 / 1020.01 ±9.58 / 1033.23 ms │     no change │
   │ QQuery 12 │        45.38 / 48.41 ±1.77 / 50.72 ms │        45.25 / 47.00 ±1.02 / 48.22 ms │     no change │
   │ QQuery 13 │     402.64 / 405.27 ±1.45 / 407.07 ms │     401.73 / 406.37 ±3.14 / 410.21 ms │     no change │
   │ QQuery 14 │   999.85 / 1012.43 ±6.34 / 1016.71 ms │  1016.34 / 1025.63 ±5.29 / 1031.40 ms │     no change │
   │ QQuery 15 │

Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548180

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1270-d8c7t 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: tpcds
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548216

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1269-r4t4b 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: clickbench_partitioned
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


adriangbot commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249548411

   🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952)
   **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4249531952-1271-fbkqb 6.12.55+ #1 SMP Sun Feb  1 08:59:41 UTC 2026 aarch64 GNU/Linux`
   
   Comparing datafusion/issue-19028-benchmark (5da11eab0b2d8feabb51a3f7a08ee6855029f363) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..5da11eab0b2d8feabb51a3f7a08ee6855029f363) using: tpch
   Results will be posted here when complete
   
   ---
   [File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner





Re: [PR] Skip RowFilter and page pruning for fully matched row groups [datafusion]

2026-04-14 Thread via GitHub


Dandandan commented on PR #21637:
URL: https://github.com/apache/datafusion/pull/21637#issuecomment-4249531952

   run benchmarks
   
   ```
   env:
   PUSHDOWN_FILTERS: true
   REORDER_FILTERS: true
   ```

