Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
ariel-miculas commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3169360091

## datafusion/physical-plan/src/aggregates/row_hash.rs: ##

Review Comment:
This could be outside this PR's scope, but another question I have is why there isn't a transition to `ProducingBlocks` as soon as a batch is ready. Instead, all the input is consumed before the first batch is produced (assuming there is no group ordering).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]

-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
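The latency difference raised in the comment above can be sketched with a toy model. This is only an illustration under stated assumptions: `batches_before_first_output`, `new_groups_per_batch`, `block_size`, and `eager` are hypothetical names that do not come from the PR, and the sketch deliberately ignores whether an emitted group could still receive rows from later input, which is the part that decides whether early emission is actually valid.

```rust
/// Counts how many input batches are consumed before the first output block
/// could be produced.
///
/// `new_groups_per_batch[i]` is the number of previously unseen groups in
/// batch `i` (a hypothetical description of the input, not a DataFusion type).
fn batches_before_first_output(
    new_groups_per_batch: &[usize],
    block_size: usize,
    eager: bool,
) -> usize {
    if !eager {
        // Behaviour described in the review comment: drain all input first.
        return new_groups_per_batch.len();
    }
    let mut buffered = 0usize;
    for (i, new_groups) in new_groups_per_batch.iter().enumerate() {
        buffered += new_groups;
        if buffered >= block_size {
            // Hypothetical eager policy: switch to producing as soon as one
            // full block of groups is buffered.
            return i + 1;
        }
    }
    // Never filled a whole block, so output starts only after the input ends.
    new_groups_per_batch.len()
}
```

With four batches of 3000 new groups each and a block size of 8192, the drain-first policy consumes all four batches before producing anything, while the hypothetical eager policy could start after the third.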
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
ariel-miculas commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3169039554

## datafusion/physical-plan/src/aggregates/row_hash.rs: ##

Review Comment:
Should there also be a transition to `ProducingBlocks` here instead of `ProducingOutput` if `enable_blocked_groups` is set?
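A minimal sketch of the kind of branch the question above is asking about. Only the names `ProducingBlocks`, `ProducingOutput`, and `enable_blocked_groups` come from the thread; the enum below is a simplified stand-in for the real state type in `row_hash.rs` (which carries data in some variants), and `emit_state` is a hypothetical helper, not a function from the PR.

```rust
/// Simplified, hypothetical stand-in for the stream's execution state.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecutionState {
    ReadingInput,
    ProducingOutput,
    ProducingBlocks,
    Done,
}

/// Picks the state to enter once the operator decides to emit results.
/// Assumption: the blocked path is selected purely by the flag.
fn emit_state(enable_blocked_groups: bool) -> ExecutionState {
    if enable_blocked_groups {
        ExecutionState::ProducingBlocks
    } else {
        ExecutionState::ProducingOutput
    }
}
```

Routing every emit site through one helper like this would make it harder for a single call site to fall back to `ProducingOutput` by accident when the flag is set.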
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4346040055

Refactoring so that the blocked approach has zero cost when it is disabled.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343775983

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

CPU Details (lscpu)

```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark clickbench_partitioned.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 0 | 1.20 / 4.69 ±6.81 / 18.31 ms | 1.24 / 4.81 ±6.91 / 18.63 ms | no change |
| QQuery 1 | 12.31 / 12.90 ±0.32 / 13.19 ms | 13.31 / 13.63 ±0.24 / 13.98 ms | 1.06x slower |
| QQuery 2 | 37.22 / 37.65 ±0.25 / 37.95 ms | 36.69 / 36.97 ±0.33 / 37.61 ms | no change |
| QQuery 3 | 32.18 / 33.68 ±1.39 / 35.77 ms | 32.71 / 33.30 ±0.64 / 34.13 ms | no change |
| QQuery 4 | 248.07 / 258.84 ±8.10 / 268.82 ms | 273.71 / 282.89 ±9.01 / 299.80 ms | 1.09x slower |
| QQuery 5 | 286.89 / 297.68 ±9.46 / 312.31 ms | 286.98 / 298.35 ±6.28 / 305.42 ms | no change |
| QQuery 6 | 7.52 / 7.75 ±0.24 / 8.16 ms | 6.82 / 7.50 ±0.36 / 7.82 ms | no change |
| QQuery 7 | 14.90 / 15.08 ±0.12 / 15.20 ms | 14.12 / 14.32 ±0.12 / 14.46 ms | +1.05x faster |
| QQuery 8 | 337.53 / 345.55 ±9.01 / 362.21 ms | 333.37 / 351.74 ±9.95 / 363.07 ms | no change |
| QQuery 9 | 464.60 / 483.70 ±16.12 / 510.06 ms | 461.18 / 474.85 ±16.45 / 505.45 ms | no change |
| QQuery 10 | 75.17 / 76.56 ±0.87 / 77.89 ms | 74.67 / 76.74 ±3.02 / 82.73 ms | no change |
| QQuery 11 | 86.06 / 87.43 ±1.05 / 88.94 ms | 87.27 / 88.03 ±0.48 / 88.76 ms | no change |
| QQuery 12 | 283.93 / 296.27 ±12.32 / 317.43 ms | 299.81 / 303.52 ±3.50 / 308.05 ms | no change |
| QQuery 13 | 408.64 / 420.85 ±9.93 / 437.20 ms | 433.94 / 444.02 ±9.34 / 459.35 ms | 1.06x slower |
| QQuery 14 | 291.08 / 300.86 ±7.46 / 307.85 ms | 286.68 / 306.20 ±10.34 / 316.37 ms | no change |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343768069

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark tpcds_sf1.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 1 | 7.00 / 7.68 ±0.86 / 9.38 ms | 6.89 / 7.34 ±0.80 / 8.94 ms | no change |
| QQuery 2 | 147.94 / 148.31 ±0.33 / 148.87 ms | 150.27 / 150.81 ±0.35 / 151.34 ms | no change |
| QQuery 3 | 113.77 / 114.52 ±1.04 / 116.44 ms | 112.57 / 114.06 ±0.93 / 115.49 ms | no change |
| QQuery 4 | 1272.35 / 1289.93 ±9.47 / 1297.51 ms | 1279.06 / 1287.61 ±7.47 / 1300.99 ms | no change |
| QQuery 5 | 172.36 / 173.77 ±1.11 / 175.21 ms | 172.17 / 173.72 ±0.99 / 175.27 ms | no change |
| QQuery 6 | 132.86 / 135.21 ±1.96 / 138.81 ms | 134.27 / 135.56 ±1.48 / 138.38 ms | no change |
| QQuery 7 | 331.97 / 336.06 ±2.26 / 338.52 ms | 329.35 / 332.45 ±2.18 / 334.35 ms | no change |
| QQuery 8 | 114.21 / 115.08 ±0.89 / 116.78 ms | 113.60 / 114.44 ±0.68 / 115.53 ms | no change |
| QQuery 9 | 96.72 / 100.31 ±2.88 / 103.05 ms | 96.54 / 100.04 ±4.64 / 108.63 ms | no change |
| QQuery 10 | 103.15 / 103.86 ±0.59 / 104.90 ms | 102.04 / 102.96 ±0.66 / 103.86 ms | no change |
| QQuery 11 | 879.95 / 890.92 ±6.51 / 898.80 ms | 873.33 / 885.27 ±7.12 / 891.83 ms | no change |
| QQuery 12 | 43.37 / 44.06 ±0.50 / 44.85 ms | 43.27 / 44.66 ±1.27 / 46.61 ms | no change |
| QQuery 13 | 387.44 / 390.62 ±3.05 / 394.81 ms | 385.70 / 390.07 ±2.94 / 394.16 ms | no change |
| QQuery 14 | 1035.29 / 1041.95 ±3.73 / 1046.72 ms | 1033.68 / 1036.02 ±2.80 / 1041.51 ms | no change |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343733161

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark tpch_sf1.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 1 | 40.80 / 42.01 ±1.13 / 43.54 ms | 40.34 / 41.03 ±1.08 / 43.17 ms | no change |
| QQuery 2 | 21.45 / 21.62 ±0.09 / 21.73 ms | 21.46 / 21.80 ±0.51 / 22.81 ms | no change |
| QQuery 3 | 38.57 / 40.03 ±1.59 / 42.27 ms | 38.29 / 40.24 ±1.18 / 41.68 ms | no change |
| QQuery 4 | 18.47 / 18.90 ±0.75 / 20.40 ms | 18.43 / 18.76 ±0.40 / 19.53 ms | no change |
| QQuery 5 | 49.41 / 52.85 ±2.22 / 55.90 ms | 49.47 / 52.69 ±2.74 / 56.93 ms | no change |
| QQuery 6 | 18.25 / 18.62 ±0.57 / 19.75 ms | 17.95 / 18.23 ±0.20 / 18.55 ms | no change |
| QQuery 7 | 54.57 / 55.57 ±0.56 / 56.16 ms | 53.37 / 55.60 ±2.05 / 58.28 ms | no change |
| QQuery 8 | 48.55 / 48.94 ±0.23 / 49.17 ms | 47.93 / 48.38 ±0.28 / 48.78 ms | no change |
| QQuery 9 | 54.81 / 57.25 ±2.08 / 60.30 ms | 54.04 / 55.16 ±1.41 / 57.89 ms | no change |
| QQuery 10 | 66.09 / 68.44 ±1.77 / 70.86 ms | 66.52 / 68.64 ±1.75 / 71.53 ms | no change |
| QQuery 11 | 14.20 / 14.35 ±0.12 / 14.53 ms | 14.32 / 14.52 ±0.25 / 14.96 ms | no change |
| QQuery 12 | 27.75 / 28.23 ±0.38 / 28.68 ms | 27.29 / 27.56 ±0.23 / 27.83 ms | no change |
| QQuery 13 | 37.73 / 40.20 ±3.16 / 46.41 ms | 38.48 / 39.24 ±0.54 / 40.10 ms | no change |
| QQuery 14 | 28.02 / 28.48 ±0.46 / 29.34 ms | 28.09 / 28.28 ±0.13 / 28.47 ms | no change |
| QQuery 15 | 35.84 / 36.57 ±0.53 / 37.18 ms | 35.61 / 36.11 ±0.32 / 36.46 ms | no change |
| QQuery 16 | 15.53 / 16.27 ±0.41 / 16.69 ms | 16.05 / 16.40 ±0.22 / 16.71 ms | no change |
| QQuery 17 | 78.74 / 80.64 ±1.77 / 83.42 ms | 84.26 / 85.2
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343636635

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4343604373-1910-rgfv2 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing intermeidate-result-blocked-approach (47b69765e44dbac7ec4173315b64ca1d55f259dc) to 8f033e4 (merge-base) [diff](https://github.com/apache/datafusion/compare/8f033e411aa4e097a146718e61d90897fe6e9c3d..47b69765e44dbac7ec4173315b64ca1d55f259dc) using: tpch

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343630821

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4343604373-1908-2hjmh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing intermeidate-result-blocked-approach (47b69765e44dbac7ec4173315b64ca1d55f259dc) to 8f033e4 (merge-base) [diff](https://github.com/apache/datafusion/compare/8f033e411aa4e097a146718e61d90897fe6e9c3d..47b69765e44dbac7ec4173315b64ca1d55f259dc) using: clickbench_partitioned

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343626246

🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4343604373-1909-n4jjp 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`

Comparing intermeidate-result-blocked-approach (47b69765e44dbac7ec4173315b64ca1d55f259dc) to 8f033e4 (merge-base) [diff](https://github.com/apache/datafusion/compare/8f033e411aa4e097a146718e61d90897fe6e9c3d..47b69765e44dbac7ec4173315b64ca1d55f259dc) using: tpcds

Results will be posted here when complete

---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4343604373

run benchmarks
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4341852652

@Dandandan hi, could you help trigger the benchmarks again?
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335192244

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark clickbench_partitioned.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 0 | 1.21 / 4.69 ±6.82 / 18.34 ms | 1.25 / 4.74 ±6.87 / 18.48 ms | no change |
| QQuery 1 | 12.42 / 12.98 ±0.31 / 13.28 ms | 13.04 / 13.41 ±0.24 / 13.74 ms | no change |
| QQuery 2 | 36.77 / 37.00 ±0.28 / 37.55 ms | 36.72 / 37.01 ±0.21 / 37.27 ms | no change |
| QQuery 3 | 31.82 / 32.55 ±0.61 / 33.29 ms | 31.66 / 32.03 ±0.21 / 32.24 ms | no change |
| QQuery 4 | 246.13 / 252.06 ±5.38 / 261.04 ms | 288.46 / 293.26 ±2.85 / 296.74 ms | 1.16x slower |
| QQuery 5 | 287.55 / 289.80 ±1.47 / 291.18 ms | 290.72 / 294.74 ±2.80 / 298.27 ms | no change |
| QQuery 6 | 6.44 / 8.32 ±2.41 / 13.07 ms | 6.69 / 7.41 ±0.44 / 7.79 ms | +1.12x faster |
| QQuery 7 | 13.76 / 13.95 ±0.13 / 14.16 ms | 14.49 / 14.70 ±0.20 / 15.07 ms | 1.05x slower |
| QQuery 8 | 337.06 / 340.72 ±2.75 / 344.90 ms | 339.33 / 345.93 ±4.24 / 351.57 ms | no change |
| QQuery 9 | 467.45 / 471.13 ±4.79 / 480.04 ms | 482.13 / 493.24 ±9.53 / 509.78 ms | no change |
| QQuery 10 | 73.96 / 75.46 ±0.77 / 76.18 ms | 75.11 / 75.84 ±0.72 / 77.15 ms | no change |
| QQuery 11 | 85.74 / 86.49 ±1.01 / 88.48 ms | 87.51 / 88.88 ±2.41 / 93.69 ms | no change |
| QQuery 12 | 283.78 / 286.86 ±3.08 / 291.09 ms | 284.25 / 292.10 ±6.74 / 302.54 ms | no change |
| QQuery 13 | 406.46 / 420.25 ±8.86 / 430.15 ms | 413.33 / 423.15 ±8.11 / 432.17 ms | no change |
| QQuery 14 | 292.45 / 301.14 ±7.98 / 316.27 ms | 295.63 / 299.22 ±2.46 / 303.32 ms | no change |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335192214

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark tpcds_sf1.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 1 | 7.01 / 7.51 ±0.83 / 9.16 ms | 6.97 / 7.43 ±0.80 / 9.02 ms | no change |
| QQuery 2 | 146.03 / 146.75 ±0.56 / 147.75 ms | 149.58 / 149.91 ±0.24 / 150.32 ms | no change |
| QQuery 3 | 113.71 / 114.62 ±0.73 / 115.67 ms | 114.49 / 115.04 ±0.61 / 115.95 ms | no change |
| QQuery 4 | 1320.29 / 1328.53 ±5.14 / 1335.27 ms | 1293.09 / 1302.21 ±6.87 / 1312.53 ms | no change |
| QQuery 5 | 172.47 / 174.00 ±1.21 / 175.96 ms | 175.76 / 176.60 ±1.20 / 178.98 ms | no change |
| QQuery 6 | 142.02 / 144.92 ±3.01 / 148.60 ms | 135.06 / 137.22 ±1.47 / 138.97 ms | +1.06x faster |
| QQuery 7 | 335.95 / 337.13 ±1.06 / 338.50 ms | 338.12 / 340.12 ±1.55 / 342.53 ms | no change |
| QQuery 8 | 114.08 / 115.03 ±0.61 / 115.97 ms | 114.34 / 115.20 ±0.71 / 116.48 ms | no change |
| QQuery 9 | 96.41 / 101.14 ±2.42 / 103.21 ms | 97.70 / 103.88 ±5.33 / 112.35 ms | no change |
| QQuery 10 | 102.58 / 102.88 ±0.22 / 103.25 ms | 103.94 / 104.66 ±0.69 / 105.57 ms | no change |
| QQuery 11 | 928.89 / 934.32 ±5.24 / 944.08 ms | 915.01 / 923.02 ±6.61 / 933.42 ms | no change |
| QQuery 12 | 43.37 / 43.66 ±0.25 / 44.07 ms | 43.62 / 43.96 ±0.31 / 44.43 ms | no change |
| QQuery 13 | 389.52 / 391.27 ±1.70 / 393.96 ms | 390.28 / 392.62 ±1.85 / 395.57 ms | no change |
| QQuery 14 | 1036.94 / 1047.09 ±8.48 / 1056.90 ms | 1040.20 / 1042.69 ±2.62 / 1046.02 ms | no change |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335145434

🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)

**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)

Details

```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark tpch_sf1.json
| Query | HEAD | intermeidate-result-blocked-approach | Change |
| QQuery 1 | 40.72 / 41.57 ±1.06 / 43.63 ms | 40.37 / 41.16 ±1.00 / 43.12 ms | no change |
| QQuery 2 | 20.72 / 21.51 ±0.62 / 22.55 ms | 21.04 / 21.34 ±0.25 / 21.62 ms | no change |
| QQuery 3 | 38.35 / 40.09 ±1.37 / 41.87 ms | 38.43 / 39.34 ±1.06 / 41.26 ms | no change |
| QQuery 4 | 18.38 / 18.58 ±0.20 / 18.95 ms | 18.27 / 18.60 ±0.20 / 18.86 ms | no change |
| QQuery 5 | 47.70 / 50.67 ±2.39 / 53.39 ms | 48.15 / 50.52 ±1.69 / 52.22 ms | no change |
| QQuery 6 | 17.39 / 17.67 ±0.19 / 17.90 ms | 17.48 / 17.60 ±0.11 / 17.78 ms | no change |
| QQuery 7 | 54.37 / 55.31 ±0.66 / 56.38 ms | 53.61 / 55.88 ±1.27 / 57.34 ms | no change |
| QQuery 8 | 48.41 / 48.93 ±0.52 / 49.84 ms | 48.46 / 48.71 ±0.27 / 49.18 ms | no change |
| QQuery 9 | 53.68 / 55.26 ±1.43 / 57.73 ms | 54.43 / 54.68 ±0.32 / 55.29 ms | no change |
| QQuery 10 | 65.52 / 66.01 ±0.36 / 66.56 ms | 65.57 / 66.14 ±0.63 / 67.30 ms | no change |
| QQuery 11 | 14.25 / 14.47 ±0.12 / 14.60 ms | 13.93 / 14.20 ±0.19 / 14.40 ms | no change |
| QQuery 12 | 28.14 / 28.25 ±0.10 / 28.39 ms | 27.65 / 28.07 ±0.34 / 28.64 ms | no change |
| QQuery 13 | 37.26 / 38.28 ±0.65 / 39.13 ms | 37.99 / 38.44 ±0.55 / 39.43 ms | no change |
| QQuery 14 | 27.86 / 28.24 ±0.21 / 28.44 ms | 28.04 / 28.13 ±0.06 / 28.19 ms | no change |
| QQuery 15 | 33.52 / 33.86 ±0.20 / 34.06 ms | 33.62 / 34.41 ±0.83 / 35.81 ms | no change |
| QQuery 16 | 15.25 / 15.41 ±0.13 / 15.65 ms | 15.33 / 15.41 ±0.07 / 15.50 ms | no change |
| QQuery 17 | 77.75 / 78.94 ±0.78 / 80.14 ms | 82.48 / 83.0
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335043404
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4335018412-1866-rzm7g 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a818cd8923a8b19ce6befad53e0b052aa47be9bf) to 22bb4e6 (merge-base) [diff](https://github.com/apache/datafusion/compare/22bb4e6b752c7a62b677d94a63bcf08b68e8d5ec..a818cd8923a8b19ce6befad53e0b052aa47be9bf) using: tpch
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335040166
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4335018412-1864-w5tw2 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a818cd8923a8b19ce6befad53e0b052aa47be9bf) to 22bb4e6 (merge-base) [diff](https://github.com/apache/datafusion/compare/22bb4e6b752c7a62b677d94a63bcf08b68e8d5ec..a818cd8923a8b19ce6befad53e0b052aa47be9bf) using: clickbench_partitioned
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335050316
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4335018412-1865-d62wc 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a818cd8923a8b19ce6befad53e0b052aa47be9bf) to 22bb4e6 (merge-base) [diff](https://github.com/apache/datafusion/compare/22bb4e6b752c7a62b677d94a63bcf08b68e8d5ec..a818cd8923a8b19ce6befad53e0b052aa47be9bf) using: tpcds
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4335018412
run benchmarks
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334682132
I think it may be ready now.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334409210
Some build problems were introduced by the count distinct PR; I am fixing them. After that, the PR should be ready.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334382892
Benchmark for [this request](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557) failed.
Last 20 lines of output:
```
| ^^^ pattern `EmitTo::NextBlock` not covered
|
note: `EmitTo` defined here
--> datafusion/expr-common/src/groups_accumulator.rs:25:1
|
25 | pub enum EmitTo {
| ^^^
...
37 | NextBlock,
| - not covered
= note: the matched value is of type `EmitTo`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
|
108 ~ EmitTo::First(n) => n,
109 ~ EmitTo::NextBlock => todo!(),
|
For more information about this error, try `rustc --explain E0004`.
error: could not compile `datafusion-functions-aggregate-common` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
```
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues)
against this benchmark runner
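The E0004 error in the log above is rustc's exhaustiveness check: once the `NextBlock` variant exists, every `match` on `EmitTo` has to handle it. The following is a minimal sketch of that situation, not the DataFusion code. The enum is a simplified stand-in for the one named in the log, and `num_emitted`, `total_groups`, and `block_size` are hypothetical names chosen for illustration. That `NextBlock` emits at most one block of groups is also an assumption made for the example.

```rust
/// Simplified stand-in for the `EmitTo` enum named in the compiler output.
#[derive(Debug, Clone, Copy)]
enum EmitTo {
    /// Emit every group.
    All,
    /// Emit the first `n` groups.
    First(usize),
    /// Emit the next block of groups (semantics assumed for this sketch).
    NextBlock,
}

/// Hypothetical helper: how many groups a given `EmitTo` would emit.
/// Deleting the `NextBlock` arm reproduces error E0004, because the
/// match would no longer cover every variant.
fn num_emitted(emit_to: EmitTo, total_groups: usize, block_size: usize) -> usize {
    match emit_to {
        EmitTo::All => total_groups,
        EmitTo::First(n) => n.min(total_groups),
        EmitTo::NextBlock => block_size.min(total_groups),
    }
}

fn main() {
    let total_groups = 10;
    let block_size = 4;
    for emit_to in [EmitTo::All, EmitTo::First(3), EmitTo::NextBlock] {
        println!(
            "{:?} -> {} groups",
            emit_to,
            num_emitted(emit_to, total_groups, block_size)
        );
    }
}
```

The compiler's suggested `EmitTo::NextBlock => todo!()` arm would also compile, but it panics if that arm is ever reached, so an explicit arm like the one sketched here is usually the safer way to satisfy the check.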
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334382489
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4334353557-1862-xv64r 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a84d71cf9cc8161aca8398c329749d00137965d3) to bbf67d9 (merge-base) [diff](https://github.com/apache/datafusion/compare/bbf67d999c0dcbaf47956233a5d8e78458c13ffa..a84d71cf9cc8161aca8398c329749d00137965d3) using: tpch
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334373967
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4334353557-1860-dvb85 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a84d71cf9cc8161aca8398c329749d00137965d3) to bbf67d9 (merge-base) [diff](https://github.com/apache/datafusion/compare/bbf67d999c0dcbaf47956233a5d8e78458c13ffa..a84d71cf9cc8161aca8398c329749d00137965d3) using: clickbench_partitioned
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334377709
Benchmark for [this request](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557) failed.
Last 20 lines of output:
```
106 | let num_emitted = match emit_to {
| ^^^ pattern `EmitTo::NextBlock` not covered
|
note: `EmitTo` defined here
--> datafusion/expr-common/src/groups_accumulator.rs:25:1
|
25 | pub enum EmitTo {
| ^^^
...
37 | NextBlock,
| - not covered
= note: the matched value is of type `EmitTo`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
|
108 ~ EmitTo::First(n) => n,
109 ~ EmitTo::NextBlock => todo!(),
|
For more information about this error, try `rustc --explain E0004`.
error: could not compile `datafusion-functions-aggregate-common` (lib) due to 2 previous errors
```
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues)
against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334377486
Benchmark for [this request](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557) failed.
Last 20 lines of output:
```
106 | let num_emitted = match emit_to {
| ^^^ pattern `EmitTo::NextBlock` not covered
|
note: `EmitTo` defined here
--> datafusion/expr-common/src/groups_accumulator.rs:25:1
|
25 | pub enum EmitTo {
| ^^^
...
37 | NextBlock,
| - not covered
= note: the matched value is of type `EmitTo`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
|
108 ~ EmitTo::First(n) => n,
109 ~ EmitTo::NextBlock => todo!(),
|
For more information about this error, try `rustc --explain E0004`.
error: could not compile `datafusion-functions-aggregate-common` (lib) due to 2 previous errors
```
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues)
against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334374251
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4334353557-1861-gzv2h 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
Comparing intermeidate-result-blocked-approach (a84d71cf9cc8161aca8398c329749d00137965d3) to bbf67d9 (merge-base) [diff](https://github.com/apache/datafusion/compare/bbf67d999c0dcbaf47956233a5d8e78458c13ffa..a84d71cf9cc8161aca8398c329749d00137965d3) using: tpcds
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4334353557
run benchmarks
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4321818488
I think the regression possibly comes from the introduction of `VecDeque` in `GroupValuesPrimitive`; I am removing it.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276111051
> These look like some regression.

OK, will inspect them after some final small refactors.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276103930
For clickbench:
```
| QQuery 4  | 294.74 / 301.67 ±9.80 / 321.09 ms  | 367.14 / 374.88 ±7.17 / 385.68 ms  | 1.24x slower |
| QQuery 15 | 358.58 / 373.13 ±17.99 / 408.00 ms | 419.77 / 441.75 ±20.14 / 471.01 ms | 1.18x slower |
```
These look like some regression.
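The "1.24x slower" and "1.18x slower" labels in the rows quoted above are consistent with dividing the branch's middle value by HEAD's middle value, assuming each cell reads `min / mean ±stddev / max`. The sketch below reproduces that arithmetic. `describe_change` is a hypothetical helper, and the 5% band used for "no change" is an assumption for illustration, not something taken from the benchmark runner.

```rust
/// Hypothetical helper: label a change from two mean timings in milliseconds.
/// `threshold` is the relative band treated as "no change" (an assumption).
fn describe_change(base_mean_ms: f64, branch_mean_ms: f64, threshold: f64) -> String {
    let ratio = branch_mean_ms / base_mean_ms;
    if (ratio - 1.0).abs() <= threshold {
        "no change".to_string()
    } else if ratio > 1.0 {
        format!("{:.2}x slower", ratio)
    } else {
        format!("{:.2}x faster", 1.0 / ratio)
    }
}

fn main() {
    // Mean timings quoted for clickbench QQuery 4 and QQuery 15.
    println!("QQuery 4:  {}", describe_change(301.67, 374.88, 0.05));
    println!("QQuery 15: {}", describe_change(373.13, 441.75, 0.05));
}
```

For QQuery 4 the ratio is 374.88 / 301.67 ≈ 1.243, and for QQuery 15 it is 441.75 / 373.13 ≈ 1.184, which round to the labels shown in the comment.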
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276096427
Very nice, I think we're getting closer!
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276095992
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark clickbench_partitioned.json
| Query     | HEAD                               | intermeidate-result-blocked-approach | Change       |
|-----------|------------------------------------|--------------------------------------|--------------|
| QQuery 0  | 1.21 / 4.50 ±6.43 / 17.37 ms       | 1.20 / 4.46 ±6.36 / 17.19 ms         | no change    |
| QQuery 1  | 14.12 / 14.68 ±0.31 / 15.03 ms     | 14.13 / 14.68 ±0.29 / 14.94 ms       | no change    |
| QQuery 2  | 44.01 / 44.52 ±0.30 / 44.79 ms     | 45.03 / 45.31 ±0.40 / 46.09 ms       | no change    |
| QQuery 3  | 44.15 / 45.42 ±1.29 / 47.57 ms     | 44.51 / 46.98 ±1.45 / 48.80 ms       | no change    |
| QQuery 4  | 294.74 / 301.67 ±9.80 / 321.09 ms  | 367.14 / 374.88 ±7.17 / 385.68 ms    | 1.24x slower |
| QQuery 5  | 345.54 / 349.64 ±2.64 / 353.57 ms  | 346.92 / 352.98 ±4.75 / 358.08 ms    | no change    |
| QQuery 6  | 5.88 / 6.58 ±0.42 / 7.13 ms        | 5.96 / 9.31 ±3.29 / 14.27 ms         | 1.41x slower |
| QQuery 7  | 16.86 / 17.39 ±0.40 / 17.89 ms     | 17.24 / 17.99 ±0.77 / 19.45 ms       | no change    |
| QQuery 8  | 412.54 / 420.67 ±4.81 / 426.00 ms  | 422.41 / 439.77 ±10.14 / 453.22 ms   | no change    |
| QQuery 9  | 651.79 / 672.54 ±12.42 / 686.69 ms | 698.72 / 712.77 ±9.32 / 723.81 ms    | 1.06x slower |
| QQuery 10 | 94.34 / 96.64 ±3.38 / 103.28 ms    | 95.64 / 97.99 ±3.36 / 104.63 ms      | no change    |
| QQuery 11 | 107.00 / 108.71 ±1.31 / 110.14 ms  | 107.78 / 109.36 ±1.54 / 112.15 ms    | no change    |
| QQuery 12 | 341.81 / 346.09 ±3.18 / 350.39 ms  | 353.63 / 361.21 ±6.34 / 372.47 ms    | no change    |
| QQuery 13 | 459.64 / 466.33 ±5.80 / 475.57 ms  | 461.89 / 482.61 ±15.33 / 502.49 ms   | no change    |
| QQuery 14 | 350.52 / 354.24 ±3.72 / 360.43 ms  | 354.29 / 358.51 ±3.71 / 364.87 ms    | no change    |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276095798
🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB)
Details
```
Comparing HEAD and intermeidate-result-blocked-approach
Benchmark tpcds_sf1.json
| Query     | HEAD                                  | intermeidate-result-blocked-approach  | Change    |
|-----------|---------------------------------------|---------------------------------------|-----------|
| QQuery 1  | 6.67 / 7.14 ±0.77 / 8.68 ms           | 6.48 / 6.97 ±0.82 / 8.60 ms           | no change |
| QQuery 2  | 146.44 / 147.39 ±0.58 / 148.19 ms     | 151.41 / 152.34 ±0.68 / 153.33 ms     | no change |
| QQuery 3  | 115.14 / 115.64 ±0.33 / 116.09 ms     | 112.82 / 113.52 ±0.65 / 114.74 ms     | no change |
| QQuery 4  | 1301.26 / 1338.21 ±27.48 / 1380.06 ms | 1305.55 / 1350.93 ±22.91 / 1368.05 ms | no change |
| QQuery 5  | 171.37 / 172.92 ±0.86 / 173.76 ms     | 171.80 / 173.29 ±1.63 / 175.75 ms     | no change |
| QQuery 6  | 844.90 / 881.64 ±18.39 / 892.22 ms    | 881.15 / 897.61 ±15.87 / 919.14 ms    | no change |
| QQuery 7  | 343.11 / 348.56 ±3.57 / 354.06 ms     | 342.42 / 346.58 ±3.76 / 353.09 ms     | no change |
| QQuery 8  | 113.91 / 115.45 ±1.17 / 117.35 ms     | 113.50 / 115.79 ±1.53 / 117.46 ms     | no change |
| QQuery 9  | 100.62 / 102.68 ±2.86 / 108.31 ms     | 100.83 / 105.93 ±5.02 / 114.26 ms     | no change |
| QQuery 10 | 106.74 / 107.71 ±0.65 / 108.54 ms     | 106.48 / 108.28 ±1.66 / 111.00 ms     | no change |
| QQuery 11 | 936.53 / 949.11 ±14.84 / 977.12 ms    | 937.76 / 957.06 ±13.41 / 975.39 ms    | no change |
| QQuery 12 | 44.75 / 46.81 ±1.48 / 48.22 ms        | 45.62 / 47.73 ±1.17 / 48.88 ms        | no change |
| QQuery 13 | 400.09 / 402.23 ±1.41 / 404.05 ms     | 399.55 / 402.86 ±1.80 / 404.47 ms     | no change |
| QQuery 14 | 1009.96 / 1018.33 ±5.75 / 1025.12 ms  | 1006.86 / 1015.89 ±6.07 / 1022.71 ms  | no change |
| QQuery 15 |
```
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276086769
Benchmark for [this request](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952) failed.
Last 20 lines of output:
```
BENCHMARK: tpch
QUERY: All
DATAFUSION_DIR: /workspace/datafusion-base
BRANCH_NAME: HEAD
DATA_DIR: /workspace/datafusion-bench/benchmarks/data
RESULTS_DIR: /workspace/datafusion-bench/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
SIMULATE_LATENCY: false
***
RESULTS_FILE: /workspace/datafusion-bench/benchmarks/results/HEAD/tpch_sf1.json
Running tpch benchmark...
+ cargo run --release --bin dfbench -- tpch --iterations 5 --path /workspace/datafusion-bench/benchmarks/data/tpch_sf1 --scale-factor 1 --prefer_hash_join true --format parquet -o /workspace/datafusion-bench/benchmarks/results/HEAD/tpch_sf1.json
Finished `release` profile [optimized] target(s) in 0.22s
Running `/workspace/datafusion-base/target/release/dfbench tpch --iterations 5 --path /workspace/datafusion-bench/benchmarks/data/tpch_sf1 --scale-factor 1 --prefer_hash_join true --format parquet -o /workspace/datafusion-bench/benchmarks/results/HEAD/tpch_sf1.json`
error: unexpected argument '--scale-factor' found
Usage: dfbench tpch --path --iterations
For more information, try '--help'.
```
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276059575
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4276055952-1549-r78jh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Comparing intermeidate-result-blocked-approach (a68716b1b99123fef41259899c0c67adda870dbb) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..a68716b1b99123fef41259899c0c67adda870dbb) using: tpcds
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276059580
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4276055952-1548-ccn4x 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Comparing intermeidate-result-blocked-approach (a68716b1b99123fef41259899c0c67adda870dbb) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..a68716b1b99123fef41259899c0c67adda870dbb) using: clickbench_partitioned
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276059331
🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952)
**Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-c4276055952-1550-b9fqh 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux`
CPU Details (lscpu)
```
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: ARM
Model name: Neoverse-V2
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 16
Socket(s): -
Cluster(s): 1
Stepping: r0p1
BogoMIPS: 2000.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache: 1 MiB (16 instances)
L1i cache: 1 MiB (16 instances)
L2 cache: 32 MiB (16 instances)
L3 cache: 80 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, BHB
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
```
Comparing intermeidate-result-blocked-approach (a68716b1b99123fef41259899c0c67adda870dbb) to dc973cc (merge-base) [diff](https://github.com/apache/datafusion/compare/dc973cc9e67e3519ec4bc5bd15962e2692debb3e..a68716b1b99123fef41259899c0c67adda870dbb) using: tpch
Results will be posted here when complete
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4276055952
run benchmarks
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
adriangbot commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4275998435
Hi @Rachelint, thanks for the request (https://github.com/apache/datafusion/pull/15591#issuecomment-4275998373). Only whitelisted users can trigger benchmarks.
Allowed users: [Dandandan](https://github.com/Dandandan), [Fokko](https://github.com/Fokko), [Jefffrey](https://github.com/Jefffrey), [Omega359](https://github.com/Omega359), [adriangb](https://github.com/adriangb), [alamb](https://github.com/alamb), [asubiotto](https://github.com/asubiotto), [brunal](https://github.com/brunal), [buraksenn](https://github.com/buraksenn), [cetra3](https://github.com/cetra3), [codephage2020](https://github.com/codephage2020), [comphead](https://github.com/comphead), [erenavsarogullari](https://github.com/erenavsarogullari), [etseidl](https://github.com/etseidl), [friendlymatthew](https://github.com/friendlymatthew), [gabotechs](https://github.com/gabotechs), [geoffreyclaude](https://github.com/geoffreyclaude), [grtlr](https://github.com/grtlr), [haohuaijin](https://github.com/haohuaijin), [jonathanc-n](https://github.com/jonathanc-n), [kevinjqliu](https://github.com/kevinjqliu), [klion26](https://github.com/klion26), [kosiew](https://github.com/kosiew), [kumarUjjawal](https://github.com/kumarUjjawal), [kunalsinghdadhwal](https://github.com/kunalsinghdadhwal), [liamzwbao](https://github.com/liamzwbao), [mbutrovich](https://github.com/mbutrovich), [mkleen](https://github.com/mkleen), [mzabaluev](https://github.com/mzabaluev), [neilconway](https://github.com/neilconway), [rluvaton](https://github.com/rluvaton), [sdf-jkl](https://github.com/sdf-jkl), [timsaucer](https://github.com/timsaucer), [xudong963](https://github.com/xudong963), [zhuqi-lucas](https://github.com/zhuqi-lucas).
---
[File an issue](https://github.com/adriangb/datafusion-benchmarking/issues) against this benchmark runner
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4275998373
run benchmark
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4273979782
CI passed and there are no conflicts again. Remaining items before this is ready:
- improve tests
- improve `Blocks`
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3104443249
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/blocks.rs:
##
@@ -0,0 +1,308 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Aggregation intermediate results blocks in blocked approach
+
+use std::{
+collections::VecDeque,
+fmt::Debug,
+iter,
+ops::{Index, IndexMut},
+};
+
+use datafusion_expr_common::groups_accumulator::EmitTo;
+
+/// Structure used to store aggregation intermediate results in `blocked approach`
+///
+/// Aggregation intermediate results will be stored as multiple [`Block`]s
+/// (simply you can think a [`Block`] as a `Vec`). And `Blocks` is the structure
+/// to represent such multiple [`Block`]s.
+///
+/// More details about `blocked approach` can see in: [`GroupsAccumulator::supports_blocked_groups`].
+///
+/// [`GroupsAccumulator::supports_blocked_groups`]: datafusion_expr_common::groups_accumulator::GroupsAccumulator::supports_blocked_groups
+///
+#[derive(Debug)]
+pub struct Blocks {
+inner: VecDeque,
Review Comment:
Makes sense, I am switching it to `Vec`.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3084249604
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/blocks.rs:
##
@@ -0,0 +1,308 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Aggregation intermediate results blocks in blocked approach
+
+use std::{
+collections::VecDeque,
+fmt::Debug,
+iter,
+ops::{Index, IndexMut},
+};
+
+use datafusion_expr_common::groups_accumulator::EmitTo;
+
+/// Structure used to store aggregation intermediate results in `blocked approach`
+///
+/// Aggregation intermediate results will be stored as multiple [`Block`]s
+/// (simply you can think a [`Block`] as a `Vec`). And `Blocks` is the structure
+/// to represent such multiple [`Block`]s.
+///
+/// More details about `blocked approach` can see in: [`GroupsAccumulator::supports_blocked_groups`].
+///
+/// [`GroupsAccumulator::supports_blocked_groups`]: datafusion_expr_common::groups_accumulator::GroupsAccumulator::supports_blocked_groups
+///
+#[derive(Debug)]
+pub struct Blocks {
+inner: VecDeque,
Review Comment:
OK, but I think the `Vec` approach is relatively simple as well?
Not to pin you down, but I think that once this is used more widely, the
question will probably come up later anyway.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3084020685
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/blocks.rs:
##
@@ -0,0 +1,308 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Aggregation intermediate results blocks in blocked approach
+
+use std::{
+collections::VecDeque,
+fmt::Debug,
+iter,
+ops::{Index, IndexMut},
+};
+
+use datafusion_expr_common::groups_accumulator::EmitTo;
+
+/// Structure used to store aggregation intermediate results in `blocked approach`
+///
+/// Aggregation intermediate results will be stored as multiple [`Block`]s
+/// (simply you can think a [`Block`] as a `Vec`). And `Blocks` is the structure
+/// to represent such multiple [`Block`]s.
+///
+/// More details about `blocked approach` can see in: [`GroupsAccumulator::supports_blocked_groups`].
+///
+/// [`GroupsAccumulator::supports_blocked_groups`]: datafusion_expr_common::groups_accumulator::GroupsAccumulator::supports_blocked_groups
+///
+#[derive(Debug)]
+pub struct Blocks {
+inner: VecDeque,
Review Comment:
Yes, it is better to use `Vec`, and I tried it back when I still saw this as a
performance improvement feature.
However, after many attempts, I found that it does not actually help DataFusion
run faster (it only helps with memory management), so I finally switched to
`VecDeque` for simplicity.
The experiments can be seen in this archived branch:
https://github.com/Rachelint/arrow-datafusion/compare/intermeidate-result-blocked-approach-bak
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3080688241
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs:
##
@@ -93,21 +94,31 @@ where
opt_filter: Option<&BooleanArray>,
total_num_groups: usize,
) -> Result<()> {
+const DEFAULT_BLOCK_CAP: usize = 128;
+
assert_eq!(values.len(), 1, "single argument to update_batch");
let values = values[0].as_primitive::();
-// update values
-self.values.resize(total_num_groups, self.starting_value);
+// Expand to ensure values are large enough
+let new_block = |block_size: Option| {
+let cap = block_size.unwrap_or(DEFAULT_BLOCK_CAP);
+Vec::with_capacity(cap)
+};
+self.values
+.resize(total_num_groups, new_block, self.starting_value);
// NullState dispatches / handles tracking nulls and groups that saw no values
self.null_state.accumulate(
group_indices,
values,
opt_filter,
total_num_groups,
-|group_index, new_value| {
-// SAFETY: group_index is guaranteed to be in bounds
-let value = unsafe { self.values.get_unchecked_mut(group_index) };
+|block_id, block_offset, new_value| {
+// SAFETY: `block_id` and `block_offset` are guaranteed to be in bounds
+let value = unsafe {
+self.values[block_id as usize]
Review Comment:
This can use unchecked (`unsafe`) indexing as well; with a plain `Vec` it
would certainly be faster.
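A minimal sketch of the two-level unchecked indexing discussed here, assuming blocks are kept in a plain `Vec<Vec<T>>`. The type `BlockedValues`, its fields, and the division/modulo mapping from a flat group index to `(block_id, block_offset)` are hypothetical illustrations, not the PR's actual `Blocks` API.

```rust
/// Hypothetical blocked storage: fixed-size blocks held in a plain `Vec`.
struct BlockedValues<T> {
    blocks: Vec<Vec<T>>,
    block_size: usize,
}

impl<T: Clone> BlockedValues<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { blocks: Vec::new(), block_size }
    }

    /// Grow the storage so it holds at least `total_num_groups` values,
    /// filling new slots with `default`. Shrinking is not handled here.
    fn resize(&mut self, total_num_groups: usize, default: T) {
        let num_blocks = (total_num_groups + self.block_size - 1) / self.block_size;
        while self.blocks.len() < num_blocks {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        let mut remaining = total_num_groups;
        for block in &mut self.blocks {
            // Every block is full except possibly the last one.
            let want = remaining.min(self.block_size);
            if block.len() < want {
                block.resize(want, default.clone());
            }
            remaining -= want;
        }
    }

    /// # Safety
    /// `block_id` must be less than the number of blocks, and `block_offset`
    /// must be less than the length of that block.
    unsafe fn get_unchecked_mut(&mut self, block_id: usize, block_offset: usize) -> &mut T {
        // Both lookups skip bounds checks; the caller upholds the contract above.
        unsafe {
            self.blocks
                .get_unchecked_mut(block_id)
                .get_unchecked_mut(block_offset)
        }
    }
}
```

The bounds are established once in `resize`, so the hot accumulate loop can skip both checks as long as every index it sees is below `total_num_groups`.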
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3080567065
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/blocks.rs:
##
@@ -0,0 +1,308 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Aggregation intermediate results blocks in blocked approach
+
+use std::{
+collections::VecDeque,
+fmt::Debug,
+iter,
+ops::{Index, IndexMut},
+};
+
+use datafusion_expr_common::groups_accumulator::EmitTo;
+
+/// Structure used to store aggregation intermediate results in `blocked approach`
+///
+/// Aggregation intermediate results will be stored as multiple [`Block`]s
+/// (simply you can think a [`Block`] as a `Vec`). And `Blocks` is the structure
+/// to represent such multiple [`Block`]s.
+///
+/// More details about `blocked approach` can see in: [`GroupsAccumulator::supports_blocked_groups`].
+///
+/// [`GroupsAccumulator::supports_blocked_groups`]: datafusion_expr_common::groups_accumulator::GroupsAccumulator::supports_blocked_groups
+///
+#[derive(Debug)]
+pub struct Blocks {
+inner: VecDeque,
Review Comment:
See changes in
https://github.com/apache/datafusion/pull/21622/commits/baa7755a200f5c82998cdc3718280226b12ec70c#diff-c8207420967623630914abf198f93c8b6b2ccb0ba30aa6d1b5b641643789b92fR39
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3080550313
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/blocks.rs:
##
@@ -0,0 +1,308 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Aggregation intermediate results blocks in blocked approach
+
+use std::{
+collections::VecDeque,
+fmt::Debug,
+iter,
+ops::{Index, IndexMut},
+};
+
+use datafusion_expr_common::groups_accumulator::EmitTo;
+
+/// Structure used to store aggregation intermediate results in `blocked approach`
+///
+/// Aggregation intermediate results will be stored as multiple [`Block`]s
+/// (simply you can think a [`Block`] as a `Vec`). And `Blocks` is the structure
+/// to represent such multiple [`Block`]s.
+///
+/// More details about `blocked approach` can see in: [`GroupsAccumulator::supports_blocked_groups`].
+///
+/// [`GroupsAccumulator::supports_blocked_groups`]: datafusion_expr_common::groups_accumulator::GroupsAccumulator::supports_blocked_groups
+///
+#[derive(Debug)]
+pub struct Blocks {
+inner: VecDeque,
Review Comment:
I think it would be nice to avoid the `VecDeque` as I believe it is
relatively slow to index (because of the `%`).
I think we can use a start offset instead during pop (and increment it),
replace the block with an empty one to "pop" it and reclaim the memory.
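A minimal sketch of the start-offset idea, to make the suggestion concrete. It is not the PR's code: `OffsetBlocks`, `push_block`, `pop_block` and `get` are hypothetical names, and the block type is assumed to implement `Default` so an empty block can be swapped in.

```rust
/// Hypothetical stand-in for the PR's `Blocks`, backed by a plain `Vec`.
#[derive(Debug)]
struct OffsetBlocks<B> {
    inner: Vec<B>,
    /// Index of the first block that has not been popped yet.
    start: usize,
}

impl<B: Default> OffsetBlocks<B> {
    fn new() -> Self {
        Self { inner: Vec::new(), start: 0 }
    }

    fn push_block(&mut self, block: B) {
        self.inner.push(block);
    }

    /// Number of blocks that are still live.
    fn len(&self) -> usize {
        self.inner.len() - self.start
    }

    /// "Pop" the front block: swap an empty block into its slot so the
    /// popped block (and its memory) moves to the caller, then advance
    /// the start offset.
    fn pop_block(&mut self) -> Option<B> {
        if self.start == self.inner.len() {
            return None;
        }
        let block = std::mem::take(&mut self.inner[self.start]);
        self.start += 1;
        Some(block)
    }

    /// Slice indexing plus one addition, with no ring-buffer wrap-around.
    fn get(&self, block_id: usize) -> Option<&B> {
        self.inner.get(self.start + block_id)
    }
}
```

The slots before `start` stay in the `Vec` as empty blocks, so this trades a little bookkeeping memory for simpler indexing. Whether that is faster than `VecDeque` here would need a benchmark.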
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4245046381
I was checking it out as well, and playing around with it
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4244928294
Busy with work these past few days... Fixing the last conflicts now.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4195598208
> > This branch still has conflicts.
>
> Yes, mainly solved the correctness problems in `row_hash.rs` and `accumulate.rs`. Will clean up the code and solve other small problems today.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4195597389
> This branch still has conflicts.
Yes, mainly solved the correctness problems in `row_hash.rs` and `accumulate.rs`. Will clean up the code and solve other small problems today.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alchemist51 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4195565002
This branch still has conflicts.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4193494034
Have fixed all correctness problems; it can be ready again today.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4146909778
> Hi @Rachelint just wanted to check in on this! The last commit was about a month ago, any update on where things stand? Also worth noting that this work could help fix or mitigate #19906, so there's renewed interest in getting it over the line.
>
> Thanks for all the effort you've put into this. It's really appreciated!
Sorry for the long delay (private reasons); will try to make it ready this weekend:
- already fixed bugs in accumulate
- I am porting this PR to the new spilling logic in `row_hash.rs`
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
ahmed-mez commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-4133351121
Hi @Rachelint just wanted to check in on this! The last commit was about a month ago, any update on where things stand? Also worth noting that this work could help fix or mitigate #19906, so there's renewed interest in getting it over the line.
Thanks for all the effort you've put into this. It's really appreciated!
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3830479590
> Hi @Rachelint any update on this?
Continuing work on it today... A bit busy this week, sorry for the delay on the PR.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alchemist51 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3830440962
Hi @Rachelint any update on this?
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alchemist51 commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r2726532346
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs:
##
@@ -37,12 +47,10 @@ use datafusion_expr_common::groups_accumulator::EmitTo;
#[derive(Debug)]
pub enum SeenValues {
/// All groups seen so far have seen at least one non-null value
-All {
Review Comment:
probably unintended change
##
datafusion/common/src/config.rs:
##
@@ -647,6 +647,16 @@ config_namespace! {
/// the remote end point.
pub objectstore_writer_buffer_size: usize, default = 10 * 1024 * 1024
+/// Should DataFusion use a blocked approach to manage grouping state.
+/// By default, the blocked approach is used which
+/// allocates capacity based on a predefined block size firstly.
+/// When the block reaches its limit, we allocate a new block (also with
+/// the same predefined block size based capacity) instead of expanding
+/// the current one and copying the data.
+/// If `false`, a single allocation approach is used, where
+/// values are managed within a single large memory block.
+/// As this block grows, it often triggers numerous copies, resulting in poor performance.
+pub enable_aggregation_blocked_groups: bool, default = true
Review Comment:
It should be false by default for now?
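For anyone skimming the thread, the allocation strategy that doc comment describes can be sketched roughly as below. This is an illustrative assumption, not the PR's implementation: `BlockedValues` and its methods are hypothetical names, and `u64` stands in for whatever state the accumulator keeps per group.

```rust
/// Hypothetical blocked storage: values live in fixed-capacity blocks, so
/// growing never copies values that are already stored. Only the small
/// outer vector of block handles may reallocate.
struct BlockedValues {
    block_size: usize,
    blocks: Vec<Vec<u64>>,
}

impl BlockedValues {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    /// Append a value, allocating a fresh block when the last one is full
    /// instead of expanding it.
    fn push(&mut self, value: u64) {
        let need_new_block = match self.blocks.last() {
            Some(block) => block.len() == self.block_size,
            None => true,
        };
        if need_new_block {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        self.blocks.last_mut().unwrap().push(value);
    }

    /// Map a flat group index to (block id, offset inside the block).
    fn get(&self, group_index: usize) -> Option<u64> {
        let block_id = group_index / self.block_size;
        let offset = group_index % self.block_size;
        self.blocks.get(block_id)?.get(offset).copied()
    }
}
```

With the flag set to `false`, the equivalent would be a single `Vec<u64>` that reallocates and copies as it grows.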
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs:
##
@@ -121,26 +136,29 @@ pub struct NullState {
///
/// If `seen_values` is `SeenValues::All`, all groups have seen at least
one non null value
seen_values: SeenValues,
-}
-impl Default for NullState {
-fn default() -> Self {
-Self::new()
-}
+/// Size of one seen values block, can be None if only desire single block
+block_size: Option,
+
+/// phantom data for required type ``
+_phantom: PhantomData,
}
-impl NullState {
-pub fn new() -> Self {
+impl NullState {
+/// Create a new `NullState`
+pub fn new(block_size: Option) -> Self {
Self {
seen_values: SeenValues::All { num_values: 0 },
+block_size,
+_phantom: PhantomData,
Review Comment:
Didn't quite get the phantom usage
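On the question itself, a generic sketch of what `PhantomData` is usually for may help. The type parameter in the quoted hunk is not visible, so everything below (`Layout`, `Flat`, `Blocked`, `State`) is a hypothetical illustration, not the PR's types: the struct is generic over a marker type that selects behaviour at compile time, but it never stores a value of that type.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait that picks an index layout at compile time.
trait Layout {
    fn locate(group_index: usize, block_size: Option<usize>) -> (usize, usize);
}

struct Flat;
struct Blocked;

impl Layout for Flat {
    // Single block: everything lives in block 0.
    fn locate(group_index: usize, _block_size: Option<usize>) -> (usize, usize) {
        (0, group_index)
    }
}

impl Layout for Blocked {
    // Multiple blocks: split the flat index into (block id, offset).
    fn locate(group_index: usize, block_size: Option<usize>) -> (usize, usize) {
        let size = block_size.expect("blocked layout needs a block size");
        (group_index / size, group_index % size)
    }
}

struct State<O: Layout> {
    block_size: Option<usize>,
    // No value of type `O` is stored. The zero-sized marker only ties `O`
    // to the struct so the compiler accepts the otherwise unused parameter.
    _phantom: PhantomData<O>,
}

impl<O: Layout> State<O> {
    fn new(block_size: Option<usize>) -> Self {
        Self { block_size, _phantom: PhantomData }
    }

    fn locate(&self, group_index: usize) -> (usize, usize) {
        O::locate(group_index, self.block_size)
    }
}
```

Without the `_phantom` field the struct would fail to compile with an "unused type parameter" error, which is the usual reason such a field appears.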
##
datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/accumulate.rs:
##
@@ -60,27 +68,34 @@ impl SeenValues {
///
/// The builder is then ensured to have at least `total_num_groups` length,
/// with any new entries initialized to false.
-fn get_builder(&mut self, total_num_groups: usize) -> &mut BooleanBufferBuilder {
-match self {
+fn get_big_enough_builder(
+&mut self,
+total_num_groups: usize,
+block_size: Option,
+) -> &mut Blocks {
+// If `self` is `SeenValues::All`, transition it to `SeenValues::Some` with `num_values trues` firstly,
+// then return mutable reference to the builder.
+// If `self` is `SeenValues::Some`, just directly return mutable reference to the builder.
+let new_block = |block_size: Option| {
Review Comment:
should we move it to SeenValues::All match call rather?
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3797570070
Already resolved most conflicts (mainly `row_hash.rs` and `accumulate.rs`) and tests pass; nearly ready again.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3749856254
> any ETA you are targeting for making it ready for review?
I plan to make it this week
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3749255055
My personal suggestion here is to try and find some way to incrementally move this forward (as we were discussing elsewhere). Replacing the entire group by hash machinery in one go will be very hard, both to review and to track down potential regressions.
So I personally recommend focusing on some core use cases first (e.g. single column hash aggregates for Primitive and String/StringViews) and see what you can get working. We can then slowly add more functionality (multi column grouping, more types, spilling, etc) incrementally.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alchemist51 commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3747616141
@Rachelint any ETA you are targeting for making it ready for review?
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3731202925
> Maybe someday we'll get back to it
Plan to reopen and push it forward again, since it has proved to be helpful and I have free time again recently.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
github-actions[bot] closed pull request #15591: Intermediate result blocked approach to aggregation memory management
URL: https://github.com/apache/datafusion/pull/15591
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3405892443
Maybe someday we'll get back to it
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
github-actions[bot] commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3379306551
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3154660341
> > It is unfortunate we never figured out how to get this over the line
> > Thank you for the effort anyways @Rachelint
>
> It is sorry... but actually I still want to continue push it forward... However it is too busy recent few months...
Maybe if/when you are able to return to it with a fresh set of eyes after a break we'll make progress.
No worries at all -- I totally understand
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Rachelint commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3154598833
> It is unfortunate we never figured out how to get this over the line
>
> Thank you for the effort anyways @Rachelint
Sorry about that... I actually still want to keep pushing it forward... However, I have been too busy these past few months...
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
alamb commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3154589330
It is unfortunate we never figured out how to get this over the line
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
github-actions[bot] commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-3153036085
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Re: [PR] Intermediate result blocked approach to aggregation memory management [datafusion]
Dandandan commented on PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2945391975
thanks @Rachelint and congratulations!
