Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on PR #9122: URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3744059185 Thanks again @scovich -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb merged PR #9122: URL: https://github.com/apache/arrow-rs/pull/9122
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on PR #9122: URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3734481041 Thanks for the review @scovich -- I will plan to merge this PR on Monday unless there are any other comments
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on PR #9122: URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-379234 There is a lot of noise in the benchmark, but I think this PR shows an improvement for primitive arrays
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb-ghbot commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3732744797
🤖: Benchmark completed
Details
```
group  alamb_less_primitive_allocations  main
-----  --------------------------------  ----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs  1.00  1178.3±6.17µs  ? ?/sec  1.09  1286.9±9.23µs  ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs  1.00  1257.8±23.44µs  ? ?/sec  1.05  1323.8±6.15µs  ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs  1.00  1193.6±18.34µs  ? ?/sec  1.09  1297.7±16.83µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs  1.00  489.4±5.98µs  ? ?/sec  1.05  511.7±14.12µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs  1.00  663.1±7.21µs  ? ?/sec  1.05  694.7±4.88µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs  1.00  494.3±6.43µs  ? ?/sec  1.04  514.3±4.65µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs  1.01  544.1±4.88µs  ? ?/sec  1.00  539.8±5.41µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs  1.00  730.8±5.41µs  ? ?/sec  1.04  763.4±8.57µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs  1.01  561.6±7.94µs  ? ?/sec  1.00  555.1±9.05µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs  1.20  282.5±4.57µs  ? ?/sec  1.00  235.5±3.45µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs  1.08  261.4±2.60µs  ? ?/sec  1.00  241.8±7.33µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs  1.18  277.3±4.40µs  ? ?/sec  1.00  234.1±4.06µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs  1.22  351.5±1.77µs  ? ?/sec  1.00  288.3±2.22µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string  1.15  326.0±1.43µs  ? ?/sec  1.00  283.9±6.81µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs  1.11  316.6±8.83µs  ? ?/sec  1.00  285.3±5.03µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs  1.21  361.0±5.22µs  ? ?/sec  1.00  297.8±6.52µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs  1.06  1090.3±19.41µs  ? ?/sec  1.00  1029.8±21.99µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.09  940.0±10.09µs  ? ?/sec  1.00  866.1±19.08µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs  1.06  1100.1±17.68µs  ? ?/sec  1.00  1036.8±6.15µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs  1.04  415.3±7.83µs  ? ?/sec  1.00  399.9±10.50µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs  1.12  605.6±7.52µs  ? ?/sec  1.00  541.4±16.74µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs  1.05  425.7±10.45µs  ? ?/sec  1.00  407.2±12.73µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs  1.03  202.7±4.59µs  ? ?/sec  1.00  196.1±5.13µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs  1.00  320.3±6.59µs  ? ?/sec  1.05  335.2±1.86µs  ? ?/sec
arrow_array_reader/FIXED_L
```
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3732518789
It seems like this may be worth a few percent in performance. I have restarted the benchmarks to see if I can reproduce the numbers
```
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs  1.00  19.9±0.62µs  ? ?/sec  1.06  21.1±1.12µs  ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs  1.00  117.2±0.80µs  ? ?/sec  1.02  120.0±1.06µs  ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs  1.00  25.4±1.08µs  ? ?/sec  1.01  25.8±1.02µs  ? ?/sec
...
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs  1.00  28.9±0.34µs  ? ?/sec  1.04  30.1±0.19µs  ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs  1.00  127.6±2.70µs  ? ?/sec  1.01  129.2±2.33µs  ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs  1.00  33.6±0.32µs  ? ?/sec  1.03  34.5±0.11µs  ? ?/sec
```
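For readers unfamiliar with this output, the sketch below shows one way to read a row of the comparison. It assumes the ratio columns are each side's mean time divided by the faster of the two (this matches the numbers quoted above, e.g. 21.1 / 19.9 ≈ 1.06), and that the first column group is the PR branch and the second is `main`, as in the bot's tables. The `Measurement` type and both helper functions are hypothetical names used only for illustration, and the interval-overlap check is a crude noise heuristic, not a statistical test.

```rust
/// One side of a comparison row: the reported mean and its ± value (in µs).
/// Hypothetical type for illustration only.
#[derive(Clone, Copy)]
struct Measurement {
    mean: f64,
    spread: f64,
}

/// Ratio of each mean to the faster of the two, as in the ratio columns.
fn ratios(branch: Measurement, main: Measurement) -> (f64, f64) {
    let fastest = branch.mean.min(main.mean);
    (branch.mean / fastest, main.mean / fastest)
}

/// Crude noise check: do the two `mean ± spread` intervals overlap?
fn intervals_overlap(a: Measurement, b: Measurement) -> bool {
    (a.mean - a.spread) <= (b.mean + b.spread) && (b.mean - b.spread) <= (a.mean + a.spread)
}

fn main() {
    // UInt32Array/plain encoded, mandatory, no NULLs (values quoted above).
    let branch = Measurement { mean: 19.9, spread: 0.62 };
    let main_run = Measurement { mean: 21.1, spread: 1.12 };

    let (branch_ratio, main_ratio) = ratios(branch, main_run);
    println!(
        "branch {:.2}, main {:.2}, intervals overlap: {}",
        branch_ratio,
        main_ratio,
        intervals_overlap(branch, main_run)
    );
}
```

For this row the two intervals overlap, which is one reason a single run showing a few percent difference is worth reproducing before drawing conclusions.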
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb-ghbot commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3732514874
🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_primitive_allocations (4d06746addd102452f8041358c025a0b3260ae54) to cfba3ccc0c9460dba65ca000c34e6491c8043abc [diff](https://github.com/apache/arrow-rs/compare/cfba3ccc0c9460dba65ca000c34e6491c8043abc..4d06746addd102452f8041358c025a0b3260ae54)
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_primitive_allocations
Results will be posted here when complete
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on PR #9122: URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3732514799 run benchmark arrow_reader
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb-ghbot commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3730706905
🤖: Benchmark completed
Details
```
group                              alamb_less_primitive_allocations  main
-----                              --------------------------------  ----
arrow_reader_clickbench/async/Q1   1.03     2.4±0.02ms  ? ?/sec      1.00     2.3±0.04ms  ? ?/sec
arrow_reader_clickbench/async/Q10  1.05    13.1±0.37ms  ? ?/sec      1.00    12.5±0.28ms  ? ?/sec
arrow_reader_clickbench/async/Q11  1.02    14.7±0.34ms  ? ?/sec      1.00    14.3±0.48ms  ? ?/sec
arrow_reader_clickbench/async/Q12  1.03    25.4±0.47ms  ? ?/sec      1.00    24.7±0.41ms  ? ?/sec
arrow_reader_clickbench/async/Q13  1.02    30.6±0.97ms  ? ?/sec      1.00    29.9±0.70ms  ? ?/sec
arrow_reader_clickbench/async/Q14  1.01    27.8±0.83ms  ? ?/sec      1.00    27.5±0.52ms  ? ?/sec
arrow_reader_clickbench/async/Q19  1.04     5.4±0.12ms  ? ?/sec      1.00     5.2±0.15ms  ? ?/sec
arrow_reader_clickbench/async/Q20  1.02   114.6±1.26ms  ? ?/sec      1.00   112.0±1.09ms  ? ?/sec
arrow_reader_clickbench/async/Q21  1.20   155.7±2.13ms  ? ?/sec      1.00   129.7±1.62ms  ? ?/sec
arrow_reader_clickbench/async/Q22  1.12  303.4±11.75ms  ? ?/sec      1.00   269.9±6.31ms  ? ?/sec
arrow_reader_clickbench/async/Q23  1.02   409.9±4.55ms  ? ?/sec      1.00   400.2±3.37ms  ? ?/sec
arrow_reader_clickbench/async/Q24  1.02    33.7±0.49ms  ? ?/sec      1.00    33.1±0.68ms  ? ?/sec
arrow_reader_clickbench/async/Q27  1.03   100.8±1.10ms  ? ?/sec      1.00    98.1±1.19ms  ? ?/sec
arrow_reader_clickbench/async/Q28  1.03    99.2±0.79ms  ? ?/sec      1.00    96.6±0.82ms  ? ?/sec
arrow_reader_clickbench/async/Q30  1.04    30.9±0.48ms  ? ?/sec      1.00    29.8±0.58ms  ? ?/sec
arrow_reader_clickbench/async/Q36  1.02   109.5±1.29ms  ? ?/sec      1.00   106.8±1.30ms  ? ?/sec
arrow_reader_clickbench/async/Q37  1.02    85.4±0.73ms  ? ?/sec      1.00    83.6±0.58ms  ? ?/sec
arrow_reader_clickbench/async/Q38  1.03    33.2±0.81ms  ? ?/sec      1.00    32.3±0.45ms  ? ?/sec
arrow_reader_clickbench/async/Q39  1.04    46.8±0.75ms  ? ?/sec      1.00    45.2±0.48ms  ? ?/sec
arrow_reader_clickbench/async/Q40  1.05    28.1±0.96ms  ? ?/sec      1.00    26.8±0.49ms  ? ?/sec
arrow_reader_clickbench/async/Q41  1.02    23.0±0.58ms  ? ?/sec      1.00    22.6±0.81ms  ? ?/sec
arrow_reader_clickbench/async/Q42  1.02    11.1±0.19ms  ? ?/sec      1.00    11.0±0.24ms  ? ?/sec
arrow_reader_clickbench/sync/Q1    1.00     2.1±0.06ms  ? ?/sec      1.00     2.1±0.06ms  ? ?/sec
arrow_reader_clickbench/sync/Q10   1.04    10.0±0.14ms  ? ?/sec      1.00     9.6±0.32ms  ? ?/sec
arrow_reader_clickbench/sync/Q11   1.03    11.5±0.22ms  ? ?/sec      1.00    11.1±0.16ms  ? ?/sec
arrow_reader_clickbench/sync/Q12   1.01    32.5±0.81ms  ? ?/sec      1.00    32.2±0.40ms  ? ?/sec
arrow_reader_clickbench/sync/Q13   1.00    43.9±0.93ms  ? ?/sec      1.05    46.0±1.45ms  ? ?/sec
arrow_reader_clickbench/sync/Q14   1.00    35.6±0.53ms  ? ?/sec      1.20    42.8±1.31ms  ? ?/sec
arrow_reader_clickbench/sync/Q19   1.02     4.3±0.08ms  ? ?/sec      1.00     4.2±0.03ms  ? ?/sec
arrow_reader_clickbench/sync/Q20   1.02   176.1±2.46ms  ? ?/sec      1.00   172.8±1.20ms  ? ?/sec
arrow_reader_clickbench/sync/Q21   1.00   230.3±1.95ms  ? ?/sec      1.00   229.8±3.05ms  ? ?/sec
arrow_reader_clickbench/sync/Q22   1.03   479.6±5.21ms  ? ?/sec      1.00   467.8±3.44ms  ? ?/sec
arrow_reader_clickbench/sync/Q23   1.07  446.4±12.42ms  ? ?/sec      1.00   416.7±4.94ms  ? ?/sec
arrow_reader_clickbench/sync/Q24   1.03    45.5±0.99ms  ? ?/sec      1.00    44.2±0.90ms  ? ?/sec
arrow_reader_clickbench/sync/Q27   1.03   154.0±1.59ms  ? ?/sec      1.00   150.1±2.04ms  ? ?/sec
arrow_reader_clickbench/sync/Q28   1.02   148.7±1.26ms  ? ?/sec      1.00   145.7±1.81ms  ? ?/sec
arrow_reader_clickbench/sync/Q30   1.02    30.4±0.40ms  ? ?/sec      1.00    29.7±0.49ms  ? ?/sec
arrow_reader_clickbench/sync/Q36   1.02   153.4±1.45ms  ? ?/sec      1.00   149.8±1.53ms  ? ?/sec
arrow_reader_clickbench/sync/Q37   1.02    88.3±0.83ms  ? ?/sec      1.00    86.7±1.17ms  ? ?/sec
arrow_reader_clickbench/sync/Q38   1.02    29.3±0.58ms  ? ?/sec      1.00    28.7±0.57ms  ? ?/sec
arrow_reader_c
```
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb-ghbot commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3730647562
🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_primitive_allocations (4d06746addd102452f8041358c025a0b3260ae54) to cfba3ccc0c9460dba65ca000c34e6491c8043abc [diff](https://github.com/apache/arrow-rs/compare/cfba3ccc0c9460dba65ca000c34e6491c8043abc..4d06746addd102452f8041358c025a0b3260ae54)
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_primitive_allocations
Results will be posted here when complete
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb-ghbot commented on PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#issuecomment-3730647413
🤖: Benchmark completed
Details
```
group  alamb_less_primitive_allocations  main
-----  --------------------------------  ----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs  1.00  1258.5±3.91µs  ? ?/sec  1.01  1270.2±5.87µs  ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs  1.00  1276.0±9.32µs  ? ?/sec  1.01  1293.1±10.94µs  ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs  1.00  1268.7±4.45µs  ? ?/sec  1.01  1279.9±15.89µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs  1.06  515.1±5.35µs  ? ?/sec  1.00  488.2±3.41µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs  1.02  672.6±8.53µs  ? ?/sec  1.00  660.3±4.00µs  ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs  1.06  518.3±3.75µs  ? ?/sec  1.00  490.1±11.70µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs  1.00  525.4±5.20µs  ? ?/sec  1.09  570.6±8.15µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs  1.00  727.3±13.56µs  ? ?/sec  1.01  735.4±10.37µs  ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs  1.00  537.5±5.94µs  ? ?/sec  1.08  581.1±9.25µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs  1.12  279.9±5.84µs  ? ?/sec  1.00  250.9±4.27µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs  1.15  263.0±3.54µs  ? ?/sec  1.00  228.5±4.16µs  ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs  1.09  278.6±3.70µs  ? ?/sec  1.00  255.1±5.35µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs  1.02  297.7±1.89µs  ? ?/sec  1.00  292.8±5.43µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string  1.00  284.1±1.25µs  ? ?/sec  1.08  305.8±2.25µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs  1.11  290.1±3.34µs  ? ?/sec  1.00  262.3±5.63µs  ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs  1.02  306.7±3.30µs  ? ?/sec  1.00  301.0±3.57µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs  1.03  1116.1±8.02µs  ? ?/sec  1.00  1081.3±28.43µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.05  970.3±6.28µs  ? ?/sec  1.00  924.2±7.82µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs  1.03  1125.3±18.95µs  ? ?/sec  1.00  1090.3±13.50µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs  1.10  443.8±5.83µs  ? ?/sec  1.00  403.0±5.72µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs  1.06  632.5±4.54µs  ? ?/sec  1.00  595.8±9.30µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs  1.10  451.2±5.23µs  ? ?/sec  1.00  410.8±4.79µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs  1.00  160.1±0.77µs  ? ?/sec  1.27  202.9±3.12µs  ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs  1.00  286.9±4.98µs  ? ?/sec  1.11  317.6±2.82µs  ? ?/sec
arrow_array_reader/FIXED_L
```
Re: [PR] [Parquet] perf: Create `PrimitiveArray`s directly rather than via `ArrayData` [arrow-rs]
alamb commented on code in PR #9122:
URL: https://github.com/apache/arrow-rs/pull/9122#discussion_r2676635354
##
parquet/src/arrow/array_reader/primitive_array.rs:
##
@@ -212,41 +211,69 @@ where
             .consume_record_data()
             .into_buffer(target_type);
-        let array_data = ArrayDataBuilder::new(arrow_data_type)
-            .len(self.record_reader.num_values())
-            .add_buffer(record_data)
-            .null_bit_buffer(self.record_reader.consume_bitmap_buffer());
+        let len = self.record_reader.num_values();
+        let nulls = self
+            .record_reader
+            .consume_bitmap_buffer()
+            .map(|b| NullBuffer::new(BooleanBuffer::new(b, 0, len)));
-        let array_data = unsafe { array_data.build_unchecked() };
Review Comment:
The point is to avoid this `ArrayData` (and its `Vec`s), since we know here
which types of arrays will be built.
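The added lines in the diff wrap the bitmap buffer in a `NullBuffer` that covers `len` bits starting at offset 0. As a rough illustration of what such a buffer encodes, here is a std-only sketch of reading an Arrow-style validity bitmap, where bit `i` (least-significant bit first within each byte) is 1 when slot `i` is valid. The helper names `is_valid` and `null_count` are hypothetical and chosen for this sketch; they are not the arrow-rs implementation.

```rust
/// Returns true if slot `i` is valid (non-null) in an LSB-first packed bitmap.
/// Hypothetical helper for illustration only.
fn is_valid(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Counts null slots among the first `len` bits of the bitmap.
fn null_count(bitmap: &[u8], len: usize) -> usize {
    (0..len).filter(|&i| !is_valid(bitmap, i)).count()
}

fn main() {
    // 0b0000_0101: slots 0 and 2 are valid, slot 1 is null.
    // Only the first `len = 3` bits are meaningful; the rest is padding.
    let bitmap = [0b0000_0101u8];
    let len = 3;

    for i in 0..len {
        println!("slot {i}: valid = {}", is_valid(&bitmap, i));
    }
    println!("null count = {}", null_count(&bitmap, len));
}
```

Passing `len` explicitly matters because the byte buffer is padded: without it, the trailing zero bits in the last byte would be counted as nulls.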
