Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-17 Thread via GitHub


alamb merged PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-17 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3764073142

   Thank you for double checking @jhorstmann and for the reviews @jhorstmann 
and @scovich 





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-16 Thread via GitHub


jhorstmann commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3760650115

   I ran the benchmark for "arrow_array_reader/BinaryViewArray/dictionary 
encoded, optional, half NULLs" locally with `samply` and can confirm that the 
code in this PR is an improvement. The difference was small, 0.5% for 
`ViewBuffer::into_array` on `main` vs 0.2% on this branch, so it is likely that 
some benchmark runs will show random fluctuation in either direction.
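
   As a rough way to judge whether a reported difference may be fluctuation, one can check whether the two `mean ± spread` intervals from a benchmark row overlap. The sketch below is only a heuristic under that assumption, not a statistical test and not what the benchmark tooling itself does; the `Measurement` type and `within_noise` function are hypothetical names, and the numbers are taken from the BinaryViewArray rows posted in this thread.

   ```rust
   /// A benchmark timing reported as `mean ± spread`, in microseconds.
   /// (Hypothetical type, used only for this illustration.)
   #[derive(Debug, Clone, Copy)]
   struct Measurement {
       mean_us: f64,
       spread_us: f64,
   }

   impl Measurement {
       fn low(&self) -> f64 {
           self.mean_us - self.spread_us
       }
       fn high(&self) -> f64 {
           self.mean_us + self.spread_us
       }
   }

   /// Returns true when the two intervals overlap, i.e. the difference is
   /// small enough that it could plausibly be run-to-run noise.
   fn within_noise(a: Measurement, b: Measurement) -> bool {
       a.low() <= b.high() && b.low() <= a.high()
   }

   fn main() {
       // dictionary encoded, optional, no NULLs: 274.9±4.35µs vs 278.2±3.97µs
       let branch = Measurement { mean_us: 274.9, spread_us: 4.35 };
       let base = Measurement { mean_us: 278.2, spread_us: 3.97 };
       println!("overlap: {}", within_noise(branch, base));

       // plain encoded, optional, no NULLs: 360.0±3.41µs vs 382.9±1.37µs
       let branch = Measurement { mean_us: 360.0, spread_us: 3.41 };
       let base = Measurement { mean_us: 382.9, spread_us: 1.37 };
       println!("overlap: {}", within_noise(branch, base));
   }
   ```

   With these inputs the first pair overlaps and the second does not, which matches the intuition that a 1% difference is hard to separate from noise while a 6% one is easier.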





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-16 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3759412364

   The benchmarks now show a small but repeatable improvement for the array 
types. I am not sure how real it is, but I am now convinced this PR is an 
improvement.
   
   
   ```
   group                                                                                alamb_less_parquet_view_allocations  main
   -----                                                                                -----------------------------------  ----
   ...
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs           1.00  270.3±4.43µs    ? ?/sec        1.04  280.3±4.12µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs          1.04  240.2±2.87µs    ? ?/sec        1.00  231.5±3.62µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs            1.00  274.9±4.35µs    ? ?/sec        1.01  278.2±3.97µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                1.00  352.1±10.76µs   ? ?/sec        1.07  376.8±5.89µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string  1.00  324.5±3.75µs    ? ?/sec        1.08  350.0±3.34µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs               1.00  290.8±3.70µs    ? ?/sec        1.03  298.8±2.92µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                 1.00  360.0±3.41µs    ? ?/sec        1.06  382.9±1.37µs    ? ?/sec
   ...
   arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs           1.01  274.2±2.40µs    ? ?/sec        1.00  271.2±4.02µs    ? ?/sec
   arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs          1.05  240.4±0.94µs    ? ?/sec        1.00  230.1±1.69µs    ? ?/sec
   arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs            1.00  277.6±2.41µs    ? ?/sec        1.00  276.7±3.68µs    ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                1.01  506.5±6.99µs    ? ?/sec        1.00  503.3±5.39µs    ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs               1.04  364.8±9.48µs    ? ?/sec        1.00  351.6±3.12µs    ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                 1.01  519.1±36.53µs   ? ?/sec        1.00  512.2±11.52µs   ? ?/sec
   ...
   ```
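
   For reading the table above: the number in front of each timing appears to be that timing divided by the faster of the two timings in the row, so 1.00 marks the faster side. The Rust sketch below reproduces that arithmetic under this assumption; the helper name `ratios` is made up for illustration and is not part of the benchmark tooling.

   ```rust
   /// Returns (branch_ratio, main_ratio), each timing divided by the faster one,
   /// so the faster side is exactly 1.0. (Hypothetical helper for illustration.)
   fn ratios(branch_us: f64, main_us: f64) -> (f64, f64) {
       let fastest = branch_us.min(main_us);
       (branch_us / fastest, main_us / fastest)
   }

   fn main() {
       // BinaryViewArray, dictionary encoded, mandatory, no NULLs:
       // branch 270.3µs, main 280.3µs
       let (branch, main_ratio) = ratios(270.3, 280.3);
       println!("{:.2} {:.2}", branch, main_ratio);

       // BinaryViewArray, dictionary encoded, optional, half NULLs:
       // branch 240.2µs, main 231.5µs (here main is the faster side)
       let (branch, main_ratio) = ratios(240.2, 231.5);
       println!("{:.2} {:.2}", branch, main_ratio);
   }
   ```

   Rounded to two decimals, these inputs give 1.00 / 1.04 and 1.04 / 1.00, which agrees with the corresponding rows in the table.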





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3755743798

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                              alamb_less_parquet_view_allocations  main
   -----                              -----------------------------------  ----
   arrow_reader_clickbench/async/Q1   1.01  2.3±0.04ms      ? ?/sec        1.00  2.3±0.01ms      ? ?/sec
   arrow_reader_clickbench/async/Q10  1.00  12.7±0.33ms     ? ?/sec        1.02  13.0±0.28ms     ? ?/sec
   arrow_reader_clickbench/async/Q11  1.00  14.5±0.30ms     ? ?/sec        1.02  14.7±0.39ms     ? ?/sec
   arrow_reader_clickbench/async/Q12  1.00  24.9±0.29ms     ? ?/sec        1.02  25.4±0.36ms     ? ?/sec
   arrow_reader_clickbench/async/Q13  1.00  30.1±0.39ms     ? ?/sec        1.03  31.0±0.69ms     ? ?/sec
   arrow_reader_clickbench/async/Q14  1.00  27.4±0.55ms     ? ?/sec        1.03  28.2±0.48ms     ? ?/sec
   arrow_reader_clickbench/async/Q19  1.02  5.4±0.14ms      ? ?/sec        1.00  5.3±0.08ms      ? ?/sec
   arrow_reader_clickbench/async/Q20  1.06  117.8±4.46ms    ? ?/sec        1.00  111.4±0.82ms    ? ?/sec
   arrow_reader_clickbench/async/Q21  1.06  154.1±2.51ms    ? ?/sec        1.00  145.6±0.91ms    ? ?/sec
   arrow_reader_clickbench/async/Q22  1.06  305.5±4.16ms    ? ?/sec        1.00  288.5±15.13ms   ? ?/sec
   arrow_reader_clickbench/async/Q23  1.00  406.8±3.50ms    ? ?/sec        1.01  410.3±5.85ms    ? ?/sec
   arrow_reader_clickbench/async/Q24  1.00  35.0±0.42ms     ? ?/sec        1.02  35.6±0.98ms     ? ?/sec
   arrow_reader_clickbench/async/Q27  1.01  102.2±1.74ms    ? ?/sec        1.00  101.6±1.13ms    ? ?/sec
   arrow_reader_clickbench/async/Q28  1.00  100.2±0.95ms    ? ?/sec        1.01  101.2±2.06ms    ? ?/sec
   arrow_reader_clickbench/async/Q30  1.02  31.6±0.29ms     ? ?/sec        1.00  31.0±1.08ms     ? ?/sec
   arrow_reader_clickbench/async/Q36  1.00  107.9±1.46ms    ? ?/sec        1.01  109.2±1.74ms    ? ?/sec
   arrow_reader_clickbench/async/Q37  1.00  83.9±0.62ms     ? ?/sec        1.00  84.2±0.93ms     ? ?/sec
   arrow_reader_clickbench/async/Q38  1.00  32.6±0.35ms     ? ?/sec        1.00  32.8±0.95ms     ? ?/sec
   arrow_reader_clickbench/async/Q39  1.00  45.7±0.81ms     ? ?/sec        1.03  46.9±0.73ms     ? ?/sec
   arrow_reader_clickbench/async/Q40  1.00  27.2±0.47ms     ? ?/sec        1.05  28.5±1.39ms     ? ?/sec
   arrow_reader_clickbench/async/Q41  1.00  21.9±0.48ms     ? ?/sec        1.05  23.0±0.88ms     ? ?/sec
   arrow_reader_clickbench/async/Q42  1.00  10.7±0.10ms     ? ?/sec        1.01  10.7±0.44ms     ? ?/sec
   arrow_reader_clickbench/sync/Q1    1.00  2.0±0.03ms      ? ?/sec        1.01  2.1±0.04ms      ? ?/sec
   arrow_reader_clickbench/sync/Q10   1.00  9.8±0.08ms      ? ?/sec        1.01  9.9±0.20ms      ? ?/sec
   arrow_reader_clickbench/sync/Q11   1.00  11.4±0.13ms     ? ?/sec        1.00  11.5±0.16ms     ? ?/sec
   arrow_reader_clickbench/sync/Q12   1.04  37.4±2.22ms     ? ?/sec        1.00  36.0±1.99ms     ? ?/sec
   arrow_reader_clickbench/sync/Q13   1.02  45.9±1.26ms     ? ?/sec        1.00  45.2±2.19ms     ? ?/sec
   arrow_reader_clickbench/sync/Q14   1.03  42.5±0.63ms     ? ?/sec        1.00  41.3±0.64ms     ? ?/sec
   arrow_reader_clickbench/sync/Q19   1.01  4.2±0.05ms      ? ?/sec        1.00  4.2±0.07ms      ? ?/sec
   arrow_reader_clickbench/sync/Q20   1.03  182.5±4.89ms    ? ?/sec        1.00  176.5±1.64ms    ? ?/sec
   arrow_reader_clickbench/sync/Q21   1.04  242.8±5.93ms    ? ?/sec        1.00  234.6±1.95ms    ? ?/sec
   arrow_reader_clickbench/sync/Q22   1.00  479.6±4.36ms    ? ?/sec        1.04  497.4±13.81ms   ? ?/sec
   arrow_reader_clickbench/sync/Q23   1.03  446.3±17.22ms   ? ?/sec        1.00  432.6±17.26ms   ? ?/sec
   arrow_reader_clickbench/sync/Q24   1.00  42.2±0.48ms     ? ?/sec        1.13  47.6±2.02ms     ? ?/sec
   arrow_reader_clickbench/sync/Q27   1.00  152.9±0.90ms    ? ?/sec        1.04  159.0±4.02ms    ? ?/sec
   arrow_reader_clickbench/sync/Q28   1.00  149.5±1.31ms    ? ?/sec        1.01  150.8±3.71ms    ? ?/sec
   arrow_reader_clickbench/sync/Q30   1.00  30.5±0.46ms     ? ?/sec        1.00  30.6±0.70ms     ? ?/sec
   arrow_reader_clickbench/sync/Q36   1.00  153.4±1.15ms    ? ?/sec        1.00  152.9±1.67ms    ? ?/sec
   arrow_reader_clickbench/sync/Q37   1.01  89.2±2.06ms     ? ?/sec        1.00  88.3±1.38ms     ? ?/sec
   arrow_reader_clickbench/sync/Q38   1.03  29.9±0.38ms     ? ?/sec        1.00  28.9±0.64ms     ? ?/sec
   arrow_reader_c

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3755619078

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations  main
   -----                                                                                                    -----------------------------------  ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.06  1069.2±7.88µs   ? ?/sec        1.00  1006.0±4.01µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.00  1183.3±15.68µs  ? ?/sec        1.07  1268.5±6.88µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.06  1077.3±11.36µs  ? ?/sec        1.00  1013.6±10.88µs  ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.00  478.6±4.31µs    ? ?/sec        1.08  515.3±4.11µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.00  662.4±8.18µs    ? ?/sec        1.05  698.7±3.75µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.00  494.8±5.18µs    ? ?/sec        1.03  509.8±5.10µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.01  546.9±10.67µs   ? ?/sec        1.00  543.4±5.55µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.00  733.0±5.70µs    ? ?/sec        1.02  747.2±5.61µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.00  556.6±3.09µs    ? ?/sec        1.00  558.7±8.08µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00  272.7±4.63µs    ? ?/sec        1.00  272.0±5.67µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.00  238.5±5.52µs    ? ?/sec        1.00  237.8±3.59µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00  277.5±3.48µs    ? ?/sec        1.04  287.4±23.05µs   ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.00  348.1±3.33µs    ? ?/sec        1.08  377.5±10.42µs   ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.00  322.9±1.57µs    ? ?/sec        1.08  348.2±8.59µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.00  289.8±1.56µs    ? ?/sec        1.00  289.4±2.06µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.00  357.6±4.66µs    ? ?/sec        1.07  382.9±6.66µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.08  1009.7±13.57µs  ? ?/sec        1.00  935.3±50.00µs   ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.10  870.3±4.53µs    ? ?/sec        1.00  791.8±10.07µs   ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.09  1022.8±28.48µs  ? ?/sec        1.00  935.7±7.80µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.08  326.4±4.36µs    ? ?/sec        1.00  301.2±5.42µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.14  519.2±2.91µs    ? ?/sec        1.00  457.3±6.90µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.08  335.0±10.95µs   ? ?/sec        1.00  310.6±4.20µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.00  161.0±1.39µs    ? ?/sec        1.24  198.9±17.35µs   ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.00  278.0±1.47µs    ? ?/sec        1.22  338.5±5.97µs    ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3755619336

   🤖 `./gh_compare_arrow.sh` 
[gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations 
(9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f) to 
d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc 
[diff](https://github.com/apache/arrow-rs/compare/d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc..9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f)
   BENCH_NAME=arrow_reader_clickbench
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental 
--bench arrow_reader_clickbench 
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3755255410

   🤖 `./gh_compare_arrow.sh` 
[gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations 
(9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f) to 
d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc 
[diff](https://github.com/apache/arrow-rs/compare/d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc..9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental 
--bench arrow_reader 
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3755255175

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations  main
   -----                                                                                                    -----------------------------------  ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.06  1070.2±6.02µs   ? ?/sec        1.00  1005.8±4.72µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.00  1180.5±3.78µs   ? ?/sec        1.07  1264.1±13.94µs  ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.06  1077.0±4.94µs   ? ?/sec        1.00  1017.3±7.77µs   ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.00  485.0±11.00µs   ? ?/sec        1.04  502.7±5.71µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.00  666.4±8.20µs    ? ?/sec        1.05  697.3±4.76µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.00  498.1±4.19µs    ? ?/sec        1.04  515.8±4.64µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.01  552.4±13.37µs   ? ?/sec        1.00  545.3±7.70µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.00  729.9±11.78µs   ? ?/sec        1.03  751.8±6.84µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.00  558.0±6.34µs    ? ?/sec        1.00  557.6±4.20µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00  270.3±4.43µs    ? ?/sec        1.04  280.3±4.12µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.04  240.2±2.87µs    ? ?/sec        1.00  231.5±3.62µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00  274.9±4.35µs    ? ?/sec        1.01  278.2±3.97µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.00  352.1±10.76µs   ? ?/sec        1.07  376.8±5.89µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.00  324.5±3.75µs    ? ?/sec        1.08  350.0±3.34µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.00  290.8±3.70µs    ? ?/sec        1.03  298.8±2.92µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.00  360.0±3.41µs    ? ?/sec        1.06  382.9±1.37µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.09  1013.0±5.81µs   ? ?/sec        1.00  929.1±2.90µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.14  875.9±7.31µs    ? ?/sec        1.00  768.5±10.64µs   ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.09  1024.6±18.13µs  ? ?/sec        1.00  938.9±10.14µs   ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.09  339.9±2.81µs    ? ?/sec        1.00  312.4±4.76µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.14  524.4±2.56µs    ? ?/sec        1.00  459.8±5.66µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.10  345.9±3.94µs    ? ?/sec        1.00  314.6±7.73µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.00  160.6±0.97µs    ? ?/sec        1.21  195.0±1.61µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.00  276.8±3.77µs    ? ?/sec        1.22  336.5±2.26µs    ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3754893317

   🤖 `./gh_compare_arrow.sh` 
[gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations 
(9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f) to 
d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc 
[diff](https://github.com/apache/arrow-rs/compare/d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc..9e4bbb81f4ee35bd2d0d4ba4070b8ac3e8a9c33f)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental 
--bench arrow_reader 
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3754893013

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations  main
   -----                                                                                                    -----------------------------------  ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.00  1007.2±6.04µs   ? ?/sec        1.00  1007.4±5.46µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.00  1241.8±5.22µs   ? ?/sec        1.02  1261.2±11.51µs  ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.00  1014.6±14.09µs  ? ?/sec        1.00  1017.6±19.81µs  ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.00  502.1±29.06µs   ? ?/sec        1.01  509.2±3.22µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.00  661.4±17.65µs   ? ?/sec        1.06  699.8±6.51µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.00  488.9±8.17µs    ? ?/sec        1.06  518.9±3.66µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.00  549.0±3.28µs    ? ?/sec        1.01  553.7±6.72µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.00  723.1±6.29µs    ? ?/sec        1.04  749.7±20.46µs   ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.01  563.1±28.07µs   ? ?/sec        1.00  558.0±4.11µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.01  280.1±2.11µs    ? ?/sec        1.00  277.4±3.74µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.10  262.1±5.11µs    ? ?/sec        1.00  239.2±3.27µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00  276.9±2.46µs    ? ?/sec        1.03  286.4±26.14µs   ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.00  350.1±1.87µs    ? ?/sec        1.07  374.6±6.98µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.00  344.7±4.78µs    ? ?/sec        1.01  348.5±2.09µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.05  313.8±3.08µs    ? ?/sec        1.00  299.5±3.04µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.00  358.1±3.96µs    ? ?/sec        1.06  380.8±2.87µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.06  972.6±13.08µs   ? ?/sec        1.00  917.6±8.06µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.09  824.5±3.74µs    ? ?/sec        1.00  759.0±8.68µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.06  978.1±7.28µs    ? ?/sec        1.00  926.8±9.95µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.00  295.1±2.98µs    ? ?/sec        1.02  300.0±8.43µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.07  487.6±3.97µs    ? ?/sec        1.00  455.7±2.79µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.00  304.2±8.06µs    ? ?/sec        1.01  307.2±7.12µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.04  202.6±1.49µs    ? ?/sec        1.00  195.4±3.97µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.00  319.4±1.42µs    ? ?/sec        1.06  337.6±1.52µs    ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb commented on code in PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#discussion_r2694185842


##
parquet/src/arrow/buffer/view_buffer.rs:
##
@@ -70,26 +70,18 @@ impl ViewBuffer {
     /// Converts this into an [`ArrayRef`] with the provided `data_type` and `null_buffer`
     pub fn into_array(self, null_buffer: Option<Buffer>, data_type: &ArrowType) -> ArrayRef {
         let len = self.views.len();
-        let views = Buffer::from_vec(self.views);
+        let views = ScalarBuffer::from(self.views);
+        let nulls = null_buffer
+            .map(|b| NullBuffer::new(BooleanBuffer::new(b, 0, len)))
+            .filter(|n| n.null_count() != 0);

Review Comment:
   Dropping the null buffer when it contains no nulls replicates the behavior 
of `ArrayDataBuilder` in 
   
https://github.com/apache/arrow-rs/blob/237065b2bbf9b0f249321828acd91f07387669a1/arrow-data/src/data.rs#L2120-L2119
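
   The `Option::filter` pattern in the hunk above can be illustrated without any arrow-rs types. The sketch below uses a hypothetical `Validity` stand-in (one `bool` per row) in place of `NullBuffer`; it only models the idea that a validity bitmap with zero nulls is normalized to `None`, and makes no claim about how arrow-rs stores or counts nulls internally.

   ```rust
   /// Hypothetical stand-in for a validity bitmap: one bool per row, `true` = valid.
   /// This is NOT the arrow-rs `NullBuffer`; it exists only for this illustration.
   #[derive(Debug, PartialEq)]
   struct Validity(Vec<bool>);

   impl Validity {
       /// Number of rows marked as null (`false`).
       fn null_count(&self) -> usize {
           self.0.iter().filter(|valid| !**valid).count()
       }
   }

   /// Keeps the bitmap only if at least one row is actually null, mirroring
   /// the `.filter(|n| n.null_count() != 0)` step in the diff.
   fn normalize_nulls(validity: Option<Validity>) -> Option<Validity> {
       validity.filter(|v| v.null_count() != 0)
   }

   fn main() {
       let all_valid = Some(Validity(vec![true, true, true]));
       let some_null = Some(Validity(vec![true, false, true]));

       // A bitmap without nulls carries no information, so it is dropped.
       assert_eq!(normalize_nulls(all_valid), None);
       // A bitmap with at least one null is kept unchanged.
       assert_eq!(
           normalize_nulls(some_null),
           Some(Validity(vec![true, false, true]))
       );
       // No bitmap in, no bitmap out.
       assert_eq!(normalize_nulls(None), None);
   }
   ```

   Normalizing to `None` means downstream code can treat "no null buffer" and "null buffer with zero nulls" identically.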






Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-15 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3754467526

   🤖 `./gh_compare_arrow.sh` 
[gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations 
(8b46523f9058a0df41528a91f1533bfd36eefc08) to 
d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc 
[diff](https://github.com/apache/arrow-rs/compare/d81d6c37ae9ef0f4971657371c0ebf1cf46a67bc..8b46523f9058a0df41528a91f1533bfd36eefc08)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental 
--bench arrow_reader 
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744844420

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations  main
   -----                                                                                                    -----------------------------------  ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.04  1227.2±12.79µs  ? ?/sec        1.00  1184.1±9.85µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.04  1301.3±12.15µs  ? ?/sec        1.00  1249.9±9.17µs   ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.03  1232.5±23.52µs  ? ?/sec        1.00  1195.5±31.54µs  ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.04  505.4±3.99µs    ? ?/sec        1.00  485.6±4.28µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.05  697.4±5.61µs    ? ?/sec        1.00  665.2±4.28µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.04  510.2±4.35µs    ? ?/sec        1.00  492.5±4.48µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.01  555.2±5.47µs    ? ?/sec        1.00  551.6±9.70µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.05  758.8±8.99µs    ? ?/sec        1.00  723.7±11.91µs   ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.01  568.2±5.90µs    ? ?/sec        1.00  561.7±4.92µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00  252.4±4.30µs    ? ?/sec        1.04  263.0±3.68µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.06  253.9±20.21µs   ? ?/sec        1.00  240.6±1.12µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00  254.6±3.17µs    ? ?/sec        1.05  267.9±7.36µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.05  347.7±3.72µs    ? ?/sec        1.00  332.3±3.63µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.02  344.0±3.17µs    ? ?/sec        1.00  337.6±1.81µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.11  314.7±8.05µs    ? ?/sec        1.00  283.1±2.39µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.05  356.0±2.65µs    ? ?/sec        1.00  340.6±2.04µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.00  1078.2±4.42µs   ? ?/sec        1.00  1081.3±11.36µs  ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.00  924.0±33.43µs   ? ?/sec        1.02  943.1±6.44µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.00  1086.1±6.79µs   ? ?/sec        1.00  1091.1±22.78µs  ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.13  457.5±11.52µs   ? ?/sec        1.00  404.9±7.80µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.01  605.6±7.77µs    ? ?/sec        1.00  597.1±9.31µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.12  460.5±4.82µs    ? ?/sec        1.00  411.2±3.99µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.21  194.7±1.88µs    ? ?/sec        1.00  160.9±1.73µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.17  336.4±3.83µs    ? ?/sec        1.00  288.0±5.62µs    ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744433691

   🤖 `./gh_compare_arrow.sh` 
[gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations 
(5949120cb5c2cc8be4f56449228e8ac3c7a58101) to 
7ecef6e5f01910e2c0f6bbd8c4591115fabca8e8 
[diff](https://github.com/apache/arrow-rs/compare/7ecef6e5f01910e2c0f6bbd8c4591115fabca8e8..5949120cb5c2cc8be4f56449228e8ac3c7a58101)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental 
--bench arrow_reader 
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744433419

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations  main
   -----                                                                                                    -----------------------------------  ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.04  1228.3±25.70µs  ? ?/sec        1.00  1181.5±11.87µs  ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.04  1298.0±9.22µs   ? ?/sec        1.00  1249.3±15.50µs  ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.04  1232.2±3.77µs   ? ?/sec        1.00  1186.6±15.05µs  ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.03  505.2±5.86µs    ? ?/sec        1.00  489.7±13.30µs   ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.05  700.6±12.12µs   ? ?/sec        1.00  667.1±7.76µs    ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.05  521.1±4.56µs    ? ?/sec        1.00  494.6±11.40µs   ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.01  556.6±15.96µs   ? ?/sec        1.00  551.6±7.93µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.03  750.9±7.73µs    ? ?/sec        1.00  725.6±5.00µs    ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.00  564.9±6.68µs    ? ?/sec        1.01  569.2±10.25µs   ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00  258.7±2.57µs    ? ?/sec        1.01  260.1±6.11µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.09  252.2±2.08µs    ? ?/sec        1.00  231.3±4.46µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00  256.2±4.14µs    ? ?/sec        1.01  258.0±6.07µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.06  353.1±8.07µs    ? ?/sec        1.00  331.6±1.48µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.02  346.6±4.96µs    ? ?/sec        1.00  338.3±5.34µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.13  316.3±2.25µs    ? ?/sec        1.00  279.9±4.27µs    ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.06  360.0±2.42µs    ? ?/sec        1.00  340.7±1.94µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.00  1082.9±19.46µs  ? ?/sec        1.00  1085.8±27.48µs  ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.00  905.9±7.25µs    ? ?/sec        1.02  926.3±4.57µs    ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.00  1089.4±20.48µs  ? ?/sec        1.00  1092.6±21.44µs  ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, 
mandatory, no NULLs 1.11   459.2±12.04µs? ?/sec1.00 
   412.2±6.12µs? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, 
optional, half NULLs1.00596.1±8.09µs? ?/sec1.02 
   605.2±7.95µs? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, 
optional, no NULLs  1.10464.9±8.13µs? ?/sec1.00 
   421.3±7.22µs? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split 
encoded, mandatory, no NULLs1.22195.2±1.31µs? ?/sec1.00 
   160.3±1.21µs? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split 
encoded, optional, half NULLs   1.17337.5±1.64µs? ?/sec1.00 
   287.9±3.27µs? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744077821

   🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations (5949120cb5c2cc8be4f56449228e8ac3c7a58101) to 7ecef6e5f01910e2c0f6bbd8c4591115fabca8e8 [diff](https://github.com/apache/arrow-rs/compare/7ecef6e5f01910e2c0f6bbd8c4591115fabca8e8..5949120cb5c2cc8be4f56449228e8ac3c7a58101)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744076718

   
   run benchmark arrow_reader
   
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744076359

   
   run benchmark arrow_reader
   
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744047879

   I could not reproduce the benchmark results locally
   
   ```
   Running benches/arrow_reader.rs (target/release/deps/arrow_reader-0c5ec49ee5cbea6a)
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string
                           time:   [160.36 µs 160.64 µs 161.02 µs]
                           change: [−0.4158% −0.0942% +0.2111%] (p = 0.56 > 0.05)
                           No change in performance detected.
   Found 11 outliers among 100 measurements (11.00%)
     3 (3.00%) high mild
     8 (8.00%) high severe
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs
                           time:   [147.70 µs 148.37 µs 149.07 µs]
                           change: [+0.7348% +1.2863% +1.8236%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   Found 25 outliers among 100 measurements (25.00%)
     19 (19.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs
                           time:   [147.02 µs 147.41 µs 147.88 µs]
                           change: [−1.3623% −0.8303% −0.2939%] (p = 0.00 < 0.05)
                           Change within noise threshold.
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs
                           time:   [142.34 µs 142.93 µs 143.58 µs]
                           change: [−1.6259% −0.8845% −0.2890%] (p = 0.01 < 0.05)
                           Change within noise threshold.
   Found 24 outliers among 100 measurements (24.00%)
     9 (9.00%) low severe
     2 (2.00%) low mild
     6 (6.00%) high mild
     7 (7.00%) high severe
   ```
   
   However, note the number of outliers found.
   
   I will see whether the benchmark needs adjusting (perhaps with more rows, etc.).
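
For context on the outlier counts: Criterion's documentation describes labelling samples with a variant of Tukey's fences, where points more than 1.5×IQR outside the quartiles count as "mild" and points more than 3×IQR outside count as "severe". The sketch below illustrates that rule only; it assumes simple linear interpolation for the quartiles (Criterion's own percentile computation may differ), and `Label`, `percentile`, and `classify` are illustrative names rather than Criterion APIs.

```rust
/// Outlier categories, named after the labels in the report above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Label {
    LowSevere,
    LowMild,
    Normal,
    HighMild,
    HighSevere,
}

/// Percentile of a non-empty, sorted slice using linear interpolation
/// between neighbouring ranks (an assumption made for this sketch).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Label every sample using Tukey-style fences at 1.5 and 3 times the IQR.
fn classify(samples: &[f64]) -> Vec<Label> {
    if samples.is_empty() {
        return Vec::new();
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;
    samples
        .iter()
        .map(|&x| {
            if x < q1 - 3.0 * iqr {
                Label::LowSevere
            } else if x < q1 - 1.5 * iqr {
                Label::LowMild
            } else if x > q3 + 3.0 * iqr {
                Label::HighSevere
            } else if x > q3 + 1.5 * iqr {
                Label::HighMild
            } else {
                Label::Normal
            }
        })
        .collect()
}

fn main() {
    // Made-up timings in microseconds, not taken from the run above.
    let samples = [134.0, 142.0, 143.0, 144.0, 145.0, 146.0, 147.0, 154.0, 160.0];
    let labels = classify(&samples);
    let outliers = labels.iter().filter(|l| **l != Label::Normal).count();
    println!("Found {} outliers among {} measurements", outliers, samples.len());
}
```

A high share of outliers widens the fences' relevance: when a quarter of the samples fall outside them, small reported changes are hard to separate from measurement noise.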





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-13 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3744017662

   🤔 It looks like this may have made things slower; I will investigate.
   
   ```
   group                                                                                alamb_less_parquet_view_allocations    main
   -----                                                                                -----------------------------------    ----
   
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs           1.00    247.2±3.07µs        ? ?/sec    1.05    259.0±3.79µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs          1.08    248.4±3.81µs        ? ?/sec    1.00    229.5±2.43µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs            1.00    256.8±5.74µs        ? ?/sec    1.00    256.0±3.58µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                1.27    370.5±3.94µs        ? ?/sec    1.00    292.5±1.63µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string  1.13    345.4±3.14µs        ? ?/sec    1.00    305.8±1.22µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs               1.20    319.2±7.52µs        ? ?/sec    1.00    265.4±1.64µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                 1.25    378.6±4.48µs        ? ?/sec    1.00    301.7±5.25µs        ? ?/sec
   
   ...
   
   arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs           1.00    252.8±3.56µs        ? ?/sec    1.03    261.6±5.36µs        ? ?/sec
   arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs          1.09    251.1±2.55µs        ? ?/sec    1.00    229.8±1.92µs        ? ?/sec
   arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs            1.00    256.7±3.02µs        ? ?/sec    1.00    256.1±2.58µs        ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                1.13    500.0±7.89µs        ? ?/sec    1.00    444.1±2.78µs        ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs               1.15   386.8±22.22µs        ? ?/sec    1.00    337.2±3.66µs        ? ?/sec
   arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                 1.12    508.2±8.79µs        ? ?/sec    1.00   453.9±10.55µs        ? ?/sec
   ```
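
For reference, the ratio columns in the table above appear to be each side's mean time divided by the fastest mean in that row, so `1.00` marks the faster side and `1.27` means roughly 27% slower. A minimal sketch of that arithmetic, assuming this reading of the output is right (the `ratios` helper is hypothetical and not part of any benchmark tool):

```rust
/// Relative slowdown of each measurement versus the fastest one in the row,
/// rounded to two decimals the way the table prints it.
fn ratios(times_us: &[f64]) -> Vec<f64> {
    // Fastest (smallest) time in the row; infinity if the row is empty.
    let best = times_us.iter().cloned().fold(f64::INFINITY, f64::min);
    times_us
        .iter()
        .map(|t| (t / best * 100.0).round() / 100.0)
        .collect()
}

fn main() {
    // "BinaryViewArray/plain encoded, mandatory, no NULLs": branch vs main,
    // mean times in microseconds copied from the table above.
    let r = ratios(&[370.5, 292.5]);
    println!("branch {:.2}, main {:.2}", r[0], r[1]);
}
```

The same calculation reproduces the other rows, for example 247.2 µs versus 259.0 µs gives `1.00` and `1.05`.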





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-10 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732899308

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations    main
   -----                                                                                                    -----------------------------------    ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.00  1242.4±12.77µs        ? ?/sec    1.02   1269.3±8.70µs        ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.00  1298.8±12.21µs        ? ?/sec    1.00  1294.5±18.86µs        ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.00  1250.1±20.41µs        ? ?/sec    1.03  1281.6±12.16µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.06   498.6±14.62µs        ? ?/sec    1.00    471.9±5.91µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.07   696.3±10.19µs        ? ?/sec    1.00    650.2±8.79µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.08    516.2±6.73µs        ? ?/sec    1.00    477.9±6.91µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.00   550.0±11.38µs        ? ?/sec    1.05    575.9±8.06µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.01    742.8±6.35µs        ? ?/sec    1.00    738.4±7.37µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.00    556.2±8.77µs        ? ?/sec    1.05    585.9±7.26µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00    249.7±4.60µs        ? ?/sec    1.04    258.8±6.82µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.06   251.3±19.79µs        ? ?/sec    1.00    237.8±1.27µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00    254.4±6.48µs        ? ?/sec    1.03    261.2±3.26µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.27    371.2±7.44µs        ? ?/sec    1.00    292.7±5.69µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.13    346.1±5.71µs        ? ?/sec    1.00    305.7±4.26µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.21    320.4±4.82µs        ? ?/sec    1.00    265.5±3.50µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.26    378.0±4.65µs        ? ?/sec    1.00    299.4±5.31µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.00   1053.6±5.23µs        ? ?/sec    1.02   1070.1±9.63µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.00   898.0±28.22µs        ? ?/sec    1.05    945.3±9.24µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.00  1063.8±16.97µs        ? ?/sec    1.02  1086.1±55.42µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.08    431.0±3.10µs        ? ?/sec    1.00    399.5±5.06µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.00   581.7±13.59µs        ? ?/sec    1.02    592.0±5.64µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.07    443.3±3.49µs        ? ?/sec    1.00    413.0±6.87µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.00    194.3±1.44µs        ? ?/sec    1.05    203.0±4.61µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.06    336.8±1.96µs        ? ?/sec    1.00    317.2±5.77µs        ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-10 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732745297

   🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations (2ed62c2f7737210912a95c584a1b459387d3be31) to 96637fc8b928a94de53bbec3501337c0ecfbf936 [diff](https://github.com/apache/arrow-rs/compare/96637fc8b928a94de53bbec3501337c0ecfbf936..2ed62c2f7737210912a95c584a1b459387d3be31)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-10 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732535494

   run benchmark arrow_reader





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-10 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3732535751

   This appears to show reading StringView getting slower. I will try to reproduce it.





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3730447473

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                              alamb_less_parquet_view_allocations    main
   -----                              -----------------------------------    ----
   arrow_reader_clickbench/async/Q1   1.01      2.4±0.04ms        ? ?/sec    1.00      2.3±0.04ms        ? ?/sec
   arrow_reader_clickbench/async/Q10  1.04     13.5±0.49ms        ? ?/sec    1.00     12.9±0.40ms        ? ?/sec
   arrow_reader_clickbench/async/Q11  1.02     15.3±0.63ms        ? ?/sec    1.00     15.0±0.47ms        ? ?/sec
   arrow_reader_clickbench/async/Q12  1.02     26.5±0.64ms        ? ?/sec    1.00     25.9±1.00ms        ? ?/sec
   arrow_reader_clickbench/async/Q13  1.01     31.6±0.63ms        ? ?/sec    1.00     31.3±0.79ms        ? ?/sec
   arrow_reader_clickbench/async/Q14  1.02     29.3±0.75ms        ? ?/sec    1.00     28.8±1.00ms        ? ?/sec
   arrow_reader_clickbench/async/Q19  1.00      5.3±0.10ms        ? ?/sec    1.00      5.3±0.14ms        ? ?/sec
   arrow_reader_clickbench/async/Q20  1.00    114.0±0.97ms        ? ?/sec    1.08    123.3±1.00ms        ? ?/sec
   arrow_reader_clickbench/async/Q21  1.00    131.9±1.22ms        ? ?/sec    1.19    157.2±2.51ms        ? ?/sec
   arrow_reader_clickbench/async/Q22  1.00    268.8±9.01ms        ? ?/sec    1.17    313.9±6.71ms        ? ?/sec
   arrow_reader_clickbench/async/Q23  1.00    404.0±4.15ms        ? ?/sec    1.01    409.4±2.85ms        ? ?/sec
   arrow_reader_clickbench/async/Q24  1.00     34.6±1.09ms        ? ?/sec    1.01     34.8±0.64ms        ? ?/sec
   arrow_reader_clickbench/async/Q27  1.00     98.9±0.96ms        ? ?/sec    1.03    101.5±0.87ms        ? ?/sec
   arrow_reader_clickbench/async/Q28  1.00     97.8±1.15ms        ? ?/sec    1.02     99.8±0.94ms        ? ?/sec
   arrow_reader_clickbench/async/Q30  1.00     30.9±0.66ms        ? ?/sec    1.00     31.0±0.77ms        ? ?/sec
   arrow_reader_clickbench/async/Q36  1.00    107.9±0.82ms        ? ?/sec    1.02    110.1±0.80ms        ? ?/sec
   arrow_reader_clickbench/async/Q37  1.00     84.7±0.61ms        ? ?/sec    1.01     85.9±0.69ms        ? ?/sec
   arrow_reader_clickbench/async/Q38  1.00     32.7±0.52ms        ? ?/sec    1.03     33.6±0.60ms        ? ?/sec
   arrow_reader_clickbench/async/Q39  1.00     46.0±0.66ms        ? ?/sec    1.02     47.0±1.46ms        ? ?/sec
   arrow_reader_clickbench/async/Q40  1.01     27.8±0.57ms        ? ?/sec    1.00     27.5±0.79ms        ? ?/sec
   arrow_reader_clickbench/async/Q41  1.03     22.7±0.48ms        ? ?/sec    1.00     22.0±0.61ms        ? ?/sec
   arrow_reader_clickbench/async/Q42  1.01     11.2±0.24ms        ? ?/sec    1.00     11.1±0.33ms        ? ?/sec
   arrow_reader_clickbench/sync/Q1    1.00      2.1±0.04ms        ? ?/sec    1.00      2.1±0.08ms        ? ?/sec
   arrow_reader_clickbench/sync/Q10   1.03     10.2±0.07ms        ? ?/sec    1.00      9.9±0.12ms        ? ?/sec
   arrow_reader_clickbench/sync/Q11   1.04     12.0±0.33ms        ? ?/sec    1.00     11.5±0.13ms        ? ?/sec
   arrow_reader_clickbench/sync/Q12   1.01     34.4±1.86ms        ? ?/sec    1.00     34.2±0.72ms        ? ?/sec
   arrow_reader_clickbench/sync/Q13   1.00     38.5±0.68ms        ? ?/sec    1.25     48.0±1.08ms        ? ?/sec
   arrow_reader_clickbench/sync/Q14   1.00     36.3±0.65ms        ? ?/sec    1.26     45.8±1.25ms        ? ?/sec
   arrow_reader_clickbench/sync/Q19   1.00      4.3±0.09ms        ? ?/sec    1.00      4.3±0.10ms        ? ?/sec
   arrow_reader_clickbench/sync/Q20   1.00    174.0±1.17ms        ? ?/sec    1.02    177.6±1.17ms        ? ?/sec
   arrow_reader_clickbench/sync/Q21   1.00    231.0±1.72ms        ? ?/sec    1.02    235.8±2.93ms        ? ?/sec
   arrow_reader_clickbench/sync/Q22   1.00    470.9±3.30ms        ? ?/sec    1.02    482.3±4.82ms        ? ?/sec
   arrow_reader_clickbench/sync/Q23   1.01   439.1±16.98ms        ? ?/sec    1.00   435.0±14.24ms        ? ?/sec
   arrow_reader_clickbench/sync/Q24   1.00     43.9±0.92ms        ? ?/sec    1.04     45.8±0.74ms        ? ?/sec
   arrow_reader_clickbench/sync/Q27   1.00    150.8±1.43ms        ? ?/sec    1.03    155.3±2.00ms        ? ?/sec
   arrow_reader_clickbench/sync/Q28   1.00    146.7±1.24ms        ? ?/sec    1.02    149.1±1.10ms        ? ?/sec
   arrow_reader_clickbench/sync/Q30   1.00     31.0±0.70ms        ? ?/sec    1.00     31.0±0.89ms        ? ?/sec
   arrow_reader_clickbench/sync/Q36   1.00    151.8±1.95ms        ? ?/sec    1.02    155.2±2.27ms        ? ?/sec
   arrow_reader_clickbench/sync/Q37   1.00     89.1±2.01ms        ? ?/sec    1.01     90.2±1.14ms        ? ?/sec
   arrow_reader_clickbench/sync/Q38   1.00     29.1±0.52ms        ? ?/sec    1.01     29.4±0.88ms        ? ?/sec
   arrow_reader_c

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3730379031

   🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations (2ed62c2f7737210912a95c584a1b459387d3be31) to 96637fc8b928a94de53bbec3501337c0ecfbf936 [diff](https://github.com/apache/arrow-rs/compare/96637fc8b928a94de53bbec3501337c0ecfbf936..2ed62c2f7737210912a95c584a1b459387d3be31)
   BENCH_NAME=arrow_reader_clickbench
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3730378837

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                                                                                                    alamb_less_parquet_view_allocations    main
   -----                                                                                                    -----------------------------------    ----
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                         1.00  1240.5±15.33µs        ? ?/sec    1.02  1270.1±10.95µs        ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                        1.00  1296.1±10.11µs        ? ?/sec    1.00   1293.4±9.27µs        ? ?/sec
   arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                          1.00  1247.4±11.35µs        ? ?/sec    1.03  1283.2±10.19µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                   1.02    496.6±5.61µs        ? ?/sec    1.00    486.1±2.98µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                  1.05    695.8±5.40µs        ? ?/sec    1.00    661.6±5.40µs        ? ?/sec
   arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                    1.07   523.0±21.80µs        ? ?/sec    1.00    489.5±6.67µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                        1.00    546.0±5.17µs        ? ?/sec    1.05    572.3±4.59µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                       1.01    741.2±3.93µs        ? ?/sec    1.00    730.8±3.48µs        ? ?/sec
   arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                         1.00    558.0±4.10µs        ? ?/sec    1.04    581.8±2.36µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                               1.00    247.2±3.07µs        ? ?/sec    1.05    259.0±3.79µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                              1.08    248.4±3.81µs        ? ?/sec    1.00    229.5±2.43µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                1.00    256.8±5.74µs        ? ?/sec    1.00    256.0±3.58µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                    1.27    370.5±3.94µs        ? ?/sec    1.00    292.5±1.63µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                      1.13    345.4±3.14µs        ? ?/sec    1.00    305.8±1.22µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                   1.20    319.2±7.52µs        ? ?/sec    1.00    265.4±1.64µs        ? ?/sec
   arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                     1.25    378.6±4.48µs        ? ?/sec    1.00    301.7±5.25µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs   1.00   1055.0±8.26µs        ? ?/sec    1.02   1077.6±9.02µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs  1.00    891.8±5.71µs        ? ?/sec    1.03    922.6±5.53µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs    1.00  1063.1±18.15µs        ? ?/sec    1.02  1089.2±22.06µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs               1.07    433.4±4.65µs        ? ?/sec    1.00    405.1±3.52µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs              1.00    582.2±8.22µs        ? ?/sec    1.03   599.1±10.85µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                1.06    444.0±4.48µs        ? ?/sec    1.00    418.4±9.49µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs      1.00    194.4±0.76µs        ? ?/sec    1.04    202.9±4.88µs        ? ?/sec
   arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs     1.05    335.5±0.94µs        ? ?/sec    1.00    318.4±2.91µs        ? ?/sec
   arrow_array_reader/FIXED_L

Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb-ghbot commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3730131673

   🤖 `./gh_compare_arrow.sh` [gh_compare_arrow.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/gh_compare_arrow.sh) Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_parquet_view_allocations (2ed62c2f7737210912a95c584a1b459387d3be31) to 96637fc8b928a94de53bbec3501337c0ecfbf936 [diff](https://github.com/apache/arrow-rs/compare/96637fc8b928a94de53bbec3501337c0ecfbf936..2ed62c2f7737210912a95c584a1b459387d3be31)
   BENCH_NAME=arrow_reader
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_parquet_view_allocations
   Results will be posted here when complete
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb commented on PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#issuecomment-3729351883

   
   run benchmark arrow_reader arrow_reader_clickbench
   
   





Re: [PR] [Parquet] perf: Create Utf8/BinaryViewArray directly rather than via `ArrayData` [arrow-rs]

2026-01-09 Thread via GitHub


alamb commented on code in PR #9121:
URL: https://github.com/apache/arrow-rs/pull/9121#discussion_r2676581345


##
parquet/src/arrow/buffer/view_buffer.rs:
##
@@ -70,26 +70,16 @@ impl ViewBuffer {
     /// Converts this into an [`ArrayRef`] with the provided `data_type` and `null_buffer`
     pub fn into_array(self, null_buffer: Option, data_type: &ArrowType) -> ArrayRef {
         let len = self.views.len();
-        let views = Buffer::from_vec(self.views);
+        let views = ScalarBuffer::from(self.views);

Review Comment:
   The new formulation is simpler too, which is a nice side effect
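
As background on what the `views` vector holds: in the Arrow columnar format each view is a 16-byte value, which arrow-rs represents as a `u128`, so (assuming `self.views` is a `Vec<u128>`) the vector can be wrapped as a typed buffer without touching individual elements. The sketch below uses only the standard library to show how a short value of at most 12 bytes is packed inline into such a view, with the length in the low 4 bytes and the data in the remaining 12. `make_inline_view` and `inline_view_bytes` are hypothetical helper names for illustration, not arrow-rs APIs, and longer values (which store a prefix, buffer index, and offset instead) are deliberately not handled.

```rust
/// Pack a value of at most 12 bytes into a 16-byte inline view.
/// Returns `None` for longer values, which need the out-of-line layout.
fn make_inline_view(data: &[u8]) -> Option<u128> {
    if data.len() > 12 {
        return None;
    }
    let mut bytes = [0u8; 16];
    // Bytes 0..4: length as a little-endian 32-bit integer.
    bytes[..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    // Bytes 4..16: the value itself, zero padded.
    bytes[4..4 + data.len()].copy_from_slice(data);
    Some(u128::from_le_bytes(bytes))
}

/// Recover the inline bytes from a view, or `None` if the view is not inline.
fn inline_view_bytes(view: u128) -> Option<Vec<u8>> {
    let bytes = view.to_le_bytes();
    // The low 32 bits of the little-endian u128 hold the length.
    let len = (view as u32) as usize;
    if len > 12 {
        return None;
    }
    Some(bytes[4..4 + len].to_vec())
}

fn main() {
    // A plain Vec<u128> of views, built from two short example strings.
    let views: Vec<u128> = ["hello", "parquet"]
        .iter()
        .filter_map(|s| make_inline_view(s.as_bytes()))
        .collect();
    for v in &views {
        let bytes = inline_view_bytes(*v).expect("inline view");
        println!("{}", String::from_utf8_lossy(&bytes));
    }
}
```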


