Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]

2025-07-09 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3052960622

   After merging your PR, Conbench analyzed the 4 benchmarking runs that have 
been run so far on merge-commit d2a171805c63caa27f05232695b753e07c32cb1d.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/45649935569) 
has more details. It also includes information about 91 possible false 
positives for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]




2025-07-09 Thread via GitHub


pitrou merged PR #47013:
URL: https://github.com/apache/arrow/pull/47013






2025-07-08 Thread via GitHub


pitrou commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3051272031

   > So the problem is that, although the decoder calls 
`builder->Reserve(num_decode_values)`, its batch size (perhaps the page's 
num-values) might be much smaller than the total number of values to decode, 
causing repeated reallocation?
   
   Yes, because the decoder is working on a single page at a time.
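
   The effect described in this exchange can be sketched with a toy model. This is not Arrow's builder or Parquet decoder code: `ToyBuilder`, its doubling growth policy, and both helper functions are hypothetical names chosen for illustration, and the only assumption carried over from the thread is that `Reserve(n)` makes room for `n` more values and that the decoder only ever sees one page at a time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an array builder: it tracks only the length,
// the capacity, and how often the backing buffer would be reallocated.
struct ToyBuilder {
  int64_t length = 0;
  int64_t capacity = 0;
  int reallocations = 0;

  // Assumed semantics: ensure room for `additional` more values.
  void Reserve(int64_t additional) {
    const int64_t needed = length + additional;
    if (needed > capacity) {
      // Assumed growth policy: at least double the current capacity.
      capacity = std::max(needed, capacity * 2);
      ++reallocations;
    }
  }

  void Append() {
    assert(length < capacity);
    ++length;
  }
};

// The "decoder" reserves only for the page it is currently working on.
inline void DecodePages(ToyBuilder& builder, int64_t total_values,
                        int64_t page_size) {
  assert(page_size > 0);
  for (int64_t done = 0; done < total_values; done += page_size) {
    const int64_t n = std::min(page_size, total_values - done);
    builder.Reserve(n);
    for (int64_t i = 0; i < n; ++i) builder.Append();
  }
}

// Reallocation count when nothing reserves for the whole batch.
inline int ReallocationsWithPerPageReserve(int64_t total_values,
                                           int64_t page_size) {
  ToyBuilder builder;
  DecodePages(builder, total_values, page_size);
  return builder.reallocations;
}

// Reallocation count when the caller reserves for the whole batch first.
inline int ReallocationsWithUpfrontReserve(int64_t total_values,
                                           int64_t page_size) {
  ToyBuilder builder;
  builder.Reserve(total_values);
  DecodePages(builder, total_values, page_size);
  return builder.reallocations;
}
```

   In this model, reading 1000 values in pages of 100 reallocates 5 times with per-page reservation only and once when the full batch is reserved up front. The real numbers depend on the actual builder's growth policy and on the page sizes in the file.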






2025-07-08 Thread via GitHub


mapleFU commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3050730164

   So the problem is that, although the decoder calls 
`builder->Reserve(num_decode_values)`, its batch size (perhaps the page's 
num-values) might be much smaller than the total number of values to decode, 
causing repeated reallocation?






2025-07-08 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3049567297

   Thanks for your patience. Conbench analyzed the 4 benchmarking runs that 
have been run so far on PR commit 6067fe6c355aa88e202daa7ef2c538a291c3ad11.
   
   There were 128 benchmark results indicating a performance regression:
   
   - Pull Request Run on `amd64-c6a-4xlarge-linux` at [2025-07-08 10:39:44Z](https://conbench.ursa.dev/compare/runs/7be1072c93124af997dfb2d3e0b6d97f...770ee47d46dd45caae25a3bb46232b1b/)
     - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/100, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686cdffe51f78228000b67bc336ee3e...0686cf61f2b37ee0800016105304fda6)
     - [`BM_BatchComputeHash` (C++) with params=, source=cpp-micro, suite=parquet-bloom-filter-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686ce0025667df58000843292ca29c6...0686cf6243ae7cdb8000e1eddff6b430)
   - and 126 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/45574526798) 
has more details.






2025-07-08 Thread via GitHub


pitrou commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3049528862

   cc @adamreeve @mapleFU 






2025-07-08 Thread via GitHub


ursabot commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3048127270

   Benchmark runs are scheduled for commit 
6067fe6c355aa88e202daa7ef2c538a291c3ad11. Watch 
https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A 
comment will be posted here when the runs are complete.






2025-07-08 Thread via GitHub


pitrou commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3048126927

   @ursabot please benchmark






2025-07-07 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3046416673

   Thanks for your patience. Conbench analyzed the 4 benchmarking runs that 
have been run so far on PR commit 78fdb621dac8a04be04569ab9624eba36ad64c88.
   
   There were 31 benchmark results indicating a performance regression:
   
   - Pull Request Run on `amd64-c6a-4xlarge-linux` at [2025-07-07 14:37:08Z](https://conbench.ursa.dev/compare/runs/d892774458ca47e29c04d27dcddecdce...d66cb2d65fd848d7bea471f571433ee9/)
     - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/100, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686bc63daea72948000eb7bd5ae64a1...0686bdc46d3e7acb8000c0d2467659a3)
     - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/1, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686bc63da7077c48000821fa3d04d46...0686bdc46cb774a480001d07160fdb74)
   - and 29 more (see the report linked below)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/45506332526) 
has more details.






2025-07-07 Thread via GitHub


pitrou commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045296685

   There doesn't seem to be any related regression in the benchmarks.
   I've also run this PR locally on a couple of Parquet files I have lying 
around, and could not see any concerning performance drop.






2025-07-07 Thread via GitHub


ursabot commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045176865

   Benchmark runs are scheduled for commit 
78fdb621dac8a04be04569ab9624eba36ad64c88. Watch 
https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A 
comment will be posted here when the runs are complete.






2025-07-07 Thread via GitHub


pitrou commented on PR #47013:
URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045176485

   @ursabot please benchmark

