Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
conbench-apache-arrow[bot] commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3052960622 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d2a171805c63caa27f05232695b753e07c32cb1d. There were no benchmark performance regressions. 🎉 The [full Conbench report](https://github.com/apache/arrow/runs/45649935569) has more details. It also includes information about 91 possible false positives for unstable benchmarks that are known to sometimes produce them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou merged PR #47013: URL: https://github.com/apache/arrow/pull/47013
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3051272031
> So the problem is that, although `builder->Reserve(num_decode_values)` is called in the decoder, its batch size (maybe the page's num-values) might be much smaller than the number of values to decode, causing reallocation?

Yes, because the decoder is working on a single page at a time.
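To illustrate the exchange above, here is a minimal toy sketch of why reserving one page at a time can reallocate more often than reserving once for the whole read. It is not the code from this PR and does not use Arrow: `std::vector` stands in for the builder, and `CountReallocations` is a hypothetical helper invented for this illustration. Arrow's actual builders may grow their buffers differently, so the counts here only show the general effect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: std::vector stands in for an Arrow builder.
// Returns how many times the buffer capacity changes while appending
// `num_pages` pages of `page_size` values each.
// If `reserve_upfront` is true, space for all values is reserved once before
// decoding; otherwise space is reserved page by page, the way a decoder that
// only sees a single page at a time would do it.
std::size_t CountReallocations(std::size_t num_pages, std::size_t page_size,
                               bool reserve_upfront) {
  assert(num_pages > 0 && page_size > 0);
  std::vector<int> values;
  std::size_t reallocations = 0;
  std::size_t last_capacity = values.capacity();

  // Record a reallocation whenever the capacity has changed.
  auto note_capacity = [&]() {
    if (values.capacity() != last_capacity) {
      ++reallocations;
      last_capacity = values.capacity();
    }
  };

  if (reserve_upfront) {
    // Reader-level reservation: one request covering every page.
    values.reserve(num_pages * page_size);
    note_capacity();
  }
  for (std::size_t page = 0; page < num_pages; ++page) {
    if (!reserve_upfront) {
      // Decoder-level reservation: only the current page is known.
      values.reserve(values.size() + page_size);
      note_capacity();
    }
    for (std::size_t i = 0; i < page_size; ++i) {
      values.push_back(static_cast<int>(i));
      note_capacity();
    }
  }
  return reallocations;
}
```

Calling `CountReallocations(8, 1000, true)` reports a single capacity change, while `CountReallocations(8, 1000, false)` reports more than one, because each per-page reservation can force the buffer to grow again.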
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
mapleFU commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3050730164 So the problem is that, although `builder->Reserve(num_decode_values)` is called in the decoder, its batch size (maybe the page's num-values) might be much smaller than the number of values to decode, causing reallocation?
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
conbench-apache-arrow[bot] commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3049567297
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 6067fe6c355aa88e202daa7ef2c538a291c3ad11. There were 128 benchmark results indicating a performance regression:
- Pull Request Run on `amd64-c6a-4xlarge-linux` at [2025-07-08 10:39:44Z](https://conbench.ursa.dev/compare/runs/7be1072c93124af997dfb2d3e0b6d97f...770ee47d46dd45caae25a3bb46232b1b/)
  - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/100, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686cdffe51f78228000b67bc336ee3e...0686cf61f2b37ee0800016105304fda6)
  - [`BM_BatchComputeHash` (C++) with params=, source=cpp-micro, suite=parquet-bloom-filter-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686ce0025667df58000843292ca29c6...0686cf6243ae7cdb8000e1eddff6b430)
  - and 126 more (see the report linked below)

The [full Conbench report](https://github.com/apache/arrow/runs/45574526798) has more details.
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3049528862 cc @adamreeve @mapleFU
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
ursabot commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3048127270 Benchmark runs are scheduled for commit 6067fe6c355aa88e202daa7ef2c538a291c3ad11. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3048126927 @ursabot please benchmark
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
conbench-apache-arrow[bot] commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3046416673
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 78fdb621dac8a04be04569ab9624eba36ad64c88. There were 31 benchmark results indicating a performance regression:
- Pull Request Run on `amd64-c6a-4xlarge-linux` at [2025-07-07 14:37:08Z](https://conbench.ursa.dev/compare/runs/d892774458ca47e29c04d27dcddecdce...d66cb2d65fd848d7bea471f571433ee9/)
  - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/100, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686bc63daea72948000eb7bd5ae64a1...0686bdc46d3e7acb8000c0d2467659a3)
  - [`BM_PlainEncodingSpacedBoolean` (C++) with params=32768/1, source=cpp-micro, suite=parquet-encoding-benchmark](https://conbench.ursa.dev/compare/benchmarks/0686bc63da7077c48000821fa3d04d46...0686bdc46cb774a480001d07160fdb74)
  - and 29 more (see the report linked below)

The [full Conbench report](https://github.com/apache/arrow/runs/45506332526) has more details.
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045296685 There doesn't seem to be any related regression on the benchmarks. I've also run this PR locally on a couple Parquet files I have lying around, and could not see any concerning performance drop.
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
ursabot commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045176865 Benchmark runs are scheduled for commit 78fdb621dac8a04be04569ab9624eba36ad64c88. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Re: [PR] GH-47012: [C++][Parquet] Reserve values correctly when reading BYTE_ARRAY and FLBA [arrow]
pitrou commented on PR #47013: URL: https://github.com/apache/arrow/pull/47013#issuecomment-3045176485 @ursabot please benchmark
