Re: [I] arrow_reader_row_filter benchmark doesn't capture page cache improvements [arrow-rs]

2025-05-13 Thread via GitHub
alamb closed issue #7460: arrow_reader_row_filter benchmark doesn't capture page cache improvements URL: https://github.com/apache/arrow-rs/issues/7460 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-13 Thread via GitHub
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2087295883 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1129,24 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2876958945 Revision: 1cc2e4b690e467f548dc4131f9b392a864a132aa Submitted crossbow builds: [ursacomputing/crossbow @ actions-81dc2a98bf](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-46224: [C++][Acero] Fix the hang in asof join [arrow]

2025-05-13 Thread via GitHub
zanmato1984 commented on PR #46300: URL: https://github.com/apache/arrow/pull/46300#issuecomment-2877317594 Thanks @pitrou for reviewing. Is there anything else to address?

Re: [PR] GH-45229: [Python] skip scipy.sparse roundtrip tests for float16 [arrow]

2025-05-13 Thread via GitHub
TheNeuralBit commented on PR #46413: URL: https://github.com/apache/arrow/pull/46413#issuecomment-2877282753 > A bit of a nit: how about instead of skipping `float16` we rather introduce a new type list? Approximately so: > > ```python > # Scipy does not support float16 > scipy_

Re: [PR] GH-46376: [Docs] Replace Xitter link with BlueSky link [arrow]

2025-05-13 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #46402: URL: https://github.com/apache/arrow/pull/46402#issuecomment-2877282565 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ce99d0036aa7a61814a100298e4dec0dd36d0b0c. There were no

Re: [PR] Add `arrow_reader_clickbench` benchmark [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7470: URL: https://github.com/apache/arrow-rs/pull/7470#issuecomment-2876930254 let's gogogogogo

Re: [I] [C++][Python][Parquet] Support Content-Defined Chunking of Parquet files [arrow]

2025-05-13 Thread via GitHub
pitrou commented on issue #45750: URL: https://github.com/apache/arrow/issues/45750#issuecomment-2877174547 Issue resolved by pull request 45360 https://github.com/apache/arrow/pull/45360

Re: [PR] GH-46420: [C++][Dataset] Fix DatasetWriter deadlock on writting batch greater than max_rows_queued [arrow]

2025-05-13 Thread via GitHub
pitrou commented on PR #46139: URL: https://github.com/apache/arrow/pull/46139#issuecomment-2876961869 @github-actions crossbow submit -g cpp

Re: [PR] GH-35166: [C++] Increase precision of decimals in aggregate functions [arrow]

2025-05-13 Thread via GitHub
khwilson commented on PR #44184: URL: https://github.com/apache/arrow/pull/44184#issuecomment-2877184646 > Ideally, the scale should be "floating" just as in floating-point arithmetic, depending on the current running sum (the running sum can be very large if all data is positive, or very s

Re: [PR] GH-45229: [Python] Migrate from scipy.spmatrix to scipy.sparray [arrow]

2025-05-13 Thread via GitHub
rok commented on PR #46423: URL: https://github.com/apache/arrow/pull/46423#issuecomment-2877161909 @TheNeuralBit does this look ok to you?

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2877189444 Thanks @pitrou @wgtmac @kou @mapleFU for the reviews!

Re: [I] [Swift] Publish Arrow Swift to `Swift Package Index` [arrow]

2025-05-13 Thread via GitHub
abandy commented on issue #46382: URL: https://github.com/apache/arrow/issues/46382#issuecomment-2877175928 > [@abandy](https://github.com/abandy) I'm not familiar with Swift package. Do we need a separated repository for each Swift package? For example, do we need apache/arrow-swift for [

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
pitrou merged PR #45360: URL: https://github.com/apache/arrow/pull/45360

Re: [I] Files containing binary data with >=8_388_855 bytes per row written with `arrow-rs` can't be read with `pyarrow` [arrow-rs]

2025-05-13 Thread via GitHub
jonded94 commented on issue #7489: URL: https://github.com/apache/arrow-rs/issues/7489#issuecomment-2877156966 @alamb I tried setting `statistics_truncate_length` as well as `column_index_truncate_length`, but for some reason, this didn't enable `pyarrow` to read the new file. I used

[I] `arrow-55.1.0` breaks `filter_record_batch` [arrow-rs]

2025-05-13 Thread via GitHub
ion-elgreco opened a new issue, #7500: URL: https://github.com/apache/arrow-rs/issues/7500 **Describe the bug** With latest minor release the `filter_record_batch` function stopped working: https://github.com/delta-io/delta-rs/actions/runs/14999489487/job/42142313983?pr=3426#step:4:2506

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
wgtmac merged PR #754: URL: https://github.com/apache/arrow-java/pull/754

Re: [PR] MINOR: add missing SOURCE_DIR in dev/release/release.sh [arrow-java]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #755: URL: https://github.com/apache/arrow-java/pull/755#issuecomment-2876772854 Thank you for opening a pull request! Please label the PR with one or more of: - bug-fix - chore - dependencies - documentation - enhancement

Re: [I] [C++][R]: gcc-UBSAN errors on CRAN [arrow]

2025-05-13 Thread via GitHub
assignUser commented on issue #46394: URL: https://github.com/apache/arrow/issues/46394#issuecomment-2877034283 Issue resolved by pull request 46397 https://github.com/apache/arrow/pull/46397

Re: [PR] GH-46394: [C++][R] gcc-UBSAN errors on CRAN [arrow]

2025-05-13 Thread via GitHub
assignUser merged PR #46397: URL: https://github.com/apache/arrow/pull/46397

Re: [PR] GH-46349: [Python] Move parquet definitions to pyarrow/includes/libparquet.pxd [arrow]

2025-05-13 Thread via GitHub
raulcd commented on code in PR #46426: URL: https://github.com/apache/arrow/pull/46426#discussion_r2087128990 ## python/pyarrow/parquet/core.py: ## @@ -1020,7 +1019,7 @@ def __init__(self, where, schema, filesystem=None, sink = where self._metadata_collecto

Re: [PR] Add `arrow_reader_clickbench` benchmark [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7470: URL: https://github.com/apache/arrow-rs/pull/7470#issuecomment-2877006735 > Sorry @alamb, I intended to review, but between vacation and pestilence contracted while on vacation 😷 I haven't had the bandwidth 😢. Hopefully I'll be back on line this week 🤞 No

Re: [PR] Add `arrow_reader_clickbench` benchmark [arrow-rs]

2025-05-13 Thread via GitHub
etseidl commented on PR #7470: URL: https://github.com/apache/arrow-rs/pull/7470#issuecomment-2876982184 Sorry @alamb, I intended to review, but between vacation and pestilence contracted while on vacation 😷 I haven't had the bandwidth 😢. Hopefully I'll be back on line this week 🤞

Re: [PR] GH-46420: [C++][Dataset] Fix DatasetWriter deadlock on writting batch greater than max_rows_queued [arrow]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #46139: URL: https://github.com/apache/arrow/pull/46139#issuecomment-2876971862 Revision: aaeefd893cacf927741f2c78446e7e3e4786e3a6 Submitted crossbow builds: [ursacomputing/crossbow @ actions-be1ed553c5](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-46304: [Release][Packaging] Use optimized debug build for .deb [arrow]

2025-05-13 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #46392: URL: https://github.com/apache/arrow/pull/46392#issuecomment-2876901914 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit e7ccca6917be3bf8be9c25e35bd6b2459e69be0b. There were no

Re: [PR] GH-46420: [C++][Dataset] Fix DatasetWriter deadlock on writting batch greater than max_rows_queued [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #46139: URL: https://github.com/apache/arrow/pull/46139#discussion_r2087094721 ## cpp/src/arrow/dataset/dataset_writer_test.cc: ## @@ -231,6 +232,7 @@ class DatasetWriterTestFixture : public testing::Test { util::AsyncTaskScheduler* scheduler_;

Re: [PR] feat(csharp/src/Drivers/Apache) : Add support for Sasl transport in Hive ADBC Driver [arrow-adbc]

2025-05-13 Thread via GitHub
CurtHagenlocher commented on code in PR #2822: URL: https://github.com/apache/arrow-adbc/pull/2822#discussion_r2087017443 ## csharp/src/Drivers/Apache/Thrift/Sasl/TSaslTransport.cs: ## @@ -0,0 +1,181 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +

[I] Consider clearing extension type metadata when modifying Field::data_type [arrow-rs]

2025-05-13 Thread via GitHub
alamb opened a new issue, #7499: URL: https://github.com/apache/arrow-rs/issues/7499 I missed this when adding extension type methods, but perhaps we should check if a field has extension type information when updating the data type, and if it does, drop the extension type (be

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
pitrou commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2876952223 @github-actions crossbow submit preview-docs

Re: [PR] Add `arrow_reader_clickbench` benchmark [arrow-rs]

2025-05-13 Thread via GitHub
alamb merged PR #7470: URL: https://github.com/apache/arrow-rs/pull/7470

Re: [PR] Improve Field docs, add missing `Field::set_*` methods [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on code in PR #7497: URL: https://github.com/apache/arrow-rs/pull/7497#discussion_r2087070968 ## arrow-schema/src/field.rs: ## @@ -351,7 +384,7 @@ impl Field { /// assert_eq!(field.data_type(), &DataType::Utf8); /// ``` pub fn with_data_type(mut se

Re: [PR] GH-46420: [C++][Dataset] Fix DatasetWriter deadlock on writting batch greater than max_rows_queued [arrow]

2025-05-13 Thread via GitHub
gitmodimo commented on code in PR #46139: URL: https://github.com/apache/arrow/pull/46139#discussion_r2087065754 ## cpp/src/arrow/dataset/dataset_writer_test.cc: ## @@ -231,6 +232,7 @@ class DatasetWriterTestFixture : public testing::Test { util::AsyncTaskScheduler* scheduler

Re: [PR] Improve Field docs, add missing `Field::set_*` methods [arrow-rs]

2025-05-13 Thread via GitHub
mbrobbel commented on code in PR #7497: URL: https://github.com/apache/arrow-rs/pull/7497#discussion_r2087061286 ## arrow-schema/src/field.rs: ## @@ -351,7 +384,7 @@ impl Field { /// assert_eq!(field.data_type(), &DataType::Utf8); /// ``` pub fn with_data_type(mut

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2876873324 I collected the possible follow-ups: - [C++][Parquet] Support CDC in `WriteBatch` and `WriteBatchSpaced` https://github.com/apache/arrow/pull/45360#discussion_r1978502781 - [C++][Parq

Re: [I] Arithmetic kernels can be safer and faster [arrow-rs]

2025-05-13 Thread via GitHub
Dandandan closed issue #7494: Arithmetic kernels can be safer and faster URL: https://github.com/apache/arrow-rs/issues/7494

Re: [PR] feat(csharp/src/Drivers/Apache) : Add support for Sasl transport in Hive ADBC Driver [arrow-adbc]

2025-05-13 Thread via GitHub
CurtHagenlocher commented on code in PR #2822: URL: https://github.com/apache/arrow-adbc/pull/2822#discussion_r2087012399 ## csharp/src/Drivers/Apache/Thrift/Sasl/TSaslTransport.cs: ## @@ -0,0 +1,181 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] feat(csharp/src/Drivers/Apache) : Add support for Sasl transport in Hive ADBC Driver [arrow-adbc]

2025-05-13 Thread via GitHub
CurtHagenlocher commented on PR #2822: URL: https://github.com/apache/arrow-adbc/pull/2822#issuecomment-2876870641 The linter is also complaining about some trailing whitespace and some mixed (LF vs CRLF) line endings.

Re: [PR] Improve Field docs, add missing `Field::set_*` methods [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on code in PR #7497: URL: https://github.com/apache/arrow-rs/pull/7497#discussion_r2087032070 ## arrow-schema/src/field.rs: ## @@ -351,7 +384,7 @@ impl Field { /// assert_eq!(field.data_type(), &DataType::Utf8); /// ``` pub fn with_data_type(mut se

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
Dandandan merged PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
Dandandan commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876842219 > > 🤖: Benchmark completed > > Looks like an across the board win to me ❤️ Yeah somehow this also is a larger win on my machine, but every win is a win!

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2876833860 Revision: 1cc2e4b690e467f548dc4131f9b392a864a132aa Submitted crossbow builds: [ursacomputing/crossbow @ actions-6a43d39b56](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on PR #45360: URL: https://github.com/apache/arrow/pull/45360#issuecomment-2876826732 @github-actions crossbow submit test-conda-cpp-valgrind

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2087012053 ## cpp/src/parquet/chunker_internal.cc: ## @@ -0,0 +1,413 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2086963427 ## cpp/src/parquet/properties.h: ## @@ -245,6 +245,34 @@ class PARQUET_EXPORT ColumnProperties { bool page_index_enabled_; }; +// EXPERIMENTAL: Options for content

Re: [PR] GH-46420: [C++][Dataset] Fix DatasetWriter deadlock on writting batch greater than max_rows_queued [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #46139: URL: https://github.com/apache/arrow/pull/46139#discussion_r2086999005 ## cpp/src/arrow/dataset/dataset_writer_test.cc: ## @@ -231,6 +232,7 @@ class DatasetWriterTestFixture : public testing::Test { util::AsyncTaskScheduler* scheduler_;

Re: [PR] Prevent FlightSQL server panics for `do_put` when stream is empty or 1st stream element is an Err [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on code in PR #7492: URL: https://github.com/apache/arrow-rs/pull/7492#discussion_r2086946791 ## arrow-flight/src/sql/server.rs: ## @@ -710,10 +710,21 @@ where // we wrap this stream in a `Peekable` one, which allows us to peek at // the first m

Re: [PR] Improve Field docs, add missing `Field::set_*` methods [arrow-rs]

2025-05-13 Thread via GitHub
mbrobbel commented on code in PR #7497: URL: https://github.com/apache/arrow-rs/pull/7497#discussion_r2086981702 ## arrow-schema/src/field.rs: ## @@ -351,7 +384,7 @@ impl Field { /// assert_eq!(field.data_type(), &DataType::Utf8); /// ``` pub fn with_data_type(mut

Re: [PR] GH-46349: [Python] Move parquet definitions to pyarrow/includes/libparquet.pxd [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #46426: URL: https://github.com/apache/arrow/pull/46426#discussion_r2086959221 ## python/pyarrow/parquet/core.py: ## @@ -1020,7 +1019,7 @@ def __init__(self, where, schema, filesystem=None, sink = where self._metadata_collecto

[PR] MINOR: add missing SOURCE_DIR in dev/release/release.sh [arrow-java]

2025-05-13 Thread via GitHub
wgtmac opened a new pull request, #755: URL: https://github.com/apache/arrow-java/pull/755 ## What's Changed `dev/release/release.sh` requires `SOURCE_DIR` to locate `.env` but it is missing.

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2086966277 ## cpp/src/parquet/chunker_internal_test.cc: ## @@ -0,0 +1,1689 @@ +// Licensed to the Apache Software Foundation (ASF) under one Review Comment: Updated, now runnin

Re: [PR] GH-46349: [Python] Move parquet definitions to pyarrow/includes/libparquet.pxd [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #46426: URL: https://github.com/apache/arrow/pull/46426#discussion_r2086937648 ## python/pyarrow/parquet/core.py: ## @@ -1020,7 +1019,7 @@ def __init__(self, where, schema, filesystem=None, sink = where self._metadata_collecto

Re: [PR] Poc for adaptive parquet predicate pushdown(bitmap/range) with page cache(3 data pages) [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7454: URL: https://github.com/apache/arrow-rs/pull/7454#issuecomment-2876744816 Thanks @zhuqi-lucas -- I have some ideas I will try out later today

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876741762 > 🤖: Benchmark completed Looks like an across the board win to me ❤️

Re: [PR] GH-46349: [Python] Move parquet definitions to pyarrow/includes/libparquet.pxd [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #46426: URL: https://github.com/apache/arrow/pull/46426#discussion_r2086956613 ## python/pyarrow/parquet/core.py: ## @@ -1020,7 +1019,7 @@ def __init__(self, where, schema, filesystem=None, sink = where self._metadata_collecto

[I] Rename `flight-sql-experimental` to `flight-sql` [arrow-rs]

2025-05-13 Thread via GitHub
alamb opened a new issue, #7498: URL: https://github.com/apache/arrow-rs/issues/7498 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The arrow-flight crate has a feature flag named `flight-sql-experimental`: https://github.com/

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
wgtmac commented on PR #754: URL: https://github.com/apache/arrow-java/pull/754#issuecomment-2876676104 Yes, I was hesitant to choose from `18.4.0-SNAPSHOT` and `19.0.0-SNAPSHOT`. It can always be decided at the release time depending on the actual commits.

Re: [PR] GH-46417: [C++][Parquet] Fix UB in LoadEnumSafe for EdgeInterpolationAlgorithm [arrow]

2025-05-13 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #46418: URL: https://github.com/apache/arrow/pull/46418#issuecomment-2876679182 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 10f2c9c575486c24cd5bb4a6df780fae9debc2f8. There were no

Re: [PR] GH-46177: [C++][Compute] Enable MemAllocation::PREALLOCATE for DenseUnion, SparseUnion, ListView, LargeListView, BinaryView, StringView [arrow]

2025-05-13 Thread via GitHub
andishgar commented on PR #46317: URL: https://github.com/apache/arrow/pull/46317#issuecomment-2876654627 @pitrou Could you review my pull request?

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
lidavidm commented on PR #754: URL: https://github.com/apache/arrow-java/pull/754#issuecomment-2876623843 I guess we can always change it at release time (we can change the milestone name to just be 'Next' or something?)

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
jbonofre commented on PR #754: URL: https://github.com/apache/arrow-java/pull/754#issuecomment-2876642261 I think it's ok to set 19.0.0-SNAPSHOT and bump the version to target release at release time.

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
lidavidm commented on PR #754: URL: https://github.com/apache/arrow-java/pull/754#issuecomment-2876622828 Do we actually want the next release to be 19.0.0? Or if we aren't making any breaking changes we can do 18.4.0?

[PR] Improve Field docs, add missing `Field::set_name` [arrow-rs]

2025-05-13 Thread via GitHub
alamb opened a new pull request, #7497: URL: https://github.com/apache/arrow-rs/pull/7497 # Which issue does this PR close? Closes #. # Rationale for this change As we start to use Extension types more fully, let's make sure they are documented as much as possib

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
pitrou commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2086856970 ## cpp/src/parquet/chunker_internal_test.cc: ## @@ -0,0 +1,1689 @@ +// Licensed to the Apache Software Foundation (ASF) under one Review Comment: Yes, that's what ot

Re: [I] [Java] Does ADBC FlightSqlConnection support Arrow Flight session management? [arrow-adbc]

2025-05-13 Thread via GitHub
lidavidm commented on issue #2821: URL: https://github.com/apache/arrow-adbc/issues/2821#issuecomment-2876551437 That would be much appreciated! And please file issues/complaints/suggestions, it helps me justify prioritizing more work on Java!

Re: [I] rust: reevaluate `&mut self` declarations [arrow-adbc]

2025-05-13 Thread via GitHub
lidavidm commented on issue #2809: URL: https://github.com/apache/arrow-adbc/issues/2809#issuecomment-2876553353 Ok. If there's no objection here then I can close this and save this for the record. Just wanted to make sure!

Re: [PR] arrow-select: Implement concat for `RunArray`s [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7487: URL: https://github.com/apache/arrow-rs/pull/7487#issuecomment-2876521104 Thanks @brancz

Re: [PR] arrow-select: Implement concat for `RunArray`s [arrow-rs]

2025-05-13 Thread via GitHub
alamb merged PR #7487: URL: https://github.com/apache/arrow-rs/pull/7487

Re: [PR] Speedup `filter_bytes` ~-20-40%, `filter_native` low selectivity (~-37%) [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7463: URL: https://github.com/apache/arrow-rs/pull/7463#issuecomment-2876522669 Thanks @Dandandan

Re: [PR] GH-46395: [C++][Statistics]: Correct the Equal method for min and max in arrow::ArrayStatistics [arrow]

2025-05-13 Thread via GitHub
andishgar commented on PR #46422: URL: https://github.com/apache/arrow/pull/46422#issuecomment-2876506761 @kou what is your view on `arrow::ArrayStatistics::ApproximateEquals`? If we aim for consistency with the Arrow C++ implementation, it seems this method should be part of it.

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876510447 🤖: Benchmark completed Details ``` grouparithmetic_speed main -

Re: [I] (go/adbc/driver/snowflake): Large date values are not being returned correctly [arrow-adbc]

2025-05-13 Thread via GitHub
davidhcoe commented on issue #2811: URL: https://github.com/apache/arrow-adbc/issues/2811#issuecomment-2876497225 Judging by https://github.com/snowflakedb/gosnowflake/blob/02d592aff379ee8592eeb42047fb5117a8dc6cba/arrow_test.go#L421, it could come back as either seconds or nanoseconds but t

Re: [I] Docs build fails for object_store 0.12.1 [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb commented on issue #360: URL: https://github.com/apache/arrow-rs-object-store/issues/360#issuecomment-2876455643 This looks to me like it is due to - https://github.com/apache/arrow-rs-object-store/issues/343 Thankfully I made a PR to fix it just this morning: - https://g

[I] Docs build fails for object_store 0.12.1 [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb opened a new issue, #360: URL: https://github.com/apache/arrow-rs-object-store/issues/360 > Docs build failed as well, probably due to that failure _Originally posted by @ion-elgreco in [#357](https://github.com/apache/arrow-rs-object-store/issues/357#issuecomment-2876143954)_

Re: [I] Error running `cargo publish`: wildcard (`*`) dependency constraints are not allowed on crates.io. [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb commented on issue #357: URL: https://github.com/apache/arrow-rs-object-store/issues/357#issuecomment-2876452632 > Docs build failed as well, probably due to that failure - Filed https://github.com/apache/arrow-rs-object-store/issues/360 to track

Re: [PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #754: URL: https://github.com/apache/arrow-java/pull/754#issuecomment-2876448966 Thank you for opening a pull request! Please label the PR with one or more of: - bug-fix - chore - dependencies - documentation - enhancement

[PR] MINOR: Bump version to 19.0.0-SNAPSHOT [arrow-java]

2025-05-13 Thread via GitHub
wgtmac opened a new pull request, #754: URL: https://github.com/apache/arrow-java/pull/754 (no comment)

Re: [PR] chore(deps): update tonic-build requirement from =0.12.3 to =0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
dependabot[bot] commented on PR #7471: URL: https://github.com/apache/arrow-rs/pull/7471#issuecomment-2876396878 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, le

Re: [I] rust: reevaluate `&mut self` declarations [arrow-adbc]

2025-05-13 Thread via GitHub
felipecrv commented on issue #2809: URL: https://github.com/apache/arrow-adbc/issues/2809#issuecomment-2876444252 > ...needs to be not mutable to fit in to some framework? The framework is Rust itself. 😄 When you have a `&mut` you can't have other `&`s. The `adbc_core` objects

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876428389 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876422532 Sorry @Dandandan -- I ran the wrong script -- fixing

Re: [PR] Upgrade tonic dependencies to 0.13.0 version [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on code in PR #7377: URL: https://github.com/apache/arrow-rs/pull/7377#discussion_r2086764585 ## arrow-flight/Cargo.toml: ## @@ -73,6 +74,7 @@ http = "1.1.0" http-body = "1.0.0" hyper-util = "0.1" pin-project-lite = "0.2" +rustls = { version = "0.23", default-

Re: [I] Release arrow-rs / parquet Minor version `55.1.0` (May 2025) [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on issue #7393: URL: https://github.com/apache/arrow-rs/issues/7393#issuecomment-2876401697 The release has been approved and published to crates.io: https://crates.io/crates/arrow/55.1.0 Thanks to @paleolimbot and @viirya for PMC votes to help make it happen 🙏

Re: [I] Release arrow-rs / parquet Minor version `55.1.0` (May 2025) [arrow-rs]

2025-05-13 Thread via GitHub
alamb closed issue #7393: Release arrow-rs / parquet Minor version `55.1.0` (May 2025) URL: https://github.com/apache/arrow-rs/issues/7393

Re: [PR] chore(deps): update tonic requirement from 0.12.3 to 0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
dependabot[bot] commented on PR #7472: URL: https://github.com/apache/arrow-rs/pull/7472#issuecomment-2876397416 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, le

Re: [PR] chore(deps): update tonic requirement from 0.12.3 to 0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7472: URL: https://github.com/apache/arrow-rs/pull/7472#issuecomment-2876397263 - Dupe of https://github.com/apache/arrow-rs/pull/7377

Re: [PR] arrow-flight: update tonic [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7495: URL: https://github.com/apache/arrow-rs/pull/7495#issuecomment-2876393585 - I think this is a dupe of https://github.com/apache/arrow-rs/pull/7377

Re: [PR] chore(deps): update tonic-build requirement from =0.12.3 to =0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
alamb closed pull request #7471: chore(deps): update tonic-build requirement from =0.12.3 to =0.13.1 URL: https://github.com/apache/arrow-rs/pull/7471

Re: [PR] chore(deps): update tonic requirement from 0.12.3 to 0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
alamb closed pull request #7472: chore(deps): update tonic requirement from 0.12.3 to 0.13.1 URL: https://github.com/apache/arrow-rs/pull/7472

Re: [PR] chore(deps): update tonic-build requirement from =0.12.3 to =0.13.1 [arrow-rs]

2025-05-13 Thread via GitHub
alamb commented on PR #7471: URL: https://github.com/apache/arrow-rs/pull/7471#issuecomment-2876396732 - Dupe of https://github.com/apache/arrow-rs/pull/7377

Re: [PR] GH-46349: [Python] Move parquet definitions to pyarrow/includes/libparquet.pxd [arrow]

2025-05-13 Thread via GitHub
raulcd commented on code in PR #46426: URL: https://github.com/apache/arrow/pull/46426#discussion_r2086745327 ## python/pyarrow/parquet/core.py: ## @@ -1020,7 +1019,7 @@ def __init__(self, where, schema, filesystem=None, sink = where self._metadata_collecto

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2086711091 ## cpp/src/parquet/chunker_internal.cc: ## @@ -0,0 +1,413 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] GH-46411: [C++] Add dataset option to Meson configuration [arrow]

2025-05-13 Thread via GitHub
github-actions[bot] commented on PR #46412: URL: https://github.com/apache/arrow/pull/46412#issuecomment-2876368912 Revision: 0f92840343a51d3ba965a2e830174b9ffc5abee9 Submitted crossbow builds: [ursacomputing/crossbow @ actions-281b908ab3](https://github.com/ursacomputing/crossbow/bra

Re: [PR] Speed up arithmetic kernels, reduce `unsafe` usage [arrow-rs]

2025-05-13 Thread via GitHub
Dandandan commented on PR #7493: URL: https://github.com/apache/arrow-rs/pull/7493#issuecomment-2876358469 > 🤖: Benchmark completed > > Details I don't think those kernels use this code path 🤔

Re: [PR] GH-46411: [C++] Add dataset option to Meson configuration [arrow]

2025-05-13 Thread via GitHub
WillAyd commented on PR #46412: URL: https://github.com/apache/arrow/pull/46412#issuecomment-2876357921 @github-actions crossbow submit *meson

Re: [PR] Update integration test to avoid long format strings [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb commented on code in PR #359: URL: https://github.com/apache/arrow-rs-object-store/pull/359#discussion_r2086725116 ## src/integration.rs: ## @@ -1119,48 +1117,64 @@ pub async fn multipart_race_condition(storage: &dyn ObjectStore, last_writer_win let mut multipart_up

Re: [PR] Fix `cargo publish` by specifying version for wasm-bindgen-test [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb commented on PR #358: URL: https://github.com/apache/arrow-rs-object-store/pull/358#issuecomment-2876349353 > I thought we did `cargo publish --dry-run` when testing a tarball, guessing that doesn't catch this? 🤔 it seems that we don't (or no longer do): https://github

[PR] Update integration test to avoid long format strings [arrow-rs-object-store]

2025-05-13 Thread via GitHub
alamb opened a new pull request, #359: URL: https://github.com/apache/arrow-rs-object-store/pull/359 # Which issue does this PR close? - Closes https://github.com/apache/arrow-rs-object-store/issues/343 # Rationale for this change The new rust release will limit

Re: [I] [Python][Parquet] Add EncryptionConfiguration.uniform_encryption to Python implementation [arrow]

2025-05-13 Thread via GitHub
pitrou commented on issue #38914: URL: https://github.com/apache/arrow/issues/38914#issuecomment-2876340239 Issue resolved by pull request 46347 https://github.com/apache/arrow/pull/46347

Re: [PR] GH-38914: [Python] Add EncryptionConfiguration.uniform_encryption [arrow]

2025-05-13 Thread via GitHub
pitrou merged PR #46347: URL: https://github.com/apache/arrow/pull/46347

Re: [PR] GH-45750: [C++][Python][Parquet] Implement Content-Defined Chunking for the Parquet writer [arrow]

2025-05-13 Thread via GitHub
kszucs commented on code in PR #45360: URL: https://github.com/apache/arrow/pull/45360#discussion_r2086711091 ## cpp/src/parquet/chunker_internal.cc: ## @@ -0,0 +1,413 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.
