Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
jbonofre commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089581910 That's due to the source Maven plugin update that comes with the new Apache POM. In some profiles, the source and jar Maven plugins are executed both by the parent Apache POM and by the project

Re: [I] [Format] Passing column statistics through Arrow C data interface [arrow]

2024-05-01 Thread via GitHub
lidavidm commented on issue #38837: URL: https://github.com/apache/arrow/issues/38837#issuecomment-2089533055 We talked about this a little, but what about approach (2) from [Kou's comment above](https://github.com/apache/arrow/issues/38837#issuecomment-2088101530), but for now only

Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
lidavidm commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089524024 Do you think you could bisect it? Or at least figure out which commits are possible causes in that time range? -- This is an automated message from the Apache Git Service. To

Re: [PR] docs: add sizing explanation to bloom filter docs in parquet [arrow-rs]

2024-05-01 Thread via GitHub
hiltontj commented on code in PR #5705: URL: https://github.com/apache/arrow-rs/pull/5705#discussion_r1587027649 ## parquet/src/bloom_filter/mod.rs: ## @@ -16,7 +16,61 @@ // under the License. //! Bloom filter implementation specific to Parquet, as described -//! in the

Re: [PR] docs: add sizing explanation to bloom filter docs in parquet [arrow-rs]

2024-05-01 Thread via GitHub
hiltontj commented on code in PR #5705: URL: https://github.com/apache/arrow-rs/pull/5705#discussion_r1587027143 ## parquet/src/bloom_filter/mod.rs: ## @@ -16,7 +16,61 @@ // under the License. //! Bloom filter implementation specific to Parquet, as described -//! in the

Re: [PR] MINOR: [JS] Bump rollup from 4.14.3 to 4.17.2 in /js [arrow]

2024-05-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41484: URL: https://github.com/apache/arrow/pull/41484#issuecomment-2089455881 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 9ce7ab10fbb3937cdcb4800a791c06591523240b. There were

Re: [PR] feat(r): Add async infrastructure for specific methods [arrow-adbc]

2024-05-01 Thread via GitHub
krlmlr commented on code in PR #985: URL: https://github.com/apache/arrow-adbc/pull/985#discussion_r1587008993 ## r/adbcdrivermanager/src/async.cc: ## @@ -0,0 +1,262 @@ + +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] GH-41491: [Python] remove special methods related to buffers in python <2.6 [arrow]

2024-05-01 Thread via GitHub
github-actions[bot] commented on PR #41492: URL: https://github.com/apache/arrow/pull/41492#issuecomment-2089412173 :warning: GitHub issue #41491 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-41491: [Python] remove special methods related to buffers in python <2.6 [arrow]

2024-05-01 Thread via GitHub
tacaswell opened a new pull request, #41492: URL: https://github.com/apache/arrow/pull/41492 ### Rationale for this change These methods are not actually used and will be removed from Cython in an upcoming release. Closes #41491 ### What changes are

Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
vibhatha commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089375120 I know this doesn't help, but at least it should have happened sometime after 8 days ago.

Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
vibhatha commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089363822 Checking my recent crossbow usage: 8 days ago, passes (https://github.com/apache/arrow/pull/40340#issuecomment-2074733574); 6 days ago, fails
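Bisecting that window, as suggested earlier in the thread, could look like the minimal Python sketch below. It assumes the candidate commits are listed oldest to newest, that the first one passes and the last one fails, and that `is_bad` is a hypothetical callback that runs the java-jars crossbow job for one commit; `git bisect` performs the same search.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    failure persists once it has been introduced.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # failure was introduced at or before mid
        else:
            lo = mid  # failure was introduced after mid
    return commits[hi]


print(first_bad_commit(["c1", "c2", "c3", "c4", "c5"], lambda c: c >= "c3"))  # c3
```

With n candidate commits this needs roughly log2(n) job runs instead of n.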

Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
vibhatha commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089361269 Could it be the recent Maven build changes? My best guess would be the last build PR, but I can't be sure. I didn't notice it before that, though.

[PR] fix(python): Make shallow CArray copies less shallow to accomodate moving children [arrow-nanoarrow]

2024-05-01 Thread via GitHub
paleolimbot opened a new pull request, #451: URL: https://github.com/apache/arrow-nanoarrow/pull/451 This PR updates the logic that creates a "shallow copy" of an `ArrowArray`. Before, it simply made a shallow copy of the outer array, which works in most cases. However, the spec allows for

Re: [PR] MINOR: [JS] Bump memfs from 4.8.2 to 4.9.2 in /js [arrow]

2024-05-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41482: URL: https://github.com/apache/arrow/pull/41482#issuecomment-2089319384 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit cc78c7a9bf17ceba7d538b30ddda008daeb1db85. There were

Re: [I] [Java] java-jars fails with "We have duplicated artifacts attached" [arrow]

2024-05-01 Thread via GitHub
lidavidm commented on issue #41490: URL: https://github.com/apache/arrow/issues/41490#issuecomment-2089317171 @vibhatha or @jbonofre, any idea? It's unclear when exactly this popped up since the test was blocked by #41470 until just now

Re: [I] [C++] Spurious duplicate registration of file:// factory [arrow]

2024-05-01 Thread via GitHub
lidavidm commented on issue #41470: URL: https://github.com/apache/arrow/issues/41470#issuecomment-2089307906 Issue resolved by pull request 41466 https://github.com/apache/arrow/pull/41466

Re: [PR] GH-41470: [C++] Reuse deduplication logic for direct registration [arrow]

2024-05-01 Thread via GitHub
lidavidm merged PR #41466: URL: https://github.com/apache/arrow/pull/41466

Re: [PR] GH-41470: [C++] Reuse deduplication logic for direct registration [arrow]

2024-05-01 Thread via GitHub
lidavidm commented on PR #41466: URL: https://github.com/apache/arrow/pull/41466#issuecomment-2089282218 We can merge this regardless and @vibhatha can file a new ticket.

Re: [PR] GH-40494: [Go] add support for protobuf messages [arrow]

2024-05-01 Thread via GitHub
tscottcoombes1 commented on PR #40496: URL: https://github.com/apache/arrow/pull/40496#issuecomment-2089238144 @zeroshade any ideas on what I need to change to fix these cross machine issues?

Re: [PR] GH-40494: [Go] add support for protobuf messages [arrow]

2024-05-01 Thread via GitHub
tscottcoombes1 commented on code in PR #40496: URL: https://github.com/apache/arrow/pull/40496#discussion_r1586886704 ## go/arrow/util/protobuf_reflect_test.go: ## @@ -0,0 +1,180 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-05-01 Thread via GitHub
westonpace commented on PR #41335: URL: https://github.com/apache/arrow/pull/41335#issuecomment-2089189076 > Yes, this is my assumption. And the overhead is actually 64MB per plan * per thread. Ah, good point, 64MB per thread per plan is too much. > Per-node (meanwhile
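As a rough worked example of that scaling, assuming a hypothetical 16-thread pool and the 64MB per-thread, per-plan figure quoted above, the temp stacks alone would take 64 × 16 = 1024MB for a single plan and 4096MB for four concurrent plans:

```python
def temp_stack_overhead_mb(per_stack_mb: int, threads: int, plans: int) -> int:
    # Assumes every thread of every running plan allocates its own temp stack.
    return per_stack_mb * threads * plans


print(temp_stack_overhead_mb(64, 16, 1))  # 1024
print(temp_stack_overhead_mb(64, 16, 4))  # 4096
```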

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
github-actions[bot] commented on PR #40392: URL: https://github.com/apache/arrow/pull/40392#issuecomment-2089179333 Revision: 4a81743474b50fdfbb9df50724f8a921059f0252 Submitted crossbow builds: [ursacomputing/crossbow @

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on PR #40392: URL: https://github.com/apache/arrow/pull/40392#issuecomment-2089177523 I'll let CI and the crossbow jobs run then merge if all looks good.

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on PR #40392: URL: https://github.com/apache/arrow/pull/40392#issuecomment-2089176534 @github-actions crossbow submit -g cpp

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on code in PR #40392: URL: https://github.com/apache/arrow/pull/40392#discussion_r1586855179 ## cpp/src/arrow/ipc/message_internal_test.cc: ## @@ -0,0 +1,82 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
kou commented on code in PR #40392: URL: https://github.com/apache/arrow/pull/40392#discussion_r1586850006 ## cpp/src/arrow/ipc/message_internal_test.cc: ## @@ -0,0 +1,82 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on PR #40392: URL: https://github.com/apache/arrow/pull/40392#issuecomment-2089162900 Thanks for taking a look @kou, I accepted all your changes. I feel pretty good about the state of this PR at this point and am not sure we need another review. Let me know what you think.

Re: [I] CSV reader cannot parse dates [arrow]

2024-05-01 Thread via GitHub
davlee1972 commented on issue #41488: URL: https://github.com/apache/arrow/issues/41488#issuecomment-2089162243 time32[x] types also cannot be parsed.

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on code in PR #40392: URL: https://github.com/apache/arrow/pull/40392#discussion_r1586844048 ## cpp/src/arrow/ipc/message_internal_test.cc: ## @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [I] Install instructions for R package fail to install parquet format functionality [arrow]

2024-05-01 Thread via GitHub
gaborcsardi commented on issue #41265: URL: https://github.com/apache/arrow/issues/41265#issuecomment-2089155215 To help people find this issue easier, this is the error you get when writing a parquet file with the CRAN binary build for 15.0.1 on macOS: ``` Error in

Re: [I] [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ [arrow]

2024-05-01 Thread via GitHub
kou commented on issue #41480: URL: https://github.com/apache/arrow/issues/41480#issuecomment-2089148878 I think that we can remove the logic from `python/setup.py`. The logic was borrowed from dynd-python:

Re: [PR] MINOR: [JS] Bump rollup from 4.14.3 to 4.17.2 in /js [arrow]

2024-05-01 Thread via GitHub
kou merged PR #41484: URL: https://github.com/apache/arrow/pull/41484

Re: [PR] MINOR: [JS] Bump web-streams-polyfill from 3.2.1 to 4.0.0 in /js [arrow]

2024-05-01 Thread via GitHub
kou commented on PR #41483: URL: https://github.com/apache/arrow/pull/41483#issuecomment-2089126995 It seems that we need to change our code to use 4.0.0.

Re: [PR] MINOR: [JS] Bump memfs from 4.8.2 to 4.9.2 in /js [arrow]

2024-05-01 Thread via GitHub
kou merged PR #41482: URL: https://github.com/apache/arrow/pull/41482

Re: [I] [Python] Segfault in `to_pandas()` on batch from IPC stream in specific edge cases [arrow]

2024-05-01 Thread via GitHub
Tom-Newton commented on issue #41469: URL: https://github.com/apache/arrow/issues/41469#issuecomment-2089088136 I managed to attach a debugger, so I can see a bit about why it's segfaulting. Ultimately the segfault is on `arrow/array/array_nested.h:90`. Suspiciously the value of

Re: [PR] feat(python): Implement bitmap unpacking [arrow-nanoarrow]

2024-05-01 Thread via GitHub
WillAyd commented on code in PR #450: URL: https://github.com/apache/arrow-nanoarrow/pull/450#discussion_r1586775118 ## python/src/nanoarrow/_lib.pyx: ## @@ -1815,6 +1815,51 @@ cdef class CBufferView: else: return self._iter_dispatch(offset, length) +

Re: [PR] feat(go/adbc/driver/snowflake): add quoted identifier ignore case option [arrow-adbc]

2024-05-01 Thread via GitHub
zeroshade commented on PR #1800: URL: https://github.com/apache/arrow-adbc/pull/1800#issuecomment-2089029617 @davlee1972 is this sufficient to handle your request? Or do we need to explicitly implement an option that turns off the quote wrapping?

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on code in PR #40392: URL: https://github.com/apache/arrow/pull/40392#discussion_r1586756287 ## cpp/src/arrow/ipc/metadata_internal.cc: ## @@ -477,7 +477,9 @@ static Status GetDictionaryEncoding(FBB& fbb, const std::shared_ptr<Field>& fiel static KeyValueOffset

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-05-01 Thread via GitHub
amoeba commented on code in PR #40392: URL: https://github.com/apache/arrow/pull/40392#discussion_r1586754681 ## cpp/src/arrow/ipc/metadata_internal.cc: ## @@ -477,7 +477,9 @@ static Status GetDictionaryEncoding(FBB& fbb, const std::shared_ptr<Field>& fiel static KeyValueOffset

Re: [I] [Python] Windows fatal exception: access violation [arrow]

2024-05-01 Thread via GitHub
amoeba commented on issue #40100: URL: https://github.com/apache/arrow/issues/40100#issuecomment-2089009983 Hi @dburton-influxdata, I think the next step here is still to get a debug build in your hands. I can take another shot at it in the next two weeks here and let you know how that

Re: [I] Why is pyarrow.dataset direct from S3 so much slower than using dataset locally and upload/download separately? [arrow]

2024-05-01 Thread via GitHub
theogaraj commented on issue #40758: URL: https://github.com/apache/arrow/issues/40758#issuecomment-2089003477 Closing this as I was able to figure it out. I was able to improve performance by creating a scanner and tweaking the `batch_readahead` and `batch_size` options. More info

[PR] feat(python): Implement bitmap unpacking [arrow-nanoarrow]

2024-05-01 Thread via GitHub
paleolimbot opened a new pull request, #450: URL: https://github.com/apache/arrow-nanoarrow/pull/450 In prototyping a real-world use case, I remembered that unpacking bits is exceedingly difficult to get right if you need to support an arbitrary offset/length. The math for this is very
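For readers unfamiliar with the problem, here is a deliberately simple, bit-at-a-time Python sketch of unpacking with an arbitrary offset and length, assuming Arrow's least-significant-bit bitmap numbering (logical bit i lives in byte i // 8 at position i % 8). It is not the code from this PR; an optimized version would typically handle the unaligned leading and trailing bits separately and process whole bytes in between.

```python
from typing import List


def unpack_bits(bitmap: bytes, offset: int, length: int) -> List[bool]:
    """Unpack `length` bits starting at bit `offset` of an LSB-numbered bitmap."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    # Number of bytes needed to cover bits [0, offset + length).
    if (offset + length + 7) // 8 > len(bitmap):
        raise ValueError("bitmap too short for offset + length")
    out = []
    for i in range(offset, offset + length):
        byte = bitmap[i // 8]
        out.append(bool((byte >> (i % 8)) & 1))  # bit 0 is the least significant
    return out


print(unpack_bits(bytes([0b00001101]), 1, 3))  # [False, True, True]
```

An offset that is not a multiple of 8 and a range that crosses a byte boundary are exactly the cases that are easy to get wrong, so they are worth testing explicitly.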

Re: [PR] GH-41306: [C++] Check to avoid copying when NullBitmapBuffer is Null [arrow]

2024-05-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41452: URL: https://github.com/apache/arrow/pull/41452#issuecomment-2088989173 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 281122c018df86601ca675f3941751ddc3a89b3d. There were

Re: [I] CSV reader cannot parse dates [arrow]

2024-05-01 Thread via GitHub
davlee1972 commented on issue #41488: URL: https://github.com/apache/arrow/issues/41488#issuecomment-2088987303 This issue also extends to filtering: something like pc.field("abc") < datetime.now() won't work if you can't cast field abc to a date32[day]. But this expression filter

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
zeroshade commented on PR #40807: URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088957599 @jorisvandenbossche I solved the crashing Python test. Essentially, I inadvertently exposed a bug in `MakeArrayOfNull` when handling nested extension types :smile:

Re: [I] [C++] Unable to read date64 or date32 in specific format from CSV [arrow]

2024-05-01 Thread via GitHub
davlee1972 commented on issue #28303: URL: https://github.com/apache/arrow/issues/28303#issuecomment-2088922037 This is still an issue. It also extends to time32[s]. **CSV conversion error to time32[s]: invalid value '7:55:00'** Right now the timestamp_parser will only
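A minimal Python sketch of the lenient parsing being asked for, assuming the only missing case is a one-digit hour. `parse_time32_seconds` is a hypothetical helper, not part of Arrow's CSV reader; since time32[s] values are seconds since midnight, '7:55:00' should map to 7*3600 + 55*60 = 28500.

```python
import re

# Hypothetical pattern: H:MM:SS or HH:MM:SS (one- or two-digit hour).
_TIME_RE = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})")


def parse_time32_seconds(text: str) -> int:
    """Parse a time of day into a time32[s] value (seconds since midnight)."""
    m = _TIME_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid time value {text!r}")
    hours, minutes, seconds = (int(g) for g in m.groups())
    if hours > 23 or minutes > 59 or seconds > 59:
        raise ValueError(f"time value out of range {text!r}")
    return hours * 3600 + minutes * 60 + seconds


print(parse_time32_seconds("7:55:00"))  # 28500
```

Until the reader accepts this form, one possible workaround is to read the column as a string and convert it afterwards.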

[PR] docs: add sizing explanation to bloom filter docs in parquet [arrow-rs]

2024-05-01 Thread via GitHub
hiltontj opened a new pull request, #5705: URL: https://github.com/apache/arrow-rs/pull/5705 Added documentation detailing the sizing of bloom filters in the parquet crate. # Which issue does this PR close? There is currently no issue for this change. #

Re: [PR] GH-41470: [C++] Reuse deduplication logic for direct registration [arrow]

2024-05-01 Thread via GitHub
vibhatha commented on PR #41466: URL: https://github.com/apache/arrow/pull/41466#issuecomment-2088901863 @lidavidm this seems to be from the recent maven build update?

Re: [PR] MINOR: [C++][Compute] Remove processing logic for ArrayData as ExecResults in ExecScalarCaseWhen [arrow]

2024-05-01 Thread via GitHub
bkietz commented on code in PR #41380: URL: https://github.com/apache/arrow/pull/41380#discussion_r1586597414 ## cpp/src/arrow/compute/kernels/scalar_if_else.cc: ## @@ -1482,39 +1482,21 @@ Status ExecScalarCaseWhen(KernelContext* ctx, const ExecSpan& batch, ExecResult*

Re: [PR] GH-41470: [C++] Reuse deduplication logic for direct registration [arrow]

2024-05-01 Thread via GitHub
bkietz commented on PR #41466: URL: https://github.com/apache/arrow/pull/41466#issuecomment-2088861280 The error doesn't seem related to filesystem anymore: https://github.com/ursacomputing/crossbow/actions/runs/8903205653/job/24451424677#step:6:16247 I'm not sure what it means

Re: [PR] GH-41478: [C++] Clean up more redundant move warnings [arrow]

2024-05-01 Thread via GitHub
github-actions[bot] commented on PR #41487: URL: https://github.com/apache/arrow/pull/41487#issuecomment-2088839795 :warning: GitHub issue #41478 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH41478: [C++] Clean up more redundant move warnings [arrow]

2024-05-01 Thread via GitHub
WillAyd commented on PR #41487: URL: https://github.com/apache/arrow/pull/41487#issuecomment-2088837783 You can see this in a downstream nanoarrow branch that adds Meson warning level 2: https://github.com/apache/arrow-nanoarrow/actions/runs/8912807091/job/24476999593?pr=448

Re: [PR] GH41478: [C++] Clean up more redundant move warnings [arrow]

2024-05-01 Thread via GitHub
github-actions[bot] commented on PR #41487: URL: https://github.com/apache/arrow/pull/41487#issuecomment-2088837625 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue

[PR] GH41478: [C++] Clean up more redundant move warnings [arrow]

2024-05-01 Thread via GitHub
WillAyd opened a new pull request, #41487: URL: https://github.com/apache/arrow/pull/41487 ### Rationale for this change Minor warning cleanup for downstream libraries trying to get warning-free builds ### What changes are included in this PR?

Re: [PR] Add Meson build with Werror [arrow-nanoarrow]

2024-05-01 Thread via GitHub
WillAyd commented on code in PR #448: URL: https://github.com/apache/arrow-nanoarrow/pull/448#discussion_r158624 ## ci/scripts/build-arrow-cpp-minimal.sh: ## Review Comment: These changes are not permanent - just setting them up to see what we need for a clean build

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-05-01 Thread via GitHub
zanmato1984 commented on PR #41335: URL: https://github.com/apache/arrow/pull/41335#issuecomment-2088780641 > From the PR it mentions the max value for this variable should be 64MB. That seems confusing to me. Why wouldn't we just always use 64MB? Is there concern that 64MB per plan is too

Re: [I] How can I use arrow in my project as a git submodule [arrow]

2024-05-01 Thread via GitHub
ZhangChaoming commented on issue #13866: URL: https://github.com/apache/arrow/issues/13866#issuecomment-2088745136 Any solution? @g302ge

Re: [I] [Python] Segfault in `to_pandas()` on batch from IPC stream in specific edge cases [arrow]

2024-05-01 Thread via GitHub
Tom-Newton commented on issue #41469: URL: https://github.com/apache/arrow/issues/41469#issuecomment-2088738164 So it turns out the bug is also only reproducible when numpy is imported prior to pyarrow. So my current smallest reproducer is ``` import numpy as np import pyarrow as

Re: [I] [Python] SEGFAULT when casting a slice of a fixed with binary array to binary [arrow]

2024-05-01 Thread via GitHub
westonpace commented on issue #41306: URL: https://github.com/apache/arrow/issues/41306#issuecomment-2088705612 Issue resolved by pull request 41452 https://github.com/apache/arrow/pull/41452

Re: [PR] GH-41306: [C++] Check to avoid copying when NullBitmapBuffer is Null [arrow]

2024-05-01 Thread via GitHub
westonpace merged PR #41452: URL: https://github.com/apache/arrow/pull/41452

[PR] MINOR: [JS] Bump web-streams-polyfill from 3.2.1 to 4.0.0 in /js [arrow]

2024-05-01 Thread via GitHub
dependabot[bot] opened a new pull request, #41483: URL: https://github.com/apache/arrow/pull/41483 Bumps [web-streams-polyfill](https://github.com/MattiasBuelens/web-streams-polyfill) from 3.2.1 to 4.0.0. Release notes Sourced from

[PR] MINOR: [JS] Bump rollup from 4.14.3 to 4.17.2 in /js [arrow]

2024-05-01 Thread via GitHub
dependabot[bot] opened a new pull request, #41484: URL: https://github.com/apache/arrow/pull/41484 Bumps [rollup](https://github.com/rollup/rollup) from 4.14.3 to 4.17.2. Release notes Sourced from [rollup's releases](https://github.com/rollup/rollup/releases). v4.17.2

[PR] MINOR: [JS] Bump @swc/helpers from 0.5.10 to 0.5.11 in /js [arrow]

2024-05-01 Thread via GitHub
dependabot[bot] opened a new pull request, #41486: URL: https://github.com/apache/arrow/pull/41486 Bumps [@swc/helpers](https://github.com/swc-project/swc) from 0.5.10 to 0.5.11. Commits See full diff in [compare view](https://github.com/swc-project/swc/commits)

[PR] MINOR: [JS] Bump @typescript-eslint/eslint-plugin from 7.7.0 to 7.8.0 in /js [arrow]

2024-05-01 Thread via GitHub
dependabot[bot] opened a new pull request, #41485: URL: https://github.com/apache/arrow/pull/41485 Bumps [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin) from 7.7.0 to 7.8.0. Release notes Sourced from

[PR] MINOR: [JS] Bump memfs from 4.8.2 to 4.9.2 in /js [arrow]

2024-05-01 Thread via GitHub
dependabot[bot] opened a new pull request, #41482: URL: https://github.com/apache/arrow/pull/41482 Bumps [memfs](https://github.com/streamich/memfs) from 4.8.2 to 4.9.2. Release notes Sourced from [memfs's releases](https://github.com/streamich/memfs/releases). v4.9.2

Re: [PR] feat(csharp): imported objects should have call "release" when no longer in use [arrow-adbc]

2024-05-01 Thread via GitHub
CurtHagenlocher merged PR #1802: URL: https://github.com/apache/arrow-adbc/pull/1802

Re: [PR] GH-41334: [C++][Acero] Add env var to tune the size of the temp stack [arrow]

2024-05-01 Thread via GitHub
westonpace commented on PR #41335: URL: https://github.com/apache/arrow/pull/41335#issuecomment-2088673206 From the PR it mentions the max value for this variable should be 64MB. > Sorry I'm still confused. From the discussion in the email thread, it seems like the intention at

Re: [I] [Python] Segfault in `to_pandas()` on batch from IPC stream in specific edge cases [arrow]

2024-05-01 Thread via GitHub
Tom-Newton commented on issue #41469: URL: https://github.com/apache/arrow/issues/41469#issuecomment-2088635338 Wait, there is something else weird going on. This smaller reproducer doesn't always work, depending on the Python environment.

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
zeroshade commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1586416413 ## cpp/src/arrow/record_batch.cc: ## @@ -623,6 +667,16 @@ Status RecordBatch::ValidateFull() const { return ValidateBatch(*this, /*full_validation=*/true); }

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
zeroshade commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1586410971 ## cpp/src/arrow/c/bridge.h: ## @@ -321,6 +321,31 @@ ARROW_EXPORT Status ExportChunkedArray(std::shared_ptr<ChunkedArray> chunked_array, struct

Re: [I] [Python] Segfault in `to_pandas()` on batch from IPC stream in specific edge cases [arrow]

2024-05-01 Thread via GitHub
Tom-Newton commented on issue #41469: URL: https://github.com/apache/arrow/issues/41469#issuecomment-2088613252 Thanks for the suggestions. PySpark doesn't really support this but I can hack it to make it do this. > does it still reproduce after a roundtrip to Parquet? No,

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
zeroshade commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1586401760 ## cpp/src/arrow/array/data.cc: ## @@ -224,6 +224,41 @@ int64_t ArrayData::ComputeLogicalNullCount() const { return ArraySpan(*this).ComputeLogicalNullCount(); }

Re: [PR] GH-39301: [Archery][CI][Integration] Add nanoarrow to archery + integration setup [arrow]

2024-05-01 Thread via GitHub
paleolimbot commented on code in PR #39302: URL: https://github.com/apache/arrow/pull/39302#discussion_r1586399913 ## docker-compose.yml: ## @@ -1749,10 +1749,12 @@ services: volumes: *conda-volumes environment: <<: [*common, *ccache] -

Re: [I] [C++] Wrong and low inefficient expression execution for [if/else, case/when ... etc] expression [arrow]

2024-05-01 Thread via GitHub
felipecrv commented on issue #41094: URL: https://github.com/apache/arrow/issues/41094#issuecomment-2088594006 > The implementation method you mentioned is more elegant, and it basically does not affect the scheduling of the expression system of other functions. > We could focus on the

Re: [PR] feat(rust): add the driver manager [arrow-adbc]

2024-05-01 Thread via GitHub
alexandreyc commented on PR #1803: URL: https://github.com/apache/arrow-adbc/pull/1803#issuecomment-2088575625 CI is broken because we need the SQLite driver dynamic library in path to successfully run doctests... I have no idea how we can do that. Is there any similar situation elsewhere?

Re: [I] [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ [arrow]

2024-05-01 Thread via GitHub
paleolimbot commented on issue #41480: URL: https://github.com/apache/arrow/issues/41480#issuecomment-2088573991 Just a note that I think we have been doing this in the R package for quite some time (with apologies if I missed part of the nuance here):

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1586297468 ## cpp/src/arrow/array/data.cc: ## @@ -224,6 +224,41 @@ int64_t ArrayData::ComputeLogicalNullCount() const { return

Re: [PR] feat(rust): add the driver manager [arrow-adbc]

2024-05-01 Thread via GitHub
alexandreyc commented on PR #1803: URL: https://github.com/apache/arrow-adbc/pull/1803#issuecomment-2088554164 CC @mbrobbel

[PR] feat(rust): add the driver manager [arrow-adbc]

2024-05-01 Thread via GitHub
alexandreyc opened a new pull request, #1803: URL: https://github.com/apache/arrow-adbc/pull/1803 Hey! Here is the penultimate PR containing the driver manager for Rust. The last PR will contain all the integration tests. CC @mbrobbel

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
bkietz commented on code in PR #40807: URL: https://github.com/apache/arrow/pull/40807#discussion_r1586297491 ## python/pyarrow/tests/test_cffi.py: ## @@ -45,7 +45,7 @@ ValueError, match="Cannot import released ArrowArray") assert_stream_released = pytest.raises( -

[PR] feat(csharp): imported objects should have call "release" when no longer in use [arrow-adbc]

2024-05-01 Thread via GitHub
CurtHagenlocher opened a new pull request, #1802: URL: https://github.com/apache/arrow-adbc/pull/1802 Closes #1780

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on PR #40807: URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088499517 From what I can see in GDB, this is happening with an empty schema (no fields), but then I don't understand why the size check in recordbatch.cc is not preventing this:

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on PR #40807: URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088492861 The Python failures seem definitely related. Fetching this PR in my local dev setup, I see the same segfault running the python tests. GDB backtrace: ``` $ gdb

Re: [I] [Python] _Py_IsFinalizing will be removed in Python 313 [arrow]

2024-05-01 Thread via GitHub
tacaswell commented on issue #41475: URL: https://github.com/apache/arrow/issues/41475#issuecomment-2088476819 https://github.com/cython/cython/blob/a6d810b970c21948c7fcdeec2cf28769e716e4a9/Cython/Utility/ModuleSetupCode.c#L671-L703 is an example of how cython does it internally (which is

Re: [PR] Add Julia example [WIP] [arrow-experiments]

2024-05-01 Thread via GitHub
ianmcook commented on PR #29: URL: https://github.com/apache/arrow-experiments/pull/29#issuecomment-2088470746 Thank you @simsurace!

Re: [PR] Add Julia example [WIP] [arrow-experiments]

2024-05-01 Thread via GitHub
ianmcook merged PR #29: URL: https://github.com/apache/arrow-experiments/pull/29

Re: [PR] chore(python): Remove C sources from wheels [arrow-nanoarrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on PR #447: URL: https://github.com/apache/arrow-nanoarrow/pull/447#issuecomment-2088455969 > After this change the wheels are ~400kb and installed size is 1.5 MB (for me). Nice improvement! ;)

[PR] chore(python): Clean up top-level namespace [arrow-nanoarrow]

2024-05-01 Thread via GitHub
paleolimbot opened a new pull request, #449: URL: https://github.com/apache/arrow-nanoarrow/pull/449 This PR cleans up the top-level namespace such that more advanced concepts that might be confusing to new nanoarrow users are tucked away in modules (i.e., they can be used and are

Re: [I] [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ [arrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on issue #41480: URL: https://github.com/apache/arrow/issues/41480#issuecomment-2088423982 > Does somebody know the history behind the "failure_permitted" logic? Does it still have its use today? The main thing I don't understand about our current setup

Re: [PR] GH-41410: [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference [arrow]

2024-05-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41411: URL: https://github.com/apache/arrow/pull/41411#issuecomment-2088414199 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 22f88fa4a8f5ac7250f1845aace5a78d20006ef2. There were

Re: [PR] GH-23221: [Python] python changes for pyodide build [arrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on code in PR #37822: URL: https://github.com/apache/arrow/pull/37822#discussion_r1586242110 ## python/setup.py: ## @@ -133,8 +143,68 @@ def run(self): 'bundle the Arrow C++ headers')] +

Re: [PR] chore(python): Remove C sources from wheels [arrow-nanoarrow]

2024-05-01 Thread via GitHub
paleolimbot merged PR #447: URL: https://github.com/apache/arrow-nanoarrow/pull/447

Re: [PR] feat(python): Allow creation of dictionary and list types [arrow-nanoarrow]

2024-05-01 Thread via GitHub
paleolimbot merged PR #445: URL: https://github.com/apache/arrow-nanoarrow/pull/445

Re: [PR] GH-40078: [C++] Import/Export ArrowDeviceArrayStream [arrow]

2024-05-01 Thread via GitHub
paleolimbot commented on PR #40807: URL: https://github.com/apache/arrow/pull/40807#issuecomment-2088365810 > @paleolimbot said he was able to replicate the failure with his own debug python build It would be more accurate to say that I ran the tests with a debug build of Python

Re: [I] [Python][C++] Impossible to specify `is_adjusted_to_utc` for Time type when writing to Parquet [arrow]

2024-05-01 Thread via GitHub
LoganDark commented on issue #41476: URL: https://github.com/apache/arrow/issues/41476#issuecomment-2088308724 > In hindsight, I am also thinking that it would actually be more correct for Arrow to use `is_adjusted_to_utc=false` by default. I agree

Re: [I] `cast` kernel support for `StringViewArray` and `BinaryViewArray` [arrow-rs]

2024-05-01 Thread via GitHub
alamb commented on issue #5508: URL: https://github.com/apache/arrow-rs/issues/5508#issuecomment-2088296642 > As we know, we can use ViewArray for random access of byte buffers. So, when converting ViewArray to ByteArray, memory copy is unavoidable. I can't come up with a zero-copy way.

Re: [PR] GH-41463: [C++] Skip TestConcurrentFillFromScalar for platforms without threading support [arrow]

2024-05-01 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41461: URL: https://github.com/apache/arrow/pull/41461#issuecomment-2088261009 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 250291500b6a7d5d934901acef708cef2eb1dc08. There were

Re: [PR] feat(python): Allow creation of dictionary and list types [arrow-nanoarrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on code in PR #445: URL: https://github.com/apache/arrow-nanoarrow/pull/445#discussion_r1586132695 ## python/src/nanoarrow/schema.py: ## @@ -957,14 +988,113 @@ def struct(fields, nullable=True) -> Schema: >>> import nanoarrow as na >>>

Re: [PR] feat(python): Allow creation of dictionary and list types [arrow-nanoarrow]

2024-05-01 Thread via GitHub
jorisvandenbossche commented on code in PR #445: URL: https://github.com/apache/arrow-nanoarrow/pull/445#discussion_r1586129139 ## python/src/nanoarrow/schema.py: ## @@ -957,14 +988,113 @@ def struct(fields, nullable=True) -> Schema: >>> import nanoarrow as na >>>
