Re: [I] Internal error: Invalid HashJoinExec partition count mismatch 1!=2 when constructing merge plan with 1 CPU [arrow-datafusion]

2024-04-04 Thread via GitHub
mustafasrepo commented on issue #9928: URL: https://github.com/apache/arrow-datafusion/issues/9928#issuecomment-2039083635 @echai58 can you post the PhysicalPlan for the failing test, if possible. It might help to produce datafusion only reproducer -- This is an automated message from th

Re: [PR] GH-23221: [C++] Add support for building with Emscripten [arrow]

2024-04-04 Thread via GitHub
joemarshall commented on PR #37821: URL: https://github.com/apache/arrow/pull/37821#issuecomment-2039050954 Brilliant, thanks everyone

Re: [PR] Prune out constant expressions from output ordering. [arrow-datafusion]

2024-04-04 Thread via GitHub
mustafasrepo merged PR #9947: URL: https://github.com/apache/arrow-datafusion/pull/9947

Re: [PR] GH-40997: [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 [arrow]

2024-04-04 Thread via GitHub
ZhangHuiGui commented on PR #40998: URL: https://github.com/apache/arrow/pull/40998#issuecomment-2039024380 @mapleFU @westonpace PTAL the test, it will cover the case.

Re: [I] Some aggregates silently ignore `IGNORE NULLS` and `ORDER BY` on arguments [arrow-datafusion]

2024-04-04 Thread via GitHub
mustafasrepo commented on issue #9924: URL: https://github.com/apache/arrow-datafusion/issues/9924#issuecomment-2039021866 > FWIW @jayzhan211 has a really nice API suggestion here I think https://github.com/apache/arrow-datafusion/pull/9920/files#r1549905825 > > Specifically, add the

Re: [PR] feat: Support murmur3_hash and sha2 family hash functions [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
viirya commented on code in PR #226: URL: https://github.com/apache/arrow-datafusion-comet/pull/226#discussion_r1552974787 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -165,7 +165,6 @@ macro_rules! hash_array_primitive_float { } else {

Re: [PR] build(c): bump vendored fmt [arrow-adbc]

2024-04-04 Thread via GitHub
lidavidm merged PR #1708: URL: https://github.com/apache/arrow-adbc/pull/1708

Re: [PR] GH-39898: [C++] Add support for OpenTelemetry logging [arrow]

2024-04-04 Thread via GitHub
lidavidm commented on PR #39905: URL: https://github.com/apache/arrow/pull/39905#issuecomment-2039013402 It probably shouldn't. But problems like this are why I favor just opening up the gRPC part of Flight so that the user can configure everything themselves...

Re: [PR] Add Arrow Flight SQL ODBC driver [arrow]

2024-04-04 Thread via GitHub
jbonofre commented on PR #40939: URL: https://github.com/apache/arrow/pull/40939#issuecomment-2038994936 > > finalize Arrow update (the original code was using Arrow 9.x) > > I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

Re: [PR] Add Arrow Flight SQL ODBC driver [arrow]

2024-04-04 Thread via GitHub
jbonofre commented on PR #40939: URL: https://github.com/apache/arrow/pull/40939#issuecomment-2038994467 Hi @jduo ! > Some of the code in odbc_impl should be updated to use the same naming conventions as the arrow project (eg ODBCConnection -> odbc_connection). Yes, I'm doi

[I] Support SubqueryBroadcastExec in Comet [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
viirya opened a new issue, #242: URL: https://github.com/apache/arrow-datafusion-comet/issues/242 ### What is the problem the feature request solves? Currently we support `BroadcastExchange` if it is under `BroadcastHashJoin` in Comet. Besides broadcast join, `BroadcastExchange

Re: [PR] GH-41020: [C++] Introduce portable compiler assumptions [arrow]

2024-04-04 Thread via GitHub
mapleFU commented on code in PR #41021: URL: https://github.com/apache/arrow/pull/41021#discussion_r1552930091 ## cpp/src/arrow/array/builder_nested.h: ## @@ -181,14 +181,13 @@ class ARROW_EXPORT VarLengthListLikeBuilder : public ArrayBuilder { if constexpr (is_list_view(T

Re: [PR] GH-40866: [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major [arrow]

2024-04-04 Thread via GitHub
AlenkaF commented on code in PR #40867: URL: https://github.com/apache/arrow/pull/40867#discussion_r1552886195 ## cpp/src/arrow/record_batch.cc: ## @@ -283,18 +283,55 @@ struct ConvertColumnsToTensorVisitor { } }; +template +struct ConvertColumnsToTensorRowMajorVisitor {

Re: [PR] GH-40282: [Python] Use C++ type traits [arrow]

2024-04-04 Thread via GitHub
AlenkaF commented on PR #40761: URL: https://github.com/apache/arrow/pull/40761#issuecomment-2038914655 @pitrou mind giving one more look before I merge?

Re: [I] DataFusion 37.0.0 upgrade [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
viirya commented on issue #222: URL: https://github.com/apache/arrow-datafusion-comet/issues/222#issuecomment-2038880407 Yea, as #239 is going to use specified arrow-rs and DataFusion branch to work around a Java Arrow bug, we can hold on upgrading these dependencies. The fix of the

Re: [I] Support BroadcastNestedLoopJoinExec [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
viirya commented on issue #198: URL: https://github.com/apache/arrow-datafusion-comet/issues/198#issuecomment-2038877061 Note that I found several bugs in the current broadcast implementation when trying to enable broadcast by default in #213. Since BroadcastNestedLoopJoinExec uses broadcast,

Re: [I] [Ruby] Improve Ruby's GC integration [arrow]

2024-04-04 Thread via GitHub
datbth commented on issue #40881: URL: https://github.com/apache/arrow/issues/40881#issuecomment-2038828252 ### Question Hi, may I ask if there is any planning/estimation for this yet? What does the effort look like? Would you need any help? Background I'm facing this wh

Re: [PR] GH-23221: [C++] Add support for building with Emscripten [arrow]

2024-04-04 Thread via GitHub
bitsondatadev commented on PR #37821: URL: https://github.com/apache/arrow/pull/37821#issuecomment-2038799868 Thanks for your reviews @kou and thanks @joemarshall for all the hard work!!

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552772364 ## r/R/arrow-info.R: ## @@ -139,7 +139,8 @@ arrow_with_json <- function() { some_features_are_off <- function(features) { # `features` is a named logical vector (a

Re: [PR] GH-34785: [C++][Parquet] Parquet Bloom Filter Writer Implementation [arrow]

2024-04-04 Thread via GitHub
wgtmac commented on code in PR #37400: URL: https://github.com/apache/arrow/pull/37400#discussion_r1552735314 ## cpp/src/parquet/bloom_filter_builder.cc: ## @@ -0,0 +1,155 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[PR] coercion vec[Dictionary, Utf8] to Dictionary [arrow-datafusion]

2024-04-04 Thread via GitHub
Lordworms opened a new pull request, #9958: URL: https://github.com/apache/arrow-datafusion/pull/9958 ## Which issue does this PR close? Closes #9925 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552724501 ## r/R/arrow-package.R: ## @@ -182,37 +182,50 @@ configure_tzdb <- function() { .onAttach <- function(libname, pkgname) { # Just to be extra safe, let's wrap this

Re: [PR] GH-39898: [C++] Add support for OpenTelemetry logging [arrow]

2024-04-04 Thread via GitHub
benibus commented on PR #39905: URL: https://github.com/apache/arrow/pull/39905#issuecomment-2038737969 @lidavidm Somewhat related question: I've noticed that `arrow::internal::tracing::GetTracer` (which, AFAIK, is used to create the vast majority of Arrow's spans) will only ever [return th

Re: [I] [CI][Archery] Archery linking should also check for undefined symbols macOS [arrow]

2024-04-04 Thread via GitHub
kou commented on issue #40965: URL: https://github.com/apache/arrow/issues/40965#issuecomment-2038735095 We can use `nm` on macOS too. We can use `otool -L` instead of `ldd` on macOS: ```console $ otool -L arrow-dataset-15.0.0/aarch_64/libarrow_dataset_jni.dylib arrow-dataset-

Re: [I] [Benchmarking][Java] new `java.lang.OutOfMemoryError` in Java benchmarks after local build cache change [arrow]

2024-04-04 Thread via GitHub
kou commented on issue #40775: URL: https://github.com/apache/arrow/issues/40775#issuecomment-2038712155 Issue resolved by pull request 40786 https://github.com/apache/arrow/pull/40786

Re: [PR] GH-40775: [Benchmarking][Java] Fix conbench timeout [arrow]

2024-04-04 Thread via GitHub
kou merged PR #40786: URL: https://github.com/apache/arrow/pull/40786

Re: [I] Support ORDER BY in AggregateUDF [arrow-datafusion]

2024-04-04 Thread via GitHub
jayzhan211 commented on issue #8984: URL: https://github.com/apache/arrow-datafusion/issues/8984#issuecomment-2038711391 Complete with #9874

Re: [I] Support ORDER BY in AggregateUDF [arrow-datafusion]

2024-04-04 Thread via GitHub
jayzhan211 closed issue #8984: Support ORDER BY in AggregateUDF URL: https://github.com/apache/arrow-datafusion/issues/8984

Re: [PR] GH-40924: [MATLAB][Packaging] Add script for uploading Release Candidate (RC) MLTBX packages for the MATLAB bindings to the Apache Arrow GitHub Releases area [arrow]

2024-04-04 Thread via GitHub
kou commented on PR #40956: URL: https://github.com/apache/arrow/pull/40956#issuecomment-2038675892 We can use `dev/release/04-binary-download.sh ${MAJOR}.${MINOR}.${PATCH} ${RC} --task-filter matlab` to download the `.mltbx` file. :-)

Re: [PR] GH-23221: [C++] Add support for building with Emscripten [arrow]

2024-04-04 Thread via GitHub
kou merged PR #37821: URL: https://github.com/apache/arrow/pull/37821

Re: [PR] GH-40974: [CI][Python] CI failures on Python builds due to pytest_cython [arrow]

2024-04-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #40975: URL: https://github.com/apache/arrow/pull/40975#issuecomment-2038639573 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit bbeeb33a2fb65f40caf6c3176ee377de2b9de6e5. There were no

Re: [PR] MINOR: [Java] `DenseUnionVector.empty` should create not-nullable DUVs [arrow]

2024-04-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #41001: URL: https://github.com/apache/arrow/pull/41001#issuecomment-2038637412 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit ad6758900da1706d3cbfd59e5fe7d1d548c4235b. There were no

Re: [PR] Add Arrow Flight SQL ODBC driver [arrow]

2024-04-04 Thread via GitHub
assignUser commented on PR #40939: URL: https://github.com/apache/arrow/pull/40939#issuecomment-2038601379 > finalize Arrow update (the original code was using Arrow 9.x) I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

[I] Remove builtin aggregate function `FirstValue` [arrow-datafusion]

2024-04-04 Thread via GitHub
jayzhan211 opened a new issue, #9957: URL: https://github.com/apache/arrow-datafusion/issues/9957 ### Is your feature request related to a problem or challenge? Follow up #9874 I did not remove the old built-in first-value aggregate function, so I plan to remove it. ### Des

Re: [PR] GH-40400: [C++] Add support for LLD [arrow]

2024-04-04 Thread via GitHub
kou commented on PR #40927: URL: https://github.com/apache/arrow/pull/40927#issuecomment-2038582301 Could you also add this to use `ARROW_USE_LLD` environment variable value? ```diff diff --git a/ci/scripts/cpp_build.sh b/ci/scripts/cpp_build.sh index 1e09924a5e..62bb1bc13f 10075

Re: [PR] GH-40400: [C++] Add support for LLD [arrow]

2024-04-04 Thread via GitHub
kou commented on code in PR #40927: URL: https://github.com/apache/arrow/pull/40927#discussion_r1552654177 ## cpp/cmake_modules/DefineOptions.cmake: ## @@ -170,6 +170,8 @@ takes precedence over ccache if a storage backend is configured" ON) define_option(ARROW_USE_LD_GOLD

Re: [PR] GH-40775: [Benchmarking][Java] Fix conbench timeout [arrow]

2024-04-04 Thread via GitHub
danepitkin commented on PR #40786: URL: https://github.com/apache/arrow/pull/40786#issuecomment-2038576954 > @danepitkin mind updating the PR description now that the approach is different? Agh, good catch. Done!

Re: [I] move Floor, Gcd, Lcm, Pi to datafusion-functions [arrow-datafusion]

2024-04-04 Thread via GitHub
Omega359 commented on issue #9861: URL: https://github.com/apache/arrow-datafusion/issues/9861#issuecomment-2038571047 take

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
nealrichardson commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552630057 ## r/R/install-arrow.R: ## @@ -60,6 +62,7 @@ install_arrow <- function(nightly = FALSE, minimal = Sys.getenv("LIBARROW_MINIMAL", FALSE

Re: [PR] GH-41020: [C++] Introduce portable compiler assumptions [arrow]

2024-04-04 Thread via GitHub
github-actions[bot] commented on PR #41021: URL: https://github.com/apache/arrow/pull/41021#issuecomment-2038498586 :warning: GitHub issue #41020 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] Compiler assume [arrow]

2024-04-04 Thread via GitHub
github-actions[bot] commented on PR #41021: URL: https://github.com/apache/arrow/pull/41021#issuecomment-2038496541 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] Compiler assume [arrow]

2024-04-04 Thread via GitHub
felipecrv opened a new pull request, #41021: URL: https://github.com/apache/arrow/pull/41021 ### Rationale for this change Allow portable use of the "assumption" feature of modern GCC, Clang, and MSVC compilers. ### What changes are included in this PR? - Documenting of

Re: [I] Versions >32.0.0 on PyPI have broken substrait support [arrow-datafusion]

2024-04-04 Thread via GitHub
EpsilonPrime commented on issue #9823: URL: https://github.com/apache/arrow-datafusion/issues/9823#issuecomment-2038491750 I had one working environment so I started removing packages. When I downgraded libabseil I started getting this result. I don't think that's the culprit but it did

Re: [PR] feat(format): add info codes for supported capabilities [arrow-adbc]

2024-04-04 Thread via GitHub
joellubi commented on code in PR #1649: URL: https://github.com/apache/arrow-adbc/pull/1649#discussion_r1552617846 ## adbc.h: ## @@ -459,6 +459,28 @@ const struct AdbcError* AdbcErrorFromArrayStream(struct ArrowArrayStream* stream /// /// \see AdbcConnectionGetInfo #define A

Re: [I] Add spilling in SortMergeJoin [arrow-datafusion]

2024-04-04 Thread via GitHub
comphead commented on issue #9359: URL: https://github.com/apache/arrow-datafusion/issues/9359#issuecomment-2038473649 Related to https://github.com/apache/arrow-datafusion/issues/9846

Re: [I] Versions >32.0.0 on PyPI have broken substrait support [arrow-datafusion]

2024-04-04 Thread via GitHub
EpsilonPrime commented on issue #9823: URL: https://github.com/apache/arrow-datafusion/issues/9823#issuecomment-2038466573 I'm also seeing this behavior. The possibilities I can come up with is not being included as a feature in recent builds (which seems unlikely since it sometimes works

Re: [I] Add Spark expression coverage [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
comphead commented on issue #240: URL: https://github.com/apache/arrow-datafusion-comet/issues/240#issuecomment-2038466849 The list of Spark expressions can be found at https://spark.apache.org/docs/latest/api/sql/index.html

Re: [I] DataFusion 37.0.0 upgrade [arrow-datafusion-comet]

2024-04-04 Thread via GitHub
comphead commented on issue #222: URL: https://github.com/apache/arrow-datafusion-comet/issues/222#issuecomment-2038463526 Related to https://github.com/apache/arrow-datafusion-comet/pull/239/files

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552592383 ## r/R/install-arrow.R: ## @@ -76,22 +79,15 @@ install_arrow <- function(nightly = FALSE, ARROW_R_DEV = verbose, ARROW_USE_PKG_CONFIG = use_system )

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552591184 ## r/R/arrow-package.R: ## @@ -182,37 +182,50 @@ configure_tzdb <- function() { .onAttach <- function(libname, pkgname) { # Just to be extra safe, let's wrap this

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552591720 ## r/R/install-arrow.R: ## @@ -60,6 +62,7 @@ install_arrow <- function(nightly = FALSE, minimal = Sys.getenv("LIBARROW_MINIMAL", FALSE),

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552590123 ## r/DESCRIPTION: ## @@ -43,7 +43,7 @@ Imports: utils, vctrs Roxygen: list(markdown = TRUE, r6 = FALSE, load = "source") -RoxygenNote: 7.2.3 +RoxygenNote: 7

Re: [PR] GH-40224: [C++] Fix: improve the backpressure handling in the dataset writer [arrow]

2024-04-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #40722: URL: https://github.com/apache/arrow/pull/40722#issuecomment-2038450161 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 640c10191a51f6d0f408c72f45dbf5d94ec0b9d7. There were no

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane commented on code in PR #41019: URL: https://github.com/apache/arrow/pull/41019#discussion_r1552590380 ## r/R/arrow-package.R: ## @@ -182,37 +182,50 @@ configure_tzdb <- function() { .onAttach <- function(libname, pkgname) { # Just to be extra safe, let's wrap this

Re: [PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
github-actions[bot] commented on PR #41019: URL: https://github.com/apache/arrow/pull/41019#issuecomment-2038450150 :warning: GitHub issue #40991 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-40991: [R] Prefer r-universe, add a startup message [arrow]

2024-04-04 Thread via GitHub
jonkeane opened a new pull request, #41019: URL: https://github.com/apache/arrow/pull/41019 ### Rationale for this change If someone loads a version of Arrow on macOS with features disabled, warn them on startup that they can use `install_arrow()`. By default, prefer R-Universe in `i

Re: [PR] GH-40361: [C++] Make flatbuffers serialization more deterministic [arrow]

2024-04-04 Thread via GitHub
amoeba commented on PR #40392: URL: https://github.com/apache/arrow/pull/40392#issuecomment-2038434367 I put up a draft PR with a test to exercise this and hopefully watch it fail on CI at https://github.com/apache/arrow/pull/41018. It fails on my amd64 linux machine so I think it's close t

Re: [PR] [C++] Add test for flatbuffers serialization [arrow]

2024-04-04 Thread via GitHub
amoeba commented on PR #41018: URL: https://github.com/apache/arrow/pull/41018#issuecomment-2038430690 The test added by this PR succeeds on macOS (where I generated the test data) and fails on my Linux machine so I think that's a start. I'll wait for CI and return to this when that's all b

Re: [PR] Add Arrow Flight SQL ODBC driver [arrow]

2024-04-04 Thread via GitHub
jduo commented on code in PR #40939: URL: https://github.com/apache/arrow/pull/40939#discussion_r1552573858 ## cpp/src/arrow/flight/sql/odbc/flight_sql/include/flight_sql/ui/add_property_window.h: ## @@ -0,0 +1,125 @@ +// Licensed to the Apache Software Foundation (ASF) under on

Re: [PR] GH-40400: [C++] Add support for LLD [arrow]

2024-04-04 Thread via GitHub
cryos commented on PR #40927: URL: https://github.com/apache/arrow/pull/40927#issuecomment-2038394954 I think this should address all of your review comments @kou and it looks more general. I tested locally, and hopefully this now gets some testing in CI.

Re: [PR] GH-40339: [Java] StringView Initial Implementation [arrow]

2024-04-04 Thread via GitHub
lidavidm commented on code in PR #40340: URL: https://github.com/apache/arrow/pull/40340#discussion_r1552529173 ## java/vector/src/main/java/org/apache/arrow/vector/BaseVariableWidthViewVector.java: ## @@ -0,0 +1,1570 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] GH-40249: [Java] Fix NPE in ArrowDatabaseMetadata [arrow]

2024-04-04 Thread via GitHub
lidavidm commented on code in PR #40988: URL: https://github.com/apache/arrow/pull/40988#discussion_r1552519380 ## java/flight/flight-sql-jdbc-core/src/main/java/org/apache/arrow/driver/jdbc/ArrowDatabaseMetadata.java: ## @@ -754,7 +754,11 @@ private T getSqlInfoAndCacheIfCache

Re: [PR] GH-40775: [Benchmarking][Java] Fix conbench timeout [arrow]

2024-04-04 Thread via GitHub
lidavidm commented on PR #40786: URL: https://github.com/apache/arrow/pull/40786#issuecomment-2038340762 @danepitkin mind updating the PR description now that the approach is different?

Re: [PR] build(c): bump vendored fmt [arrow-adbc]

2024-04-04 Thread via GitHub
lidavidm commented on PR #1708: URL: https://github.com/apache/arrow-adbc/pull/1708#issuecomment-2038335842 aha, I didn't realize there was a docker image for the clang18 R uses!

[I] Zero Copy Support [arrow-rs]

2024-04-04 Thread via GitHub
plewis110 opened a new issue, #5593: URL: https://github.com/apache/arrow-rs/issues/5593 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Hi, I've been having trouble integrating with the [`arrow_ipc`](https://docs.rs/arrow

Re: [PR] [C++] Add test for flatbuffers serialization [arrow]

2024-04-04 Thread via GitHub
github-actions[bot] commented on PR #41018: URL: https://github.com/apache/arrow/pull/41018#issuecomment-2038273494 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue f

[PR] [C++] Add test for flatbuffers serialization [arrow]

2024-04-04 Thread via GitHub
amoeba opened a new pull request, #41018: URL: https://github.com/apache/arrow/pull/41018 **This is a draft PR that shouldn't get merged.** This is testing the test described in https://github.com/apache/arrow/issues/40361. Once the test has done its job on CI, I'll add a follow-up c

Re: [I] DataFusion weekly project plan (Andrew Lamb) - April 1, 2024 [arrow-datafusion]

2024-04-04 Thread via GitHub
seddonm1 commented on issue #9899: URL: https://github.com/apache/arrow-datafusion/issues/9899#issuecomment-2038252900 Thanks @Omega359 . I was thrown as the aliases are not registered here https://github.com/apache/arrow-datafusion/blob/3ae029988754c3fd3eb000abd4b76e643b9cbc7b/datafusion/e

Re: [PR] Avoid copying (so much) for `LogicalPlan::map_children` [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on PR #9946: URL: https://github.com/apache/arrow-datafusion/pull/9946#issuecomment-2038247737 > Sorry @alamb , I'm still working on my #9913. I realized that there are a few more issues I need to fix and test. Will try to finish it tomorrow or during the weekend and ping y

Re: [I] inner join involving hive-partitioned parquet dataset and filters on LHS and RHS causes panic [arrow-datafusion]

2024-04-04 Thread via GitHub
jwimberl commented on issue #9797: URL: https://github.com/apache/arrow-datafusion/issues/9797#issuecomment-2038234776 OK, with DataFusion 36.0.0, I still get a panic when running this query in its original context -- which is a rust module using the datafusion crate and its dependencies.

Re: [I] DataFusion weekly project plan (Andrew Lamb) - April 1, 2024 [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on issue #9899: URL: https://github.com/apache/arrow-datafusion/issues/9899#issuecomment-2038232770 DataFusion 37.0.0 is released: https://github.com/apache/arrow-datafusion/issues/9682 🌮

Re: [I] Release DataFusion 37.0.0 [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb closed issue #9682: Release DataFusion 37.0.0 URL: https://github.com/apache/arrow-datafusion/issues/9682

[PR] Minor: Update release README [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb opened a new pull request, #9956: URL: https://github.com/apache/arrow-datafusion/pull/9956 ## Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/issues/9682 ## Rationale for this change There are some new crates in datafusion 37.0.0

Re: [I] Release DataFusion 37.0.0 [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on issue #9682: URL: https://github.com/apache/arrow-datafusion/issues/9682#issuecomment-2038232143 I made some small improvements based on my experience this time, but otherwise I think this is now done 1. https://github.com/apache/arrow-datafusion/pull/9955 2. https:

Re: [PR] CI: releasing CLI to PyPI [arrow-datafusion]

2024-04-04 Thread via GitHub
MohamedAbdeen21 commented on PR #9452: URL: https://github.com/apache/arrow-datafusion/pull/9452#issuecomment-2038223556 Removed the release from PyPI and closing this issue, to be re-opened in the Python repo by the maintainers.

Re: [PR] Fix datafusion-cli publishing [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on code in PR #9955: URL: https://github.com/apache/arrow-datafusion/pull/9955#discussion_r1552444361 ## datafusion-cli/src/command.rs: ## @@ -26,9 +26,9 @@ use datafusion::arrow::array::{ArrayRef, StringArray}; use datafusion::arrow::datatypes::{DataType, Field

Re: [PR] CI: releasing CLI to PyPI [arrow-datafusion]

2024-04-04 Thread via GitHub
MohamedAbdeen21 closed pull request #9452: CI: releasing CLI to PyPI URL: https://github.com/apache/arrow-datafusion/pull/9452

Re: [PR] Fix datafusion-cli publishing [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on code in PR #9955: URL: https://github.com/apache/arrow-datafusion/pull/9955#discussion_r1552445101 ## datafusion-cli/Cargo.toml: ## @@ -45,7 +45,6 @@ datafusion = { path = "../datafusion/core", version = "37.0.0", features = [ "unicode_expressions",

[PR] Fix datafusion-cli publishing [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb opened a new pull request, #9955: URL: https://github.com/apache/arrow-datafusion/pull/9955 ## Which issue does this PR close? Part of #9682 ## Rationale for this change I had to add a `version` field to the datafusion-cli `Cargo.toml` file to correctly pu

Re: [PR] Remove `OwnedTableReference` and `OwnedSchemaReference` [arrow-datafusion]

2024-04-04 Thread via GitHub
comphead merged PR #9933: URL: https://github.com/apache/arrow-datafusion/pull/9933

Re: [PR] GH-40959: [JS] Store Timestamps in 64 bits [arrow]

2024-04-04 Thread via GitHub
domoritz commented on code in PR #40960: URL: https://github.com/apache/arrow/pull/40960#discussion_r1552427192 ## js/test/unit/vector/date-vector-tests.ts: ## @@ -15,34 +15,62 @@ // specific language governing permissions and limitations // under the License. -import { Date

Re: [I] Release DataFusion 37.0.0 [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on issue #9682: URL: https://github.com/apache/arrow-datafusion/issues/9682#issuecomment-2038204509 The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-37.0.0 The release has also been published to crates.io: https:/

Re: [PR] GH-40806: [C++] Revert changes from PR #40857 [arrow]

2024-04-04 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #40980: URL: https://github.com/apache/arrow/pull/40980#issuecomment-2038204396 After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 26631d7504420ff00a827d40273b589c6d38860f. There were no

Re: [PR] filter for run end array [arrow-rs]

2024-04-04 Thread via GitHub
Jefffrey merged PR #5573: URL: https://github.com/apache/arrow-rs/pull/5573

Re: [I] Seek page in column chunk of Parquet [arrow]

2024-04-04 Thread via GitHub
Luosuu commented on issue #40981: URL: https://github.com/apache/arrow/issues/40981#issuecomment-2038169477 Thank you so much. I think I understand now. I will look into `arrow-rs` for my usage then. -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Avoid copying (so much) for `LogicalPlan::map_children` [arrow-datafusion]

2024-04-04 Thread via GitHub
peter-toth commented on PR #9946: URL: https://github.com/apache/arrow-datafusion/pull/9946#issuecomment-2038163683 > Ok, I think once https://github.com/apache/arrow-datafusion/pull/9913 from @peter-toth is merged this PR will be ready to review Sorry @alamb , I'm still working on m

Re: [PR] GH-40959: [JS] Store Timestamps in 64 bits [arrow]

2024-04-04 Thread via GitHub
domoritz commented on code in PR #40960: URL: https://github.com/apache/arrow/pull/40960#discussion_r1552386081 ## js/src/type.ts: ## @@ -333,23 +333,47 @@ export class Decimal extends DataType { /** @ignore */ export type Dates = Type.Date | Type.DateDay | Type.DateMillisecon

Re: [I] unnest doesn't take into account null values [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on issue #9932: URL: https://github.com/apache/arrow-datafusion/issues/9932#issuecomment-2038148937 Thank you for the report @rescrv -- is there any chance you can provide a self contained reproducer? I suspect this issue would get fixed quickly if that were available --

Re: [PR] GH-40959: [JS] Store Timestamps in 64 bits [arrow]

2024-04-04 Thread via GitHub
domoritz commented on code in PR #40960: URL: https://github.com/apache/arrow/pull/40960#discussion_r1552377309 ## js/src/type.ts: ## @@ -333,23 +333,47 @@ export class Decimal extends DataType { /** @ignore */ export type Dates = Type.Date | Type.DateDay | Type.DateMillisecon

Re: [PR] GH-40959: [JS] Store Timestamps in 64 bits [arrow]

2024-04-04 Thread via GitHub
domoritz commented on code in PR #40960: URL: https://github.com/apache/arrow/pull/40960#discussion_r1552376861 ## js/src/type.ts: ## @@ -333,23 +333,47 @@ export class Decimal extends DataType { /** @ignore */ export type Dates = Type.Date | Type.DateDay | Type.DateMillisecon

Re: [PR] Avoid copying (so much) for `LogicalPlan::map_children` [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on code in PR #9946: URL: https://github.com/apache/arrow-datafusion/pull/9946#discussion_r1552374770 ## datafusion/expr/src/logical_plan/rewrite.rs: ## @@ -0,0 +1,228 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Avoid copying (so much) for `LogicalPlan::map_children` [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb commented on PR #9946: URL: https://github.com/apache/arrow-datafusion/pull/9946#issuecomment-2038134330 Ok, I think once https://github.com/apache/arrow-datafusion/pull/9913 from @peter-toth is merged this PR will be ready to review -- This is an automated message from the Apache

[PR] Introduce OptimizerRule::try_optimize_owned to rewrite in place [arrow-datafusion]

2024-04-04 Thread via GitHub
alamb opened a new pull request, #9954: URL: https://github.com/apache/arrow-datafusion/pull/9954 WIP builds on https://github.com/apache/arrow-datafusion/pull/9948 TODO: figure out subquery recursion ## Which issue does this PR close? Part of https://github.com/a

Re: [I] Seek page in column chunk of Parquet [arrow]

2024-04-04 Thread via GitHub
mapleFU commented on issue #40981: URL: https://github.com/apache/arrow/issues/40981#issuecomment-2038130354 Aha, no. Whether we have page index: "a", "b", "c", and "d" are stored in different pages. There are three levels: * File ( a whole parquet file with same schema containing zero or

[I] Colon (:) in object_store::path::{Path} is not handled in Windows [arrow-rs]

2024-04-04 Thread via GitHub
thomasfrederikhoeck opened a new issue, #5592: URL: https://github.com/apache/arrow-rs/issues/5592 **Describe the bug** When creating a `Path` on Windows colon (`:`) is not sanitized like `<` and `|`. **To Reproduce** ```rust use object_store::path::{Path}; fn main() {

Re: [PR] Add Send + Sync traits for Datum [arrow-rs]

2024-04-04 Thread via GitHub
viirya commented on PR #5587: URL: https://github.com/apache/arrow-rs/pull/5587#issuecomment-2038103930 Sorry, I still don't know how I can add `Send` as a constraint to `Scalar` outside this crate. 🤔 I can let the function return `Datum + Send` so `Send` is a constraint of the retu

Re: [I] Seek page in column chunk of Parquet [arrow]

2024-04-04 Thread via GitHub
Luosuu commented on issue #40981: URL: https://github.com/apache/arrow/issues/40981#issuecomment-2038101224 I see. If I understand correctly, you mean when page index enabled, for example: ```json { # <-- First record "a": 1, # <-- the top level fields are

Re: [PR] Validate partitions columns in `CREATE EXTERNAL TABLE` if table already exists. [arrow-datafusion]

2024-04-04 Thread via GitHub
MohamedAbdeen21 commented on PR #9912: URL: https://github.com/apache/arrow-datafusion/pull/9912#issuecomment-2038087981 Hi @alamb, I've implemented the check we discussed. Apart from adding unit tests for the two new functions, I believe the PR is now adequately covered. I'm curious

Re: [I] Seek page in column chunk of Parquet [arrow]

2024-04-04 Thread via GitHub
mapleFU commented on issue #40981: URL: https://github.com/apache/arrow/issues/40981#issuecomment-2038081376 > "a", "b", "c", and "d" are stored in different pages Yes, and row would not cross page if page index enabled. -- This is an automated message from the Apache Git Service. T
