Re: [I] Interoperability between arrow-rs and nanoarrow [arrow-rs]

2023-11-07 Thread via GitHub
evgenyx00 commented on issue #5052: URL: https://github.com/apache/arrow-rs/issues/5052#issuecomment-1801269101 @tustvold appreciate your prompt assistance :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [I] [Python] Python cookbooks are failing against current development [arrow-cookbook]

2023-11-07 Thread via GitHub
AlenkaF commented on issue #331: URL: https://github.com/apache/arrow-cookbook/issues/331#issuecomment-1801265027 The example is failing due to the change in the error raised by `pa.unify_schemas`. The change happened in https://github.com/apache/arrow/pull/36846/files and is now raising `

Re: [I] [C++] During unit testing, the float array contains nan equality judgment [arrow]

2023-11-07 Thread via GitHub
Light-City commented on issue #38624: URL: https://github.com/apache/arrow/issues/38624#issuecomment-1801262662 right.

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-07 Thread via GitHub
eeroel commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1801259896 > Thanks @eeroel > > I've merged this patch now, maybe I'll try to add some counting test tonight Thank you!

Re: [PR] feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-07 Thread via GitHub
korowa commented on code in PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#discussion_r1386140174 ## datafusion/sqllogictest/test_files/join_disable_repartition_joins.slt: ## @@ -72,11 +72,11 @@ SELECT t1.a, t1.b, t1.c, t2.a as a2 ON t1.d = t2.d ORDER BY a

Re: [I] Reading parquet file behavior change from 13.0.0 to 14.0.0 [arrow]

2023-11-07 Thread via GitHub
mapleFU commented on issue #38577: URL: https://github.com/apache/arrow/issues/38577#issuecomment-1801247211 @jhwang7628 After going through the code, I think this might be related to https://github.com/apache/arrow/pull/38437 I'll try to check this this week. This might trable when you r

Re: [PR] Fixing broken link [arrow-datafusion]

2023-11-07 Thread via GitHub
viirya commented on PR #8085: URL: https://github.com/apache/arrow-datafusion/pull/8085#issuecomment-1801205588 Thank you @edmondop

Re: [PR] Fixing broken link [arrow-datafusion]

2023-11-07 Thread via GitHub
viirya merged PR #8085: URL: https://github.com/apache/arrow-datafusion/pull/8085

Re: [PR] feat: emitting partial join results in `HashJoinStream` [arrow-datafusion]

2023-11-07 Thread via GitHub
korowa commented on PR #8020: URL: https://github.com/apache/arrow-datafusion/pull/8020#issuecomment-1801202574 @alamb I guess so -- it seems like it's going to grow large enough to make the review inconvenient and reach a critical mass of quite important changes 😞 One possible way i

[PR] Minor: Cleanup BuiltinScalarFunction::return_type() [arrow-datafusion]

2023-11-07 Thread via GitHub
2010YOUY01 opened a new pull request, #8088: URL: https://github.com/apache/arrow-datafusion/pull/8088 ## Which issue does this PR close? A small refactor for https://github.com/apache/arrow-datafusion/issues/8045 ## Rationale for this change #8045 proposed a

Re: [I] [C++] During unit testing, the float array contains nan equality judgment [arrow]

2023-11-07 Thread via GitHub
mapleFU commented on issue #38624: URL: https://github.com/apache/arrow/issues/38624#issuecomment-1801184988 I agree it's a bug, let's fix this

Re: [I] [Parquet][Python] Potential regression in Parquet parallel reading [arrow]

2023-11-07 Thread via GitHub
mapleFU commented on issue #38591: URL: https://github.com/apache/arrow/issues/38591#issuecomment-1801178533 Thanks @eeroel I've merged this patch now, maybe I'll try to add some counting test tonight

Re: [PR] GH-38591: [Parquet][C++] Remove redundant open calls in `ParquetFileFormat::GetReaderAsync` [arrow]

2023-11-07 Thread via GitHub
mapleFU merged PR #38621: URL: https://github.com/apache/arrow/pull/38621

Re: [I] [C++] Dynamic hook simd level seems not involve sse4.2 [arrow]

2023-11-07 Thread via GitHub
js8544 commented on issue #38623: URL: https://github.com/apache/arrow/issues/38623#issuecomment-1801172017 Adding to kou's comment, the code you pasted is specific to the Arrow compute functions. Currently no compute functions have an SSE kernel implementation, so there's no need yet to che

Re: [I] [C++] During unit testing, the float array contains nan equality judgment [arrow]

2023-11-07 Thread via GitHub
js8544 commented on issue #38624: URL: https://github.com/apache/arrow/issues/38624#issuecomment-1801163688 Yeah I think it's a bug because when comparing arrays we use the `TestingEqualOptions()` which has `nans_equal=true`. Could you submit a PR to fix this?
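
For readers unfamiliar with the `nans_equal` semantics mentioned in this comment, here is a minimal sketch in plain Python. It is not Arrow's implementation: `arrays_equal` is a hypothetical helper, and only the flag name `nans_equal` comes from the thread. It illustrates why two float arrays holding NaN in the same position compare unequal by default (IEEE 754 NaN is never equal to itself) and equal once the flag is enabled.

```python
import math


def arrays_equal(left, right, nans_equal=False):
    """Element-wise equality of two float sequences (hypothetical helper).

    With nans_equal=True, NaN values in the same position count as equal.
    """
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        if math.isnan(a) and math.isnan(b):
            # NaN != NaN under IEEE 754, so this pair only matches
            # when the caller opts in.
            if not nans_equal:
                return False
        elif a != b:
            return False
    return True


nan = float("nan")
strict = arrays_equal([1.0, nan], [1.0, nan])
lenient = arrays_equal([1.0, nan], [1.0, nan], nans_equal=True)
```

Under these assumptions `strict` is `False` and `lenient` is `True`, which matches the expectation stated in the thread for test comparisons that opt in to NaN equality.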

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386048070 ## cpp/src/gandiva/tests/projector_test.cc: ## @@ -3608,4 +3608,80 @@ TEST_F(TestProjector, TestExtendedFunctions) { EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0)); } +

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386047504 ## cpp/src/gandiva/tests/projector_test.cc: ## @@ -3608,4 +3608,80 @@ TEST_F(TestProjector, TestExtendedFunctions) { EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0)); } +

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386047275 ## cpp/src/gandiva/tests/projector_test.cc: ## @@ -3608,4 +3608,80 @@ TEST_F(TestProjector, TestExtendedFunctions) { EXPECT_ARROW_ARRAY_EQUALS(out, outs.at(0)); } +

Re: [I] [C++][Gandiva] Enhance random data generation [arrow]

2023-11-07 Thread via GitHub
js8544 commented on issue #38569: URL: https://github.com/apache/arrow/issues/38569#issuecomment-1801159680 > In previous benchmarks, the precision range was set to randomly generate values between low and high I think in both cases precision is fixed. In `DoDecimalAdd2` it's given as

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386042380 ## cpp/src/gandiva/function_registry.h: ## @@ -52,9 +58,24 @@ class GANDIVA_EXPORT FunctionRegistry { arrow::Status Register(const std::vector& funcs,

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386041943 ## cpp/src/gandiva/function_holder_registry.h: ## @@ -1,80 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreemen

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386041257 ## cpp/src/gandiva/external_c_interface_functions.cc: ## @@ -0,0 +1,95 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [I] [C++][Gandiva] Enhance random data generation [arrow]

2023-11-07 Thread via GitHub
kou commented on issue #38569: URL: https://github.com/apache/arrow/issues/38569#issuecomment-1801151626 Thanks for summarizing the behavior. String: Are you using https://github.com/apache/arrow/pull/38526/files#diff-b440faf74bbde4937a0a476511319f0c1cc255fbf0fb7372277c5f465df7a970R22

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue commented on code in PR #38632: URL: https://github.com/apache/arrow/pull/38632#discussion_r1386037885 ## cpp/src/gandiva/engine.cc: ## @@ -146,8 +146,13 @@ Engine::Engine(const std::shared_ptr& conf, Status Engine::Init() { std::call_once(register_exported_funcs_fla

Re: [PR] MINOR: [JS] Bump eslint from 8.42.0 to 8.52.0 in /js [arrow]

2023-11-07 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38545: URL: https://github.com/apache/arrow/pull/38545#issuecomment-1801149162 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 25c18d8cd6a299f3bb6b72966f2dca357db26399. There were no

Re: [PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
github-actions[bot] commented on PR #38632: URL: https://github.com/apache/arrow/pull/38632#issuecomment-1801144112 :warning: GitHub issue #38589 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38589: [C++][Gandiva] Support registering external C interface functions [arrow]

2023-11-07 Thread via GitHub
niyue opened a new pull request, #38632: URL: https://github.com/apache/arrow/pull/38632 ### Rationale for this change This PR tries to enhance Gandiva by supporting registering external C interface functions to its function registry, so that developers can author third party functions w

[I] Substrait support for IS NULL and IS NOT NULL operators [arrow-datafusion]

2023-11-07 Thread via GitHub
tgujar opened a new issue, #8087: URL: https://github.com/apache/arrow-datafusion/issues/8087 ### Is your feature request related to a problem or challenge? Substrait producer currently does not support `IS NULL` and `IS NOT NULL` operations. Implementing this feature would allow for

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
vibhatha commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385998927 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] Minor: remove unnecessary projection in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-07 Thread via GitHub
haohuaijin commented on PR #8061: URL: https://github.com/apache/arrow-datafusion/pull/8061#issuecomment-1801098612 > Could you please create a new issue to provide more detail for this pull request? This will make it easier for people to track. 🤔 track in #8086

[I] Remove unnecessary projection in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-07 Thread via GitHub
haohuaijin opened a new issue, #8086: URL: https://github.com/apache/arrow-datafusion/issues/8086 ### Is your feature request related to a problem or challenge? In `single_distinct_to_group_by` rule, we used [`Projection`](https://github.com/apache/arrow-datafusion/blob/0506a5cff2c61f

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385985482 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385984708 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385985306 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385960497 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385967435 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Minor: remove unnecessary projection in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-07 Thread via GitHub
haohuaijin commented on code in PR #8061: URL: https://github.com/apache/arrow-datafusion/pull/8061#discussion_r1385972626 ## datafusion/sqllogictest/test_files/tpch/q16.slt.part: ## @@ -69,11 +69,11 @@ physical_plan GlobalLimitExec: skip=0, fetch=10 --SortPreservingMergeExec:

Re: [PR] GH-36388: [C++][Array] Add Overflow check for string Repeat [arrow]

2023-11-07 Thread via GitHub
js8544 commented on code in PR #38504: URL: https://github.com/apache/arrow/pull/38504#discussion_r1385967985 ## cpp/src/arrow/array/util.cc: ## @@ -669,11 +671,18 @@ class RepeatedArrayFactory { enable_if_base_binary Visit(const T&) { const std::shared_ptr& value = scal

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385969512 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -1752,6 +1752,34 @@ select array_to_string(make_array(), ',') (empty) + +## array_union (alias

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385960842 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385958994 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gener

Re: [PR] GH-38498: [Python][Compute] `random` does not respect options arg [arrow]

2023-11-07 Thread via GitHub
js8544 commented on PR #38499: URL: https://github.com/apache/arrow/pull/38499#issuecomment-1801020599 Thank you for your contribution! The change looks good. Could you please fix the CI errors?

Re: [PR] GH-38335: [C++] Implement `GetFileInfo` for a single file in Azure filesystem [arrow]

2023-11-07 Thread via GitHub
kou commented on code in PR #38505: URL: https://github.com/apache/arrow/pull/38505#discussion_r1385857556 ## cpp/src/arrow/filesystem/azurefs.cc: ## @@ -453,27 +457,137 @@ class ObjectInputFile final : public io::RandomAccessFile { class AzureFileSystem::Impl { public: i

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385884655 ## java/vector/src/test/java/org/apache/arrow/vector/ipc/BaseFileTest.java: ## @@ -846,4 +862,293 @@ protected void validateListAsMapData(VectorSchemaRoot root) {

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385883271 ## java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowFile.java: ## @@ -131,4 +150,67 @@ public void testFileStreamHasEos() throws IOException { }

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385882993 ## java/vector/src/test/java/org/apache/arrow/vector/ipc/BaseFileTest.java: ## @@ -846,4 +862,293 @@ protected void validateListAsMapData(VectorSchemaRoot root) {

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on PR #38423: URL: https://github.com/apache/arrow/pull/38423#issuecomment-1800909178 > One thing that would be really great to have alongside this change is an update to the docs https://arrow.apache.org/docs/java/vector.html#dictionary-encoding. Creating an issue to up

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385882139 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385882048 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BaseDictionary.java: ## @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC [arrow]

2023-11-07 Thread via GitHub
github-actions[bot] commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1800906079 :warning: GitHub issue #37242 **has been automatically assigned in GitHub** to PR creator.

Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC (Take 2) [arrow]

2023-11-07 Thread via GitHub
mapleFU commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1800898957 Sorry, I wasn't clear enough. It means waiting for review from other Apache Arrow committers[1], since I'm familiar with C++ Parquet but might have forgotten something in the Python part. [1] https:

Re: [I] [Python] Segementation fault when pyarrow is imported in exit handler [arrow]

2023-11-07 Thread via GitHub
gusostow commented on issue #38626: URL: https://github.com/apache/arrow/issues/38626#issuecomment-1800895795 Exit handlers in a complex codebase might incidentally import pyarrow, especially via pandas.

Re: [PR] fix: DataFusion suggests invalid functions [arrow-datafusion]

2023-11-07 Thread via GitHub
jonahgao commented on PR #8083: URL: https://github.com/apache/arrow-datafusion/pull/8083#issuecomment-1800890548 > I wonder if we need to do something similar for window functions? I added the same test to check the `BuiltInWindowFunction`.

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385850436 ## java/vector/src/test/java/org/apache/arrow/vector/ipc/BaseFileTest.java: ## @@ -846,4 +862,293 @@ protected void validateListAsMapData(VectorSchemaRoot root) {

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385849993 ## java/vector/src/main/java/org/apache/arrow/vector/ipc/ArrowFileReader.java: ## @@ -164,12 +146,54 @@ public boolean loadNextBatch() throws IOException { Arro

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385848843 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java: ## @@ -79,15 +84,24 @@ public final Set getDictionaryIds() { } @Ov

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385848500 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/DictionaryProvider.java: ## @@ -79,15 +84,24 @@ public final Set getDictionaryIds() { } @Ov

Re: [PR] GH-38627: [Java][FlightRPC] Handle null parameter values [arrow]

2023-11-07 Thread via GitHub
aiguofer commented on PR #38628: URL: https://github.com/apache/arrow/pull/38628#issuecomment-1800863317 Ahhh good call! I'll move it up. I still don't know much about Union types but we can handle that later.

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385845189 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385840863 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385840745 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [I] Interoperability between arrow-rs and nanoarrow [arrow-rs]

2023-11-07 Thread via GitHub
tustvold commented on issue #5052: URL: https://github.com/apache/arrow-rs/issues/5052#issuecomment-1800852211 Annotating the relevant RecordBatch we get Good ``` 0x14, 0x00, 0x00, 0x00, // Message offset (20) 0x00, 0x00, 0x00, 0x00, // Message VTable 0x0c, 0x00, // VTa
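
As a small aid for reading the annotated dump in this message, the sketch below decodes the first four bytes shown there. It assumes the standard little-endian encoding used by FlatBuffers (the format Arrow IPC metadata is serialized in); the variable names are illustrative and not taken from the thread.

```python
import struct

# First four bytes of the annotated dump quoted in the message.
prefix = bytes([0x14, 0x00, 0x00, 0x00])

# "<I" means little-endian unsigned 32-bit integer.
(message_offset,) = struct.unpack("<I", prefix)

# The same value via int.from_bytes, as a cross-check.
message_offset_alt = int.from_bytes(prefix, byteorder="little")
```

Both decodings give 20, which agrees with the "Message offset (20)" annotation in the dump.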

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
manolama commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385840032 ## java/memory/memory-core/src/main/java/org/apache/arrow/memory/util/hash/MurmurHasher.java: ## @@ -106,6 +111,36 @@ public static int hashCode(long address, long len

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-07 Thread via GitHub
2010YOUY01 commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1800831991 > @2010YOUY01 I have updated the task list on this ticket based on your investigation. Please take a look when you have a chance. > > The only one I don't understa

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-11-07 Thread via GitHub
kou merged PR #38116: URL: https://github.com/apache/arrow/pull/38116

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-11-07 Thread via GitHub
kou commented on PR #38116: URL: https://github.com/apache/arrow/pull/38116#issuecomment-1800822623 I'll merge this.

Re: [PR] Array_union implementation [arrow-datafusion]

2023-11-07 Thread via GitHub
jayzhan211 commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1385786235 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1358,6 +1360,94 @@ macro_rules! to_string { }}; } +fn union_generic_lists( +l: &Gen

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
jduo commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385786095 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] Fixing broken link [arrow-datafusion]

2023-11-07 Thread via GitHub
viirya commented on code in PR #8085: URL: https://github.com/apache/arrow-datafusion/pull/8085#discussion_r1385787821 ## docs/source/contributor-guide/index.md: ## @@ -222,7 +222,7 @@ Below is a checklist of what you need to do to add a new scalar function to Data - a new l

Re: [PR] GH-38570: [R] Ensure that test-nix-libs is warning free [arrow]

2023-11-07 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38571: URL: https://github.com/apache/arrow/pull/38571#issuecomment-1800450372 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7622ded456497911785f997a9e0240033e57042d. There were no

Re: [I] [Python] How to do arrow table group by and split? [arrow]

2023-11-07 Thread via GitHub
willshiao commented on issue #14882: URL: https://github.com/apache/arrow/issues/14882#issuecomment-1800437413 Also ran into this issue, and totally agree that it would be a useful addition to PyArrow. But in the meantime, a workaround is to convert it to a Polars dataframe (see [`polars.

[PR] Fixing broken link [arrow-datafusion]

2023-11-07 Thread via GitHub
edmondop opened a new pull request, #8085: URL: https://github.com/apache/arrow-datafusion/pull/8085 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

[PR] GH-38614: [Java] Add VarBinary and VarCharWriter helper methods to more writers [arrow]

2023-11-07 Thread via GitHub
jduo opened a new pull request, #38631: URL: https://github.com/apache/arrow/pull/38631 ### Rationale for this change Add the overrides for new convenience Writer methods added to VarCharWriter and VarBinaryWriter so that classes that use composition such as UnionWriter and PromotableW

Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC (Take 2) [arrow]

2023-11-07 Thread via GitHub
frazar commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1800430621 What is the meaning of the "awaiting committer review" label? Does the term "committer" mean "the PR author"?

Re: [I] [Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient [arrow]

2023-11-07 Thread via GitHub
kou commented on issue #38565: URL: https://github.com/apache/arrow/issues/38565#issuecomment-1800428893 > We should document the requirements for Authenticate instead, then. +1

Re: [I] Lots of code duplication in array_functions (`array_replace_all`, `array_replace_n`, `array_replace`, etc) [arrow-datafusion]

2023-11-07 Thread via GitHub
jayzhan211 commented on issue #7988: URL: https://github.com/apache/arrow-datafusion/issues/7988#issuecomment-1800427350 > @jayzhan211 Thanks for your kind advice! ~I'll try array_append/prepend first!~ > > It seems that the above functions have been all implemented. Yeah, all

Re: [I] debug: access symbols for libarrow_python_flight.dylib [arrow]

2023-11-07 Thread via GitHub
kou commented on issue #38519: URL: https://github.com/apache/arrow/issues/38519#issuecomment-1800421740 Could you try `cd python && rm -rf build && PYARROW_BUILD_TYPE=debug PYARROW_...=... pip install .` instead of `python setup.py build_ext ...`? It works for me.

Re: [I] Implement `array_except` function [arrow-datafusion]

2023-11-07 Thread via GitHub
jayzhan211 commented on issue #6979: URL: https://github.com/apache/arrow-datafusion/issues/6979#issuecomment-1800414000 Plan to work on this

Re: [I] Interoperability between arrow-rs and nanoarrow [arrow-rs]

2023-11-07 Thread via GitHub
evgenyx00 commented on issue #5052: URL: https://github.com/apache/arrow-rs/issues/5052#issuecomment-1800296011 Apologies for not being clear: the provided non-working/working samples contain only a RecordBatch, without a preceding schema, so that they can be tested by the nanoarrow example (app

Re: [PR] Add mechanism for verifying that source code in documentation is valid [arrow-datafusion]

2023-11-07 Thread via GitHub
alamb commented on PR #7956: URL: https://github.com/apache/arrow-datafusion/pull/7956#issuecomment-1800276891 I took the liberty of merging this branch to main. While I was testing it, it turned out that there are a bunch of clippy errors in the library code now, such as ``` erro

Re: [PR] Add mechanism for verifying that source code in documentation is valid [arrow-datafusion]

2023-11-07 Thread via GitHub
alamb commented on code in PR #7956: URL: https://github.com/apache/arrow-datafusion/pull/7956#discussion_r1385654811 ## docs/source/library-user-guide/building-logical-plans.md: ## @@ -36,35 +36,7 @@ much easier to use the [LogicalPlanBuilder], which is described in the next s

Re: [I] array functions don't work with `FixedSizeList` [arrow-datafusion]

2023-11-07 Thread via GitHub
alamb commented on issue #8084: URL: https://github.com/apache/arrow-datafusion/issues/8084#issuecomment-1800250248 The same thing probably goes for `LargeList` as well (it isn't supported)

Re: [PR] Minor: remove unnecessary projection in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-07 Thread via GitHub
alamb commented on code in PR #8061: URL: https://github.com/apache/arrow-datafusion/pull/8061#discussion_r1385635559 ## datafusion/sqllogictest/test_files/tpch/q16.slt.part: ## @@ -69,11 +69,11 @@ physical_plan GlobalLimitExec: skip=0, fetch=10 --SortPreservingMergeExec: [sup

Re: [PR] Minor: remove unnecessary projection in `single_distinct_to_group_by` rule [arrow-datafusion]

2023-11-07 Thread via GitHub
Weijun-H commented on PR #8061: URL: https://github.com/apache/arrow-datafusion/pull/8061#issuecomment-1800240504 Could you please create a new issue to provide more detail for this pull request? This will make it easier for people to track. 🤔

Re: [I] Update Parquet Encoding Documentation [arrow-rs]

2023-11-07 Thread via GitHub
tustvold closed issue #5051: Update Parquet Encoding Documentation URL: https://github.com/apache/arrow-rs/issues/5051

Re: [PR] GH-38414 [Java] [Vector] Add Delta dictionary support. [arrow]

2023-11-07 Thread via GitHub
danepitkin commented on code in PR #38423: URL: https://github.com/apache/arrow/pull/38423#discussion_r1385614601 ## java/vector/src/main/java/org/apache/arrow/vector/dictionary/BatchedDictionary.java: ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] assume_timezone can only handle IANA timezone names, not fixed offsets [arrow]

2023-11-07 Thread via GitHub
MarcoGorelli commented on issue #36558: URL: https://github.com/apache/arrow/issues/36558#issuecomment-1800203953 > it's whether or not the column can be considered to use a single, uniform timezone I don't think Arrow (nor any dataframe library, as far as I'm aware) supports columns

Re: [I] [MATLAB] `arrow.array.BooleanArray`'s `toMATLAB` method does not take slice offsets into account [arrow]

2023-11-07 Thread via GitHub
sgilmore10 commented on issue #38630: URL: https://github.com/apache/arrow/issues/38630#issuecomment-1800190971 The same bug also manifests itself when accessing the Valid property on all sliced arrays. This is because both BooleanArray's toMATLAB method and the Valid getter function call a
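
To illustrate the kind of bug described in this thread, here is a minimal sketch in Python (not the MATLAB interface's actual code). It assumes a packed, least-significant-bit-first bitmap, which is how Arrow stores boolean values and validity bits; the function name `unpack_bits` and the sample bytes are hypothetical. A sliced array shares its parent's buffer, so a reader that ignores the slice offset returns the parent's leading values instead of the slice's.

```python
def unpack_bits(bitmap: bytes, offset: int, length: int) -> list:
    """Read `length` bits from an LSB-first packed bitmap, starting at bit `offset`."""
    result = []
    for i in range(length):
        bit_index = offset + i          # position in the shared parent buffer
        byte = bitmap[bit_index // 8]   # byte that holds this bit
        result.append(bool((byte >> (bit_index % 8)) & 1))
    return result


# Hypothetical parent buffer: bits (LSB first) are 1,0,1,1,0,1,0,0
parent = bytes([0b00101101])

# A slice that starts at element 2 and has 4 elements
with_offset = unpack_bits(parent, offset=2, length=4)

# Ignoring the offset reads the parent's first 4 elements instead
without_offset = unpack_bits(parent, offset=0, length=4)
```

Comparing `with_offset` against `without_offset` shows why both the values and the validity bits of a sliced array need the offset applied.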

Re: [I] [Python] Segementation fault when pyarrow is imported in exit handler [arrow]

2023-11-07 Thread via GitHub
kou commented on issue #38626: URL: https://github.com/apache/arrow/issues/38626#issuecomment-1800161593 Why do you want to do this...?

Re: [PR] Update parquet encoding docs [arrow-rs]

2023-11-07 Thread via GitHub
tustvold merged PR #5053: URL: https://github.com/apache/arrow-rs/pull/5053

Re: [I] [Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient [arrow]

2023-11-07 Thread via GitHub
lidavidm commented on issue #38565: URL: https://github.com/apache/arrow/issues/38565#issuecomment-1800131268 Ah...we'd have to put a reader/writer lock over every method - I'm not sure that's worth the tradeoff. We should document the requirements for Authenticate instead, then.
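
If the requirement ends up being documented rather than enforced inside the client, callers could serialize authentication themselves. The following is only a sketch under that assumption: `StubFlightClient` is a hypothetical stand-in (not pyarrow's `FlightClient`) so the example runs without a server, and `SerializedAuth` is an illustrative wrapper name. It guards only `authenticate`; it does not protect other methods that may touch the same handler.

```python
import threading


class StubFlightClient:
    """Hypothetical stand-in for a Flight client; records overlapping calls."""

    def __init__(self):
        self.active = 0       # calls currently inside authenticate
        self.max_active = 0   # highest overlap observed
        self.calls = 0

    def authenticate(self, handler):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        self.calls += 1
        self.active -= 1


class SerializedAuth:
    """Caller-side wrapper: at most one authenticate call at a time."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def authenticate(self, handler):
        with self._lock:  # other threads wait here until the call finishes
            self._client.authenticate(handler)


def run_demo(n_threads=8):
    client = StubFlightClient()
    guarded = SerializedAuth(client)
    threads = [
        threading.Thread(target=guarded.authenticate, args=(None,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return client
```

With a real client, the same wrapper shape would apply, but the safer pattern is probably to authenticate once before sharing the client across threads.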

Re: [I] [MATLAB] `arrow.array.BooleanArray`'s `toMATLAB` method does not take slice offsets into account [arrow]

2023-11-07 Thread via GitHub
sgilmore10 commented on issue #38630: URL: https://github.com/apache/arrow/issues/38630#issuecomment-1800130860 take

Re: [I] [MATLAB] Release version 0.1 of the MATLAB interface to Arrow [arrow]

2023-11-07 Thread via GitHub
kevingurney commented on issue #38612: URL: https://github.com/apache/arrow/issues/38612#issuecomment-1800125396 Thanks for asking this question @kou! Sorry, I am just now reading your comment (I didn't notice it before). We actually just started a mailing list discussion about this e

Re: [I] [MATLAB] Add indexing "slice" method to C++ `Array` Proxy class [arrow]

2023-11-07 Thread via GitHub
sgilmore10 commented on issue #38415: URL: https://github.com/apache/arrow/issues/38415#issuecomment-1800123452 take

Re: [I] [Python][FlightRPC] Segmentation Fault when invoking authenticate concurrently over a same FlightClient [arrow]

2023-11-07 Thread via GitHub
kou commented on issue #38565: URL: https://github.com/apache/arrow/issues/38565#issuecomment-1800122378 > We could/should probably add a lock for this to be consistent with other methods I think that we can't do it... Because `auth_handler_` is also used in other methods such as
