Re: [PR] GH-39163: [C++] Add missing data copy in StreamDecoder::Consume(data) [arrow]

2024-01-05 Thread via GitHub
kou merged PR #39164: URL: https://github.com/apache/arrow/pull/39164 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

Re: [PR] GH-39163: [C++] Add missing data copy in StreamDecoder::Consume(data) [arrow]

2024-01-05 Thread via GitHub
kou commented on PR #39164: URL: https://github.com/apache/arrow/pull/39164#issuecomment-1879587318 I'll merge this for 15.0.0. If there is a problem in this change, I'll work on it in a follow-up PR.

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
wgtmac commented on PR #39393: URL: https://github.com/apache/arrow/pull/39393#issuecomment-1879584886 Sorry for not chiming in sooner. I agree that it would be good to break down this PR into smaller ones. Otherwise, it would be challenging to get it properly reviewed. To provide some

Re: [I] [C++][Parquet] Support read by row ranges [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on issue #39392: URL: https://github.com/apache/arrow/issues/39392#issuecomment-1879582991 This seems like mostly a dupe of https://github.com/apache/arrow/issues/38865 ?

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on PR #39393: URL: https://github.com/apache/arrow/pull/39393#issuecomment-1879581846 I commented on specific sections but looking over the PR it seems quite different than what was discussed in https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gV

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on code in PR #39393: URL: https://github.com/apache/arrow/pull/39393#discussion_r1443649381 ## cpp/src/parquet/row_ranges.h: ## @@ -0,0 +1,201 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. Se

Re: [PR] Change ScalarValue::Struct to ArrayRef [arrow-datafusion]

2024-01-05 Thread via GitHub
jayzhan211 commented on code in PR #7893: URL: https://github.com/apache/arrow-datafusion/pull/7893#discussion_r1443647677 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -693,32 +694,6 @@ impl LogicalExtensionCodec for TopKExtensionCodec { } } -#[test]

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on code in PR #39393: URL: https://github.com/apache/arrow/pull/39393#discussion_r1443647516 ## cpp/src/parquet/row_ranges.h: ## @@ -0,0 +1,201 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. Se

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on code in PR #39393: URL: https://github.com/apache/arrow/pull/39393#discussion_r1443647207 ## cpp/src/arrow/metrics.h: ## Review Comment: it also doesn't seem like it has tests associated with it?

Re: [PR] GH-39398: [C++][Parquet] Use std::count in ColumnReader ReadLevels [arrow]

2024-01-05 Thread via GitHub
mapleFU commented on PR #39397: URL: https://github.com/apache/arrow/pull/39397#issuecomment-1879576437 I'll draft a PR for the benchmark only and post the data there

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on code in PR #39393: URL: https://github.com/apache/arrow/pull/39393#discussion_r1443646681 ## cpp/src/arrow/metrics.h: ## Review Comment: this probably belongs in util?

Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on PR #39393: URL: https://github.com/apache/arrow/pull/39393#issuecomment-1879576038 I agree with @mapleFU that this PR is very large. I'd recommend at least breaking up the components added to Arrow from the parquet changes (and possibly splitting out the metrics cl

Re: [PR] GH-39398: [C++][Parquet] Use std::count in ColumnReader ReadLevels [arrow]

2024-01-05 Thread via GitHub
emkornfield commented on PR #39397: URL: https://github.com/apache/arrow/pull/39397#issuecomment-1879575149 > Hmmm @emkornfield Skip and Read in parquet-column-reader benchmark get faster on my macOS, ReadLevels itself doesn't have any benchmark directly, since it's a bit hacking to get it

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-05 Thread via GitHub
rspears74 commented on PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#issuecomment-1879574562 > Welp, I had a merge conflict, and upon fixing it, there's some new code that's incompatible with these changes (not compiling), so I'm gonna have to do a little more involve

Re: [PR] Change ScalarValue::Struct to ArrayRef [arrow-datafusion]

2024-01-05 Thread via GitHub
jayzhan211 commented on code in PR #7893: URL: https://github.com/apache/arrow-datafusion/pull/7893#discussion_r1443643473 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -693,32 +694,6 @@ impl LogicalExtensionCodec for TopKExtensionCodec { } } -#[test]

[PR] Move tests from `expr.rs` to sqllogictests. Part1 [arrow-datafusion]

2024-01-05 Thread via GitHub
comphead opened a new pull request, #8773: URL: https://github.com/apache/arrow-datafusion/pull/8773 ## Which issue does this PR close? Closes partially #8201. ## Rationale for this change Move tests from `expr.rs` to sqllogictests. Moved most of tests, other tests n

Re: [PR] GH-39468: [Java] Fix site build for docs [arrow]

2024-01-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39471: URL: https://github.com/apache/arrow/pull/39471#issuecomment-1879564366 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit b736c99cea9e6b86475e8f2ce264ede3262a237c. There were no

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
js8544 commented on PR #39441: URL: https://github.com/apache/arrow/pull/39441#issuecomment-1879549308 Thanks for your contribution! It generally looks good to me, just a couple of nits.

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
js8544 commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443621316 ## cpp/src/gandiva/regex_functions_holder.cc: ## @@ -275,4 +275,78 @@ const char* ExtractHolder::operator()(ExecutionContext* ctx, const char* user_in return result_

Re: [I] [JS] What is best practice for connecting to an arrow instance in javascript? [arrow]

2024-01-05 Thread via GitHub
jay-bulk commented on issue #36625: URL: https://github.com/apache/arrow/issues/36625#issuecomment-1879549230 I guess I should have been more specific. It would be nice to see a node library for this. I don't have the bandwidth to get that going and I'm not sure the community is asking for

Re: [PR] feat: Add bloom filter statistics to ParquetExec [arrow-datafusion]

2024-01-05 Thread via GitHub
Jefffrey commented on code in PR #8772: URL: https://github.com/apache/arrow-datafusion/pull/8772#discussion_r1443614066 ## datafusion/core/src/datasource/physical_plan/parquet/metrics.rs: ## @@ -29,8 +29,10 @@ use crate::physical_plan::metrics::{ pub struct ParquetFileMetrics

[PR] feat: Add bloom filter statistics to ParquetExec [arrow-datafusion]

2024-01-05 Thread via GitHub
my-vegetable-has-exploded opened a new pull request, #8772: URL: https://github.com/apache/arrow-datafusion/pull/8772 ## Which issue does this PR close? Closes #8767, #8768 ## Rationale for this change ## What changes are included in this PR?

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on PR #39441: URL: https://github.com/apache/arrow/pull/39441#issuecomment-1879513672 I posted some minor issues, overall it looks good to me. Disclosure: Kun Li (the PR author) is an acquaintance of mine and we are co-workers offline.

Re: [I] c: fix include paths for adbc.h [arrow-adbc]

2024-01-05 Thread via GitHub
rtadepalli commented on issue #1150: URL: https://github.com/apache/arrow-adbc/issues/1150#issuecomment-1879513172 What's the preferred way forward here? I'm working on adding a new `arrow-adbc` directory under `c/` -- want to confirm once that this is ok (does seem like the only way to do

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443584894 ## cpp/src/gandiva/regex_functions_holder.cc: ## @@ -275,4 +275,78 @@ const char* ExtractHolder::operator()(ExecutionContext* ctx, const char* user_in return result_b

Re: [PR] GH-39419: [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast [arrow]

2024-01-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39420: URL: https://github.com/apache/arrow/pull/39420#issuecomment-1879510682 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 01deb9438acde11f1968acd2a0bb5d3e8e4a4cc6. There were 4

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443577123 ## cpp/src/gandiva/regex_functions_holder.cc: ## @@ -275,4 +275,78 @@ const char* ExtractHolder::operator()(ExecutionContext* ctx, const char* user_in return result_b

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443575394 ## cpp/src/gandiva/regex_functions_holder.cc: ## @@ -275,4 +275,78 @@ const char* ExtractHolder::operator()(ExecutionContext* ctx, const char* user_in return result_b

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443568666 ## cpp/src/gandiva/regex_functions_holder_test.cc: ## @@ -635,4 +635,93 @@ TEST_F(TestExtractHolder, TestErrorWhileBuildingHolder) { execution_context_.Reset(); } +

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443567097 ## cpp/src/gandiva/regex_functions_holder_test.cc: ## @@ -635,4 +635,93 @@ TEST_F(TestExtractHolder, TestErrorWhileBuildingHolder) { execution_context_.Reset(); } +

Re: [PR] GH-39439 [C++][Gandiva] Add regex_like [arrow]

2024-01-05 Thread via GitHub
niyue commented on code in PR #39441: URL: https://github.com/apache/arrow/pull/39441#discussion_r1443565871 ## cpp/src/gandiva/gdv_function_stubs.h: ## @@ -384,4 +384,13 @@ const char* mask_utf8_utf8(int64_t context, const char* in, int32_t length, GANDIVA_EXPORT const cha

Re: [I] Cannot insert with DMLStatement into a table with non-nullable fields [arrow-datafusion]

2024-01-05 Thread via GitHub
jonahgao commented on issue #8763: URL: https://github.com/apache/arrow-datafusion/issues/8763#issuecomment-1879484539 I will try to fix it.

Re: [PR] Implement trait based API for define AggregateUDF [arrow-datafusion]

2024-01-05 Thread via GitHub
guojidan commented on PR #8733: URL: https://github.com/apache/arrow-datafusion/pull/8733#issuecomment-1879481128 > This looks amazing @guojidan -- thank you 🙏 -- let me know if I can help with moving this PR along I want to implement `Clean internal implementation` like #8746 in this P

Re: [PR] GH-39484: [Java] Support 256 bit decimals in JdbcToArrowUtils [arrow]

2024-01-05 Thread via GitHub
aiguofer commented on code in PR #39485: URL: https://github.com/apache/arrow/pull/39485#discussion_r1443546283 ## java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java: ## @@ -164,12 +167,21 @@ public static ArrowType getArrowTypeFromJdbcType(final

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-05 Thread via GitHub
rspears74 commented on PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#issuecomment-1879462609 Welp, I had a merge conflict, and upon fixing it, there's some new code that's incompatible with these changes, so I'm gonna have to do a little more involved work to fix thos

Re: [PR] GH-39484: [Java] Support 256 bit decimals in JdbcToArrowUtils [arrow]

2024-01-05 Thread via GitHub
aiguofer commented on code in PR #39485: URL: https://github.com/apache/arrow/pull/39485#discussion_r1443544489 ## java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java: ## @@ -164,12 +167,21 @@ public static ArrowType getArrowTypeFromJdbcType(final

Re: [PR] GH-39484: [Java] Support 256 bit decimals in JdbcToArrowUtils [arrow]

2024-01-05 Thread via GitHub
aiguofer commented on code in PR #39485: URL: https://github.com/apache/arrow/pull/39485#discussion_r1443533290 ## java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowUtils.java: ## @@ -169,7 +171,11 @@ public static ArrowType getArrowTypeFromJdbcType(final

Re: [PR] docs: document SessionConfig [arrow-datafusion]

2024-01-05 Thread via GitHub
wjones127 commented on code in PR #8771: URL: https://github.com/apache/arrow-datafusion/pull/8771#discussion_r1443534062 ## datafusion/common/src/config.rs: ## @@ -306,7 +322,7 @@ config_namespace! { pub metadata_size_hint: Option, default = None /// If true

[PR] docs: document SessionConfig [arrow-datafusion]

2024-01-05 Thread via GitHub
wjones127 opened a new pull request, #8771: URL: https://github.com/apache/arrow-datafusion/pull/8771 ## Which issue does this PR close? Closes #8770. ## Rationale for this change It took me a surprisingly long time to figure out how to set these configuration options.

[PR] GH-39484: [Java] Support 256 bit decimals in JdbcToArrowUtils [arrow]

2024-01-05 Thread via GitHub
aiguofer opened a new pull request, #39485: URL: https://github.com/apache/arrow/pull/39485 ### Rationale for this change This PR allows users of `JdbcToArrowUtils` to convert 256 bit decimals. ### What changes are included in this PR? Add a `Decimal256Consumer` a

Re: [PR] GH-39484: [Java] Support 256 bit decimals in JdbcToArrowUtils [arrow]

2024-01-05 Thread via GitHub
github-actions[bot] commented on PR #39485: URL: https://github.com/apache/arrow/pull/39485#issuecomment-1879420623 :warning: GitHub issue #39484 **has been automatically assigned in GitHub** to PR creator.

[I] Document usage of SessionConfig [arrow-datafusion]

2024-01-05 Thread via GitHub
wjones127 opened a new issue, #8770: URL: https://github.com/apache/arrow-datafusion/issues/8770 ### Is your feature request related to a problem or challenge? When I was trying to configure DataFusion, I encountered two roadblocks while reading the documentation: 1. Once I've

Re: [PR] GH-38861: [C++] Add missing "-framework Security" to Libs.private in arrow.pc [arrow]

2024-01-05 Thread via GitHub
jeroen commented on PR #38869: URL: https://github.com/apache/arrow/pull/38869#issuecomment-1879394316 @kou there is another one, could you maybe also add libcurl in the same way? On MacOS ARM64 I get: ```sh Undefined symbols for architecture arm64: "_curl_multi_poll", ref

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-05 Thread via GitHub
rspears74 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443528261 ## datafusion/common/src/scalar.rs: ## @@ -359,29 +359,77 @@ impl PartialOrd for ScalarValue { (FixedSizeBinary(_, _), _) => None,

Re: [PR] GH-38772: [C++] Implement directory semantics even when the storage account doesn't support HNS [arrow]

2024-01-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39361: URL: https://github.com/apache/arrow/pull/39361#issuecomment-1879380822 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit aae6fa40b458a90c598df281fdc8fc023e05a262. There were no

Re: [PR] GH-39449: [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly [arrow]

2024-01-05 Thread via GitHub
felipecrv merged PR #39450: URL: https://github.com/apache/arrow/pull/39450

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-05 Thread via GitHub
rspears74 commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443521603 ## datafusion/proto/src/logical_plan/to_proto.rs: ## @@ -1189,24 +1188,79 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue { sche

Re: [PR] Object_store: get_file and put_file [arrow-rs]

2024-01-05 Thread via GitHub
tustvold commented on code in PR #5281: URL: https://github.com/apache/arrow-rs/pull/5281#discussion_r1443512351 ## object_store/src/local.rs: ## @@ -1082,6 +1097,58 @@ fn convert_walkdir_result( } } + +/// Download a remote object to a local [`File`] +pub async fn uploa

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
domoritz merged PR #39475: URL: https://github.com/apache/arrow/pull/39475

Re: [PR] GH-38936: [JS] initialize overrides for DOM and Node in IIFE [arrow]

2024-01-05 Thread via GitHub
trxcllnt commented on PR #39472: URL: https://github.com/apache/arrow/pull/39472#issuecomment-1879324985 What impact would setting `sideEffects: true` have?

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
domoritz commented on code in PR #39475: URL: https://github.com/apache/arrow/pull/39475#discussion_r1443438135 ## js/gulp/package-task.js: ## @@ -54,18 +57,17 @@ const createMainPackageJson = (target, format) => (orig) => ({ node: { import: `./${m

Re: [PR] GH-38936: [JS] initialize overrides for DOM and Node in IIFE [arrow]

2024-01-05 Thread via GitHub
trxcllnt commented on PR #39472: URL: https://github.com/apache/arrow/pull/39472#issuecomment-1879324088 @domoritz yes, I believe so. I'm not sure if there's a clean way to do this...

[PR] Minor: Fix flake in newly added dictionary.slt test [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb opened a new pull request, #8769: URL: https://github.com/apache/arrow-datafusion/pull/8769 ## Which issue does this PR close? Related to https://github.com/apache/arrow-datafusion/pull/8750 ## Rationale for this change I added new tests in https://github.com/apach

Re: [PR] GH-39482: [JS] refactor type imports [arrow]

2024-01-05 Thread via GitHub
trxcllnt commented on code in PR #39483: URL: https://github.com/apache/arrow/pull/39483#discussion_r1443421265 ## js/bin/integration.ts: ## @@ -17,8 +17,8 @@ // specific language governing permissions and limitations // under the License. -import * as fs from 'fs'; -import

Re: [PR] Convert Binary Operator `StringConcat` to Function for `array_concat`, `array_append` and `array_prepend` [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on PR #8636: URL: https://github.com/apache/arrow-datafusion/pull/8636#issuecomment-1879318590 Thanks again @jayzhan211

Re: [PR] Convert Binary Operator `StringConcat` to Function for `array_concat`, `array_append` and `array_prepend` [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8636: URL: https://github.com/apache/arrow-datafusion/pull/8636

Re: [PR] GH-38936: [JS] initialize overrides for DOM and Node in IIFE [arrow]

2024-01-05 Thread via GitHub
domoritz commented on PR #39472: URL: https://github.com/apache/arrow/pull/39472#issuecomment-1879318440 I see. The issue with my code is that the method might still be thrown away?

Re: [PR] Change `ScalarValue::{List, LargeList, FixedSizedList}` to take specific types rather than `ArrayRef` [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on code in PR #8562: URL: https://github.com/apache/arrow-datafusion/pull/8562#discussion_r1443407346 ## datafusion/proto/src/logical_plan/to_proto.rs: ## @@ -1189,24 +1188,79 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue { schema:

Re: [PR] GH-39047: [JS] Enable test for generate_primitive_large_offsets_case [arrow]

2024-01-05 Thread via GitHub
domoritz merged PR #39470: URL: https://github.com/apache/arrow/pull/39470

Re: [PR] GH-39366: [JS] Add largeUtf8 to benchmark [arrow]

2024-01-05 Thread via GitHub
domoritz merged PR #39367: URL: https://github.com/apache/arrow/pull/39367

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
domoritz commented on PR #39475: URL: https://github.com/apache/arrow/pull/39475#issuecomment-1879311179 Thanks! I made the updates.

Re: [I] Release arrow-rs version 50.0.0 [arrow-rs]

2024-01-05 Thread via GitHub
tustvold commented on issue #5234: URL: https://github.com/apache/arrow-rs/issues/5234#issuecomment-1879302087 I intend to cut the release on Monday

Re: [PR] Add ForeignKey constraint type [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on code in PR #8566: URL: https://github.com/apache/arrow-datafusion/pull/8566#discussion_r1443401501 ## datafusion/common/src/functional_dependencies.rs: ## @@ -97,8 +108,39 @@ impl Constraints { Constraint::Unique(indices)

Re: [PR] GH-38998: [Java] Build memory-core and memory-unsafe as JPMS modules [arrow]

2024-01-05 Thread via GitHub
jduo commented on PR #39011: URL: https://github.com/apache/arrow/pull/39011#issuecomment-1879296034 > If we're going to include this with Arrow 15, we should get on the doc changes immediately. OK, I have added the doc change to this PR instead.

[PR] GH-39482: refactor type imports [arrow]

2024-01-05 Thread via GitHub
domoritz opened a new pull request, #39483: URL: https://github.com/apache/arrow/pull/39483 (no comment)

Re: [I] Improve Parallel Reading (CSV, JSON) / Help Wanted [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8723: URL: https://github.com/apache/arrow-datafusion/issues/8723#issuecomment-1879294920 I didn't have a chance to review https://github.com/marvinlanhenke/arrow-datafusion/blob/poc_optimize_get_req/datafusion/core/src/datasource/physical_plan/json.rs#L232-L381 i

Re: [I] Release arrow-rs version 50.0.0 [arrow-rs]

2024-01-05 Thread via GitHub
changhiskhan commented on issue #5234: URL: https://github.com/apache/arrow-rs/issues/5234#issuecomment-1879287472 Hi just curious if this is still happening this week or early next week. We're waiting for a bug fix in arrow-rs plus the next datafusion release to support upsert in Lance. Th

Re: [PR] Add http(s) support to the command line [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on PR #8753: URL: https://github.com/apache/arrow-datafusion/pull/8753#issuecomment-1879274741 Thank you @kcolford and @Jefffrey for the review. It would be great to address @Jefffrey 's comments I tried this out on the ClickBench file fetched via http and it worke

Re: [PR] Implement trait based API for define AggregateUDF [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on PR #8733: URL: https://github.com/apache/arrow-datafusion/pull/8733#issuecomment-1879267369 This looks amazing @guojidan -- thank you 🙏 -- let me know if I can help with moving this PR along

Re: [PR] [MINOR] CLI error handling on streaming use cases [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8761: URL: https://github.com/apache/arrow-datafusion/pull/8761

Re: [PR] Add note on using larger row group size [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on PR #8745: URL: https://github.com/apache/arrow-datafusion/pull/8745#issuecomment-1879261515 I took the liberty of pushing a commit to fix the CI failures

Re: [PR] feat: support `largelist` in `array_to_string` [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8729: URL: https://github.com/apache/arrow-datafusion/pull/8729

Re: [I] Clean internal implementation of WindowUDF to use WindowUDFImpl (rather than the function pointers) [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb closed issue #8734: Clean internal implementation of WindowUDF to use WindowUDFImpl (rather than the function pointers) URL: https://github.com/apache/arrow-datafusion/issues/8734

Re: [PR] Clean internal implementation of WindowUDF [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8746: URL: https://github.com/apache/arrow-datafusion/pull/8746

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
andrew0 commented on PR #39475: URL: https://github.com/apache/arrow/pull/39475#issuecomment-1879255801 I believe that this should fix the types for all the import scenarios: https://github.com/andrew0/arrow/commit/207881a857ada94a86c4a371da2cf89d5c6335e4 https://github.com/apache/arr

Re: [PR] Prepare object_store 0.9.0 [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on code in PR #8758: URL: https://github.com/apache/arrow-datafusion/pull/8758#discussion_r1443365340 ## datafusion-cli/src/exec.rs: ## @@ -340,14 +340,6 @@ mod tests { let session_token = "fake_session_token"; let location = "s3://bucket/path/f

Re: [I] Regression: Unneeded fields pushed to TableProvider if struct field is part of query [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8735: URL: https://github.com/apache/arrow-datafusion/issues/8735#issuecomment-1879252687 So the regression is that projection pushdown isn't working correctly. Maybe this is related to https://github.com/apache/arrow-datafusion/pull/8073 from @berkaysynnada

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
andrew0 commented on PR #39475: URL: https://github.com/apache/arrow/pull/39475#issuecomment-1879251883 I ran `yarn build && npx @arethetypeswrong/cli --pack targets/apache-arrow`, and it gives these errors: https://github.com/apache/arrow/assets/739172/d642f026-a2a8-4886-9a20-9c7281a

Re: [I] Allow projection of schemas/structs [arrow]

2024-01-05 Thread via GitHub
Fokko commented on issue #38615: URL: https://github.com/apache/arrow/issues/38615#issuecomment-1879251134 Still running into this. I would expect something like below to work: ```python In [1]: import pyarrow as pa In [2]: current_schema = pa.schema([ ...: pa.field("

Re: [I] Cannot insert with DMLStatement into a table with non-nullable fields [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8763: URL: https://github.com/apache/arrow-datafusion/issues/8763#issuecomment-1879248006 This definitely sounds like a bug -- thank you for filing this @rebasedming. I think @jonahgao has worked on something similar

Re: [I] Consolidate DDL / Catalog manipulation LogicalPlans (refactor) [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #3349: URL: https://github.com/apache/arrow-datafusion/issues/3349#issuecomment-1879247113 I believe all such plans are now part of LogicalPlan::Dml / LogicalPlan::Ddl so closing this issue https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Lo

Re: [I] Consolidate DDL / Catalog manipulation LogicalPlans (refactor) [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb closed issue #3349: Consolidate DDL / Catalog manipulation LogicalPlans (refactor) URL: https://github.com/apache/arrow-datafusion/issues/3349

Re: [I] Cannot insert with DMLStatement into a table with non-nullable fields [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8763: URL: https://github.com/apache/arrow-datafusion/issues/8763#issuecomment-1879245883 https://github.com/apache/arrow-datafusion/issues/7693 and https://github.com/apache/arrow-datafusion/issues/7636 may be related

Re: [I] Implement monotonicity for ScalarUDF [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8756: URL: https://github.com/apache/arrow-datafusion/issues/8756#issuecomment-1879241548 > cc @alamb, Is there any mistake in my understanding? if not I will implement this. Thank you @guojidan -- I agree this looks good. I also added it to the tracking e

Re: [I] [DISCUSSION] We need a Hero for datafusion-python [arrow-datafusion-python]

2024-01-05 Thread via GitHub
alamb commented on issue #440: URL: https://github.com/apache/arrow-datafusion-python/issues/440#issuecomment-1879239594 > Is there anyone currently adding pyi files for datafusion-python? I have experience in this area and I would like to be involved in this work Thanks @woxiaosa -

Re: [PR] fix guarantees in allways_true of PruningPredicate [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on PR #8732: URL: https://github.com/apache/arrow-datafusion/pull/8732#issuecomment-1879239123 > * One PR to add the new metrics to distinguish filtering on bloom filters vs statistics > * One PR with some integration tests to verify bloom filters are actually

[I] Add bloom filter integration tests [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb opened a new issue, #8768: URL: https://github.com/apache/arrow-datafusion/issues/8768 ### Is your feature request related to a problem or challenge? We (well, really I) introduced a regression applying BloomFilters in ParquetExec https://github.com/apache/arrow-datafusion/issue

[I] Add bloom filter statistics to ParquetExec [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb opened a new issue, #8767: URL: https://github.com/apache/arrow-datafusion/issues/8767 ### Is your feature request related to a problem or challenge? It appears that there is no good way to know if the bloom filter code is working via logging or metrics 🤔 We have metrics

Re: [PR] Move `repartition_file_scans` out of `enable_round_robin` check in `EnforceDistribution` rule [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8731: URL: https://github.com/apache/arrow-datafusion/pull/8731

Re: [PR] Minor: Add documentation about stream cancellation [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8747: URL: https://github.com/apache/arrow-datafusion/pull/8747

Re: [PR] Minor: Use caster check for column name in schema merge [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb merged PR #8765: URL: https://github.com/apache/arrow-datafusion/pull/8765

Re: [PR] ci(python): Add cibuildwheel setup for Python wheels [arrow-nanoarrow]

2024-01-05 Thread via GitHub
codecov-commenter commented on PR #353: URL: https://github.com/apache/arrow-nanoarrow/pull/353#issuecomment-1879228682 ## [Codecov](https://app.codecov.io/gh/apache/arrow-nanoarrow/pull/353?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_t

Re: [I] Make a faster way to check column existence in optimizer (not `is_err()`) [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #5309: URL: https://github.com/apache/arrow-datafusion/issues/5309#issuecomment-1879228505 > @alamb sorry for delay here, I went down a rabbit hole of trying to get some good memory / allocation benchmarks as a i really wanted to be able measure / compare cause (al

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Jan 1, 2024 [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #8704: URL: https://github.com/apache/arrow-datafusion/issues/8704#issuecomment-1879228062 DataFusion - [ ] https://github.com/apache/arrow-datafusion/issues/8723 - [ ] https://github.com/apache/arrow-datafusion/pull/8562 - [ ] https://github.com/apache/arro

[PR] ci(python): Add cibuildwheel setup for Python wheels [arrow-nanoarrow]

2024-01-05 Thread via GitHub
paleolimbot opened a new pull request, #353: URL: https://github.com/apache/arrow-nanoarrow/pull/353 (no comment)

Re: [PR] GH-39289: [JS] Add types to exports [arrow]

2024-01-05 Thread via GitHub
andrew0 commented on code in PR #39475: URL: https://github.com/apache/arrow/pull/39475#discussion_r1443344349 ## js/gulp/package-task.js: ## @@ -54,18 +57,17 @@ const createMainPackageJson = (target, format) => (orig) => ({ node: { import: `./${ma

Re: [PR] MINOR: [Java] Bump com.google.errorprone:error_prone_core from 2.4.0 to 2.24.0 in /java [arrow]

2024-01-05 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #39452: URL: https://github.com/apache/arrow/pull/39452#issuecomment-1879218339 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 42b995b4f8de239da2be17430706cf4eb795ac50. There were no

Re: [I] Materialize Dictionaries in Group Keys [arrow-datafusion]

2024-01-05 Thread via GitHub
alamb commented on issue #7647: URL: https://github.com/apache/arrow-datafusion/issues/7647#issuecomment-1879216716 I want to be clear that I have no particular evidence one way or the other about the performance implications of this particular change (and I probably confused the issue wit
