Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
suremarc commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367839079 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..rb.nu

Re: [PR] GH-37511: [C++] Implement file reads for Azure filesystem [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38269: URL: https://github.com/apache/arrow/pull/38269#issuecomment-1773992952 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 23dfd0e8643799b803b286e88ca6253303ecb703. There were no

Re: [I] Error when parsing numbers exceeding `INT64::MAX` in a csv file [arrow-datafusion]

2023-10-21 Thread via GitHub
haohuaijin closed issue #7894: Error when parsing numbers exceeding `INT64::MAX` in a csv file URL: https://github.com/apache/arrow-datafusion/issues/7894 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [I] Error when parsing numbers exceeding `INT64::MAX` in a csv file [arrow-datafusion]

2023-10-21 Thread via GitHub
haohuaijin commented on issue #7894: URL: https://github.com/apache/arrow-datafusion/issues/7894#issuecomment-1773986136 duplicate #3174

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-21 Thread via GitHub
niyue commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1367827763 ## cpp/src/gandiva/function_registry.cc: ## @@ -16,18 +16,23 @@ // under the License. #include "gandiva/function_registry.h" + +#include +#include +#include + +#in

Re: [I] Implement `array_union` function [arrow-datafusion]

2023-10-21 Thread via GitHub
jayzhan211 commented on issue #6981: URL: https://github.com/apache/arrow-datafusion/issues/6981#issuecomment-1773968815 > performing deduplication after would require to pattern match the internal type of the array We may not need pattern matching for Internal type of array. Type c

Re: [PR] GH-37910: [Java][Integration] Implement C Data Interface integration testing [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38248: URL: https://github.com/apache/arrow/pull/38248#issuecomment-1773967091 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit ea936e3506e5b408ff39a2ef762ab5fa7aba72ae. There were no

[I] Panic when using an azure object store with an invalid access key [arrow-rs]

2023-10-21 Thread via GitHub
scsmithr opened a new issue, #4972: URL: https://github.com/apache/arrow-rs/issues/4972 **Describe the bug** When using an azure object store configured with an invalid access key, attempting to use the store will result in a panic. **To Reproduce** Example:

Re: [I] Removing .arrow files without closing Julia seems impossible in Windows [arrow-julia]

2023-10-21 Thread via GitHub
Tortar commented on issue #492: URL: https://github.com/apache/arrow-julia/issues/492#issuecomment-1773954994 Reopening since even with this new information I couldn't get the code to work on Windows in the test of Agents.jl and so...I got a new updated MWE: ```julia using Arrow, D

Re: [PR] GH-38323: [CI][Python] Use system gdb on test-conda-python [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38324: URL: https://github.com/apache/arrow/pull/38324#issuecomment-1773940690 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 883a439076233f3f2836f281614ebe01601ccab9. There were no

[I] ScalarValue::new_list loses timestamp timezone info [arrow-datafusion]

2023-10-21 Thread via GitHub
Dandandan opened a new issue, #7900: URL: https://github.com/apache/arrow-datafusion/issues/7900 ### Describe the bug ScalarValue::new_list loses timestamp timezone info ### To Reproduce _No response_ ### Expected behavior Keep timezone during creation of li

[PR] Maintain time_zone in `ScalarValue::new_list` [arrow-datafusion]

2023-10-21 Thread via GitHub
Dandandan opened a new pull request, #7899: URL: https://github.com/apache/arrow-datafusion/pull/7899 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Removing .arrow files without closing Julia seems impossible in Windows [arrow-julia]

2023-10-21 Thread via GitHub
Tortar commented on issue #492: URL: https://github.com/apache/arrow-julia/issues/492#issuecomment-1773932723 All in all, I think this is probably a problem with DataFrames.jl

Re: [I] Removing .arrow files without closing Julia seems impossible in Windows [arrow-julia]

2023-10-21 Thread via GitHub
Tortar commented on issue #492: URL: https://github.com/apache/arrow-julia/issues/492#issuecomment-1773931920 Ohh, OK, got it: the DataFrame doesn't copy the content of the Arrow table, it just steals the reference. If one tries: ```julia julia> deleteat!(data_saved, 2) ERROR: Argumen

Re: [I] Removing .arrow files without closing Julia seems impossible in Windows [arrow-julia]

2023-10-21 Thread via GitHub
Tortar commented on issue #492: URL: https://github.com/apache/arrow-julia/issues/492#issuecomment-1773930779 OK, strangely enough, if you add `data_saved = nothing` the file can be removed without problems. This sounds strange to me since `data_saved` is a `DataFrame` which even if c

Re: [PR] GH-38376 [R]: Add `dimnames` method to `Dataset` class [arrow]

2023-10-21 Thread via GitHub
paleolimbot commented on code in PR #38377: URL: https://github.com/apache/arrow/pull/38377#discussion_r1367802343 ## r/R/dataset.R: ## @@ -527,6 +527,9 @@ names.Dataset <- function(x) names(x$schema) #' @export dim.Dataset <- function(x) c(x$num_rows, x$num_cols) +#' @expor

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367800438 ## csharp/test/Drivers/FlightSql/Apache.Arrow.Adbc.Tests.Drivers.FlightSql.csproj: ## @@ -1,35 +1,26 @@ - - -net472;net6.0 -disable -False -

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367800297 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,422 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contri

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367800198 ## csharp/src/Drivers/BigQuery/BigQueryConnection.cs: ## @@ -0,0 +1,980 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contr

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367799398 ## csharp/src/Drivers/BigQuery/Apache.Arrow.Adbc.Drivers.BigQuery.csproj: ## @@ -0,0 +1,15 @@ + + +netstandard2.0;net6.0 + + + Review Comment:

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-21 Thread via GitHub
kou commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1367797582 ## cpp/src/gandiva/function_registry.cc: ## @@ -16,18 +16,23 @@ // under the License. #include "gandiva/function_registry.h" + +#include +#include +#include + +#incl

Re: [PR] Initial implementation of array union without deduplication [arrow-datafusion]

2023-10-21 Thread via GitHub
edmondop commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1367795554 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1478,6 +1478,28 @@ macro_rules! to_string { }}; } + +/// Array_union SQL function +pub fn

Re: [I] to_timestamp timeunit to be consistent with postgresql's [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on issue #2979: URL: https://github.com/apache/arrow-datafusion/issues/2979#issuecomment-1773913795 Thanks @waitingkuo -- no rush!

Re: [PR] Allow Setting Minimum Parallelism with RowCount Based Demuxer [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb merged PR #7841: URL: https://github.com/apache/arrow-datafusion/pull/7841

Re: [PR] Allow Setting Minimum Parallelism with RowCount Based Demuxer [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on PR #7841: URL: https://github.com/apache/arrow-datafusion/pull/7841#issuecomment-1773913740 LGTM -- thanks again @devinjdangelo

Re: [PR] Allow Setting Minimum Parallelism with RowCount Based Demuxer [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7841: URL: https://github.com/apache/arrow-datafusion/pull/7841#discussion_r1367793562 ## datafusion/common/src/config.rs: ## @@ -255,6 +255,12 @@ config_namespace! { /// Number of files to read in parallel when inferring schema and stati

Re: [I] [EPIC] A collection of Interval arithmetic (not Intervals of time) improvements [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on issue #7882: URL: https://github.com/apache/arrow-datafusion/issues/7882#issuecomment-1773913221 > . I don't want to keep you waiting, but I think it will be easier to solve these issues after I quickly finalize my PR and present it to you. There is no particular t

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367792837 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1079,6 +1094,117 @@ mod tests { Interval::make(lower, upper, (false, false))

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367792837 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1079,6 +1094,117 @@ mod tests { Interval::make(lower, upper, (false, false))

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367792605 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1240,6 +1278,35 @@ mod tests { Ok(()) } +#[test] +fn or_test() ->

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367792536 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -344,8 +344,8 @@ impl PhysicalExpr for BinaryExpr { let right_interval = children[1];

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367792523 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1240,6 +1270,32 @@ mod tests { Ok(()) } +#[test] +fn or_test() ->

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367792390 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -344,8 +344,8 @@ impl PhysicalExpr for BinaryExpr { let right_interval = children[1];

Re: [I] Consolidate interval analysies from `Interval` and `PruningPredicate` [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on issue #7887: URL: https://github.com/apache/arrow-datafusion/issues/7887#issuecomment-1773911548 > I am not sure this is what you searched for but there was an issue https://github.com/apache/arrow-datafusion/issues/5535. Thank you -- this is exactly what I was loo

Re: [PR] GH-38332: [CI][Release] Resolve symlinks in RAT lint [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38337: URL: https://github.com/apache/arrow/pull/38337#issuecomment-1773910715 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 3cf96b39d04adfd4b9f2cf8ff762d8326312a129. There were no

[PR] Add multi-column topk tests [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb opened a new pull request, #7898: URL: https://github.com/apache/arrow-datafusion/pull/7898 ## Which issue does this PR close? This is a follow on to https://github.com/apache/arrow-datafusion/pull/7772 (❤️ ) by @Tangruilin ## Rationale for this change I would

Re: [PR] Prepare 32.0.0 Release [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove merged PR #525: URL: https://github.com/apache/arrow-datafusion-python/pull/525

Re: [PR] [test] add fuzz test for topk [arrow-datafusion]

2023-10-21 Thread via GitHub
alamb commented on PR #7772: URL: https://github.com/apache/arrow-datafusion/pull/7772#issuecomment-1773893623 BTW I am working on an extension of this test to support multiple columns

Re: [I] python/adbc_driver_postgresql ingest NOT_IMPLEMENTED when running adbc_ingest with boolean containing table [arrow-adbc]

2023-10-21 Thread via GitHub
lidavidm commented on issue #1216: URL: https://github.com/apache/arrow-adbc/issues/1216#issuecomment-1773888163 [Only on main](https://github.com/apache/arrow-adbc/commit/4e0c25252923f5f4d827dc1cbfd9f84f0107e63a). It will be in the next release (TBD, possibly early November)

Re: [I] Removing .arrow files without closing Julia seems impossible in Windows [arrow-julia]

2023-10-21 Thread via GitHub
Tortar commented on issue #492: URL: https://github.com/apache/arrow-julia/issues/492#issuecomment-1773881274 thanks anyway for the help @Moelf . To be able to bisect the issue more easily I created this MWE: ```julia using Arrow, DataFrames function writer_arrow(file

Re: [PR] Initial implementation of array union without deduplication [arrow-datafusion]

2023-10-21 Thread via GitHub
comphead commented on code in PR #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897#discussion_r1367772395 ## datafusion/physical-expr/src/array_expressions.rs: ## @@ -1478,6 +1478,28 @@ macro_rules! to_string { }}; } + +/// Array_union SQL function +pub fn

[PR] Prepare 32.0.0 Release [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove opened a new pull request, #525: URL: https://github.com/apache/arrow-datafusion-python/pull/525 # Which issue does this PR close? N/A # Rationale for this change We want to release updated bindings for DataFusion 32 # What changes are include

Re: [PR] Small clippy fix [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove merged PR #524: URL: https://github.com/apache/arrow-datafusion-python/pull/524

Re: [I] Bindings for window functions [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove closed issue #520: Bindings for window functions URL: https://github.com/apache/arrow-datafusion-python/issues/520

Re: [PR] Add support for window function bindings [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove merged PR #521: URL: https://github.com/apache/arrow-datafusion-python/pull/521

Re: [PR] Not reexport RecordBatch twice [arrow-rs]

2023-10-21 Thread via GitHub
tustvold commented on PR #4968: URL: https://github.com/apache/arrow-rs/pull/4968#issuecomment-1773867546 I think this isn't worth the downstream churn, but thank you for raising this

Re: [PR] Not reexport RecordBatch twice [arrow-rs]

2023-10-21 Thread via GitHub
tustvold closed pull request #4968: Not reexport RecordBatch twice URL: https://github.com/apache/arrow-rs/pull/4968

Re: [PR] GH-36594: [C++] Don't use MSVC_VERSION to determin -fms-compatibility-version [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #36595: URL: https://github.com/apache/arrow/pull/36595#issuecomment-1773867627 After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 63840f297df681e710a0b1a822235eca54f1c2fb. There was 1 b

Re: [PR] Add Decimal256 sqllogictests for SUM, MEDIAN and COUNT aggregate expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
viirya commented on PR #7889: URL: https://github.com/apache/arrow-datafusion/pull/7889#issuecomment-1773851462 Thank you @Dandandan @alamb @comphead

Re: [PR] Not reexport RecordBatch twice [arrow-rs]

2023-10-21 Thread via GitHub
lewiszlw commented on PR #4968: URL: https://github.com/apache/arrow-rs/pull/4968#issuecomment-1773844472 Understood. We might add deprecation info, but feel free to close this PR if you don't think we need this.

[PR] Small clippy fix [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove opened a new pull request, #524: URL: https://github.com/apache/arrow-datafusion-python/pull/524 # Which issue does this PR close? N/A # Rationale for this change I ran clippy locally with Rust 1.73 and this change was suggested. # What chang

Re: [I] Error when parsing numbers exceeding `INT64::MAX` in a csv file [arrow-datafusion]

2023-10-21 Thread via GitHub
Weijun-H commented on issue #7894: URL: https://github.com/apache/arrow-datafusion/issues/7894#issuecomment-1773839508 This issue is caused by arrow-rs, not datafusion 🤔 https://github.com/apache/arrow-rs/blob/03d0505fc864c09e6dcd208d3cdddeecefb90345/arrow-csv/src/reader/mod.rs#L889-

Re: [PR] Add support for window function bindings [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove commented on PR #521: URL: https://github.com/apache/arrow-datafusion-python/pull/521#issuecomment-1773837559 @dlovell fyi

Re: [I] Implement `array_union` function [arrow-datafusion]

2023-10-21 Thread via GitHub
edmondop commented on issue #6981: URL: https://github.com/apache/arrow-datafusion/issues/6981#issuecomment-1773837359 @jayzhan211 I looked deeper in the code, it seems that: - performing deduplication after would require to pattern match the internal type of the array - performing d

[PR] Initial implementation of array union without deduplication [arrow-datafusion]

2023-10-21 Thread via GitHub
edmondop opened a new pull request, #7897: URL: https://github.com/apache/arrow-datafusion/pull/7897 ## Which issue does this PR close? #6981 Closes #. ## Rationale for this change Support additional array_union SQL keyword ## Are these changes tested?

Re: [PR] Add support for window function bindings [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove commented on code in PR #521: URL: https://github.com/apache/arrow-datafusion-python/pull/521#discussion_r1367749996 ## src/expr/window.rs: ## @@ -0,0 +1,297 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] GH-37753: [C++][Gandiva] Add external function registry support [arrow]

2023-10-21 Thread via GitHub
pitrou commented on code in PR #38116: URL: https://github.com/apache/arrow/pull/38116#discussion_r1367748106 ## cpp/src/gandiva/function_registry.cc: ## @@ -16,18 +16,23 @@ // under the License. #include "gandiva/function_registry.h" + +#include +#include +#include + +#i

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367742418 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367741674 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367742418 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367742418 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367741674 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on code in PR #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896#discussion_r1367741674 ## datafusion/core/src/datasource/file_format/write/demux.rs: ## @@ -310,7 +311,37 @@ fn compute_partition_keys_by_row<'a>( for i in 0..

Re: [I] writing to partitioned table uses the wrong column as partition key [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on issue #7892: URL: https://github.com/apache/arrow-datafusion/issues/7892#issuecomment-1773815266 @alamb I opened a PR (https://github.com/apache/arrow-datafusion/pull/7896) which in my testing works correctly for your example case, but only in pure rust. In data

Re: [I] Error writing to a partitioned table: : it is not yet supported to write to hive partitions with datatype `Dictionary(UInt16, Utf8)` [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on issue #7891: URL: https://github.com/apache/arrow-datafusion/issues/7891#issuecomment-1773814667 Ok, I read the `arrow-rs` docs on dictionary array types, so I understand what that means now... I took a stab at solving this in https://github.com/apache/arrow-data

[PR] Support Partitioning Data by Dictionary Encoded String Array Types [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo opened a new pull request, #7896: URL: https://github.com/apache/arrow-datafusion/pull/7896 ## Which issue does this PR close? Closes #7891 ## Rationale for this change The initial implementation of hive style partition demux code only supported plain UTF8

Re: [I] Unexpected results with group by and random() [arrow-datafusion]

2023-10-21 Thread via GitHub
jonahgao commented on issue #7876: URL: https://github.com/apache/arrow-datafusion/issues/7876#issuecomment-1773813625 @alamb @haohuaijin . I think we can start with the projection plan first because: - It should be relatively simple. - It covers the case mentioned in this issue

Re: [I] Unexpected results with group by and random() [arrow-datafusion]

2023-10-21 Thread via GitHub
jonahgao commented on issue #7876: URL: https://github.com/apache/arrow-datafusion/issues/7876#issuecomment-1773812088 @alamb Agreed. I think we can start with the projection plan first because: - It should be relatively simple. - It covers the case mentioned in this issue, whe

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367734649 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1079,6 +1094,117 @@ mod tests { Interval::make(lower, upper, (false, fal

Re: [I] Unexpected results with group by and random() [arrow-datafusion]

2023-10-21 Thread via GitHub
haohuaijin commented on issue #7876: URL: https://github.com/apache/arrow-datafusion/issues/7876#issuecomment-1773811464 > Perhaps we can't push such predicate through Group or Join at all 🤔 When I use a subquery in place of the HAVING clause to meet the requirements in the ticket,

Re: [I] [EPIC] A collection of Interval arithmetic (not Intervals of time) improvements [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on issue #7882: URL: https://github.com/apache/arrow-datafusion/issues/7882#issuecomment-1773810387 > Thanks @berkaysynnada -- note I just proposed three relatively small changes . Perhaps you have some time to look at them: > > * [Minor: Clarify Boolean `Inte

Re: [PR] GH-38281: [Go] Ensure CData imported arrays are freed on release [arrow]

2023-10-21 Thread via GitHub
conbench-apache-arrow[bot] commented on PR #38314: URL: https://github.com/apache/arrow/pull/38314#issuecomment-1773808722 After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 0428c5ea35c78aff658fd5783a67ba4a6e90703e. There were no

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367737282 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -344,8 +344,8 @@ impl PhysicalExpr for BinaryExpr { let right_interval = children[1

Re: [PR] Support Interval analysis for `OR` expressions [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7884: URL: https://github.com/apache/arrow-datafusion/pull/7884#discussion_r1367736695 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1240,6 +1270,32 @@ mod tests { Ok(()) } +#[test] +fn or_t

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367735130 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1079,6 +1094,117 @@ mod tests { Interval::make(lower, upper, (false, fal

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367734649 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -1079,6 +1094,117 @@ mod tests { Interval::make(lower, upper, (false, fal

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367734177 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,394 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contri

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367733275 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,394 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contri

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367732769 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,422 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contri

Re: [PR] Minor: Clarify Boolean `Interval` handling and verify it with a test [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on code in PR #7885: URL: https://github.com/apache/arrow-datafusion/pull/7885#discussion_r1367732159 ## datafusion/physical-expr/src/intervals/interval_aritmetic.rs: ## @@ -235,18 +247,16 @@ impl Display for Interval { impl Interval { /// Creates a

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
CurtHagenlocher commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367731304 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,422 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contri

Re: [I] Enable `bloom_filter` on `parquet::arrow::ArrowWriter` increase file size by multiple times. [arrow-rs]

2023-10-21 Thread via GitHub
nooberfsh commented on issue #4970: URL: https://github.com/apache/arrow-rs/issues/4970#issuecomment-1773794055 Thanks for the quick confirmation and the extra information.

Re: [I] Enable `bloom_filter` on `parquet::arrow::ArrowWriter` increase file size by multiple times. [arrow-rs]

2023-10-21 Thread via GitHub
nooberfsh closed issue #4970: Enable `bloom_filter` on `parquet::arrow::ArrowWriter` increase file size by multiple times. URL: https://github.com/apache/arrow-rs/issues/4970

Re: [PR] Change `FileScanConfig.table_partition_cols` from `(String, DataType)` to `Field`s [arrow-datafusion]

2023-10-21 Thread via GitHub
tustvold commented on code in PR #7890: URL: https://github.com/apache/arrow-datafusion/pull/7890#discussion_r1367731053 ## datafusion/core/src/datasource/physical_plan/file_scan_config.rs: ## @@ -101,7 +101,7 @@ pub struct FileScanConfig { /// all records after filtering a

Re: [I] Consolidate interval analysies from `Interval` and `PruningPredicate` [arrow-datafusion]

2023-10-21 Thread via GitHub
berkaysynnada commented on issue #7887: URL: https://github.com/apache/arrow-datafusion/issues/7887#issuecomment-1773792948 I am not sure this is what you searched for but there was an issue https://github.com/apache/arrow-datafusion/issues/5535. Actually, I have tried to apply cp_so

Re: [PR] Add MultiPartStore (#4961) (#4608) [arrow-rs]

2023-10-21 Thread via GitHub
tustvold commented on PR #4971: URL: https://github.com/apache/arrow-rs/pull/4971#issuecomment-1773791599 It would appear localstack doesn't return the ETag when it should, will think about how best to handle this

Re: [PR] Update base64 lib and fix compilation failure in flight_sql.rs [arrow-ballista]

2023-10-21 Thread via GitHub
ahmedriza commented on PR #896: URL: https://github.com/apache/arrow-ballista/pull/896#issuecomment-1773791070 Updated to_proto.rs, where the build failed because it used timestamp_nanos(), which is now deprecated in the chrono crate. Replaced with timestamp_nan

Re: [I] Error writing to a partitioned table: : it is not yet supported to write to hive partitions with datatype `Dictionary(UInt16, Utf8)` [arrow-datafusion]

2023-10-21 Thread via GitHub
devinjdangelo commented on issue #7891: URL: https://github.com/apache/arrow-datafusion/issues/7891#issuecomment-1773789915 Hm, I am a little confused why Datafusion is inferring the schema of UTF8 data as Dictionary(some int type, UTF8). :thinking: will have to look into it. It does

Re: [I] Enable `bloom_filter` on `parquet::arrow::ArrowWriter` increase file size by multiple times. [arrow-rs]

2023-10-21 Thread via GitHub
tustvold commented on issue #4970: URL: https://github.com/apache/arrow-rs/issues/4970#issuecomment-1773787778 Yes, the defaults are for a reasonably sized parquet file. You can tweak the bloom filter generation with https://docs.rs/parquet/latest/parquet/file/properties/struct.Write

[PR] Add MultiPartStore (#4961) (#4608) [arrow-rs]

2023-10-21 Thread via GitHub
tustvold opened a new pull request, #4971: URL: https://github.com/apache/arrow-rs/pull/4971 # Which issue does this PR close? Closes #4691 Closes #4608 # Rationale for this change Some use-cases wish to interact with the multipart upload machinery di

Re: [I] Support as_datetime:: [arrow-rs]

2023-10-21 Thread via GitHub
tustvold commented on issue #4969: URL: https://github.com/apache/arrow-rs/issues/4969#issuecomment-1773782566 A DateTime is a point in time, whereas an interval is not, I'm not entirely sure how this would work?

Re: [I] Build failure in flight_sql.rs [arrow-ballista]

2023-10-21 Thread via GitHub
ahmedriza commented on issue #895: URL: https://github.com/apache/arrow-ballista/issues/895#issuecomment-1773782161 Updated `to_proto.rs`, where the build failed because it used `timestamp_nanos()`, which is now deprecated in the `chrono` crate.

Re: [PR] feat(csharp/drivers/bigquery): add BigQuery ADBC driver [arrow-adbc]

2023-10-21 Thread via GitHub
davidhcoe commented on code in PR #1192: URL: https://github.com/apache/arrow-adbc/pull/1192#discussion_r1367724442 ## csharp/src/Drivers/BigQuery/BigQueryStatement.cs: ## @@ -0,0 +1,422 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor

Re: [PR] GH-35785: [C++] Support null type non-key columns for join operation [arrow]

2023-10-21 Thread via GitHub
github-actions[bot] commented on PR #38383: URL: https://github.com/apache/arrow/pull/38383#issuecomment-1773781617 :warning: GitHub issue #35785 **has been automatically assigned in GitHub** to PR creator.

Re: [I] Allow multiply input files for table [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove closed issue #518: Allow multiply input files for table URL: https://github.com/apache/arrow-datafusion-python/issues/518

Re: [PR] Allow for multiple input files per table instead of a single file [arrow-datafusion-python]

2023-10-21 Thread via GitHub
andygrove merged PR #519: URL: https://github.com/apache/arrow-datafusion-python/pull/519

Re: [PR] GH-35785: [C++] Support null type non-key columns for join operation [arrow]

2023-10-21 Thread via GitHub
llama90 commented on PR #38383: URL: https://github.com/apache/arrow/pull/38383#issuecomment-1773775211 There were policy considerations and a personal lack of understanding regarding the code when it came to supporting `Null` types for `key` field columns. Therefore, although there

[PR] GH-35785: [C++] Support null type columns for join operation [arrow]

2023-10-21 Thread via GitHub
llama90 opened a new pull request, #38383: URL: https://github.com/apache/arrow/pull/38383 ### Rationale for this change Supports join on null types for non-key columns. * Issue raised: 35785 ### What changes are included in this PR? * Separated the

Re: [PR] GH-35785: [C++] Support null type non-key columns for join operation [arrow]

2023-10-21 Thread via GitHub
github-actions[bot] commented on PR #38383: URL: https://github.com/apache/arrow/pull/38383#issuecomment-1773774303 :warning: GitHub issue #35785 **has been automatically assigned in GitHub** to PR creator.
