Re: [PR] GH-37242: [Python][Parquet] Parquet Support write and validate Page CRC [arrow]

2023-11-10 Thread via GitHub
frazar commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1806736708 Fixed now! The check for the legacy dataset was _always_ failing because the default value was not None. The check is now fixed. -- This is an automated message from the Apache Git Service.
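
The kind of bug described in this comment can be sketched as follows. This is a minimal illustration, not the pyarrow code from the PR: the function names `check_buggy` and `check_fixed`, the parameter name, and the error message are all hypothetical.

```python
# Hypothetical sketch of a guard that fires on every call because the
# parameter's default is "not None". Names are illustrative only.

def check_buggy(use_legacy_dataset=False):
    # The default (False) is not None, so this branch is taken even when
    # the caller never passed the argument.
    if use_legacy_dataset is not None:
        raise ValueError("option is not supported with the legacy dataset")
    return "ok"


def check_fixed(use_legacy_dataset=None):
    # With None as the default, only an explicit truthy request for the
    # legacy code path triggers the error.
    if use_legacy_dataset:
        raise ValueError("option is not supported with the legacy dataset")
    return "ok"
```

Calling `check_buggy()` with no arguments raises, while `check_fixed()` does not; `check_fixed(True)` still raises as intended.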

Re: [PR] Convert list array and non-list array to scalars [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 commented on code in PR #7862: URL: https://github.com/apache/arrow-datafusion/pull/7862#discussion_r1390146278 ## datafusion/common/src/scalar.rs: ## @@ -1203,7 +1202,7 @@ impl ScalarValue { scalars.into_iter().map(|x| match x {

Re: [PR] Introduce `array_except` function [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 commented on code in PR #8135: URL: https://github.com/apache/arrow-datafusion/pull/8135#discussion_r1390135541 ## datafusion/proto/src/logical_plan/from_proto.rs: ## @@ -1342,6 +1347,10 @@ pub fn parse_expr( parse_expr(&args[0], registry)?,

Re: [PR] GH-37199: [C++] Expose a span converter for Buffer and ArraySpan [arrow]

2023-11-10 Thread via GitHub
jsjtxietian commented on code in PR #38027: URL: https://github.com/apache/arrow/pull/38027#discussion_r1390143751 ## cpp/src/arrow/array/data.h: ## @@ -434,6 +434,14 @@ struct ARROW_EXPORT ArraySpan { return GetValues(i, this->offset); } + // Access a buffer's data a

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-10 Thread via GitHub
2010YOUY01 commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1806701696 > Thanks for driving this forward @2010YOUY01 -- it is very much appreciated. > > I am planning to merge #8079 in 4 more days (after it has been open for a week to

Re: [PR] Introduce `array_except` function [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 commented on code in PR #8135: URL: https://github.com/apache/arrow-datafusion/pull/8135#discussion_r1390135726 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2454,6 +2454,113 @@ select list_has_all(make_array(1,2,3), make_array(4,5,6)), false true fa

[PR] Minor: fix ci break [arrow-datafusion]

2023-11-10 Thread via GitHub
haohuaijin opened a new pull request, #8136: URL: https://github.com/apache/arrow-datafusion/pull/8136 ## Which issue does this PR close? Closes #. ## Rationale for this change fix ci break after #8081 merge ## What changes are included in this PR?

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389980823 ## cpp/src/arrow/util/list_util.cc: ## @@ -0,0 +1,353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

[PR] Introduce `array_except` function [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 opened a new pull request, #8135: URL: https://github.com/apache/arrow-datafusion/pull/8135 ## Which issue does this PR close? Closes #6979 ## Rationale for this change ## What changes are included in this PR? ## Are these change

[PR] Fix typo in partitioning.rs [arrow-datafusion]

2023-11-10 Thread via GitHub
lewiszlw opened a new pull request, #8134: URL: https://github.com/apache/arrow-datafusion/pull/8134 ## Which issue does this PR close? Fix typo in partitioning.rs ## Rationale for this change ## What changes are included in this PR? ## Are

[PR] test: Added more tests for verifying GetObjects (Depths and Patterns) [arrow-adbc]

2023-11-10 Thread via GitHub
ryan-syed opened a new pull request, #1284: URL: https://github.com/apache/arrow-adbc/pull/1284 Added tests to verify GetObjects for different depths: Catalogs, DbSchemas, Tables, and All. Also added tests to verify GetObjects with catalog, schema, and table names passed as patt

Re: [PR] GetObjects call is slow even when filters are provided [arrow-adbc]

2023-11-10 Thread via GitHub
github-actions[bot] commented on PR #1285: URL: https://github.com/apache/arrow-adbc/pull/1285#issuecomment-1806638489 :warning: Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles.

[PR] GetObjects call is slow even when filters are provided [arrow-adbc]

2023-11-10 Thread via GitHub
ryan-syed opened a new pull request, #1285: URL: https://github.com/apache/arrow-adbc/pull/1285 replaced cursor calls with static calls and filtered early wherever possible

Re: [I] Implement `array_intersect` function [arrow-datafusion]

2023-11-10 Thread via GitHub
xudong963 closed issue #6978: Implement `array_intersect` function URL: https://github.com/apache/arrow-datafusion/issues/6978

Re: [PR] Implementation of `array_intersect` [arrow-datafusion]

2023-11-10 Thread via GitHub
xudong963 merged PR #8081: URL: https://github.com/apache/arrow-datafusion/pull/8081

[PR] MINOR: [Dev] Add Bryce Mecum to list of collaborators [arrow]

2023-11-10 Thread via GitHub
raulcd opened a new pull request, #38678: URL: https://github.com/apache/arrow/pull/38678 ### Rationale for this change Disclaimer: is a colleague of mine. Opening PR as documented on: https://arrow.apache.org/docs/developers/reviewing.html#collaborators Bryce Mecum (@amoeb

Re: [PR] GH-38653: Raise the minimum macOS version to 10.15 catalina to allow using new APIs in C++17 [arrow]

2023-11-10 Thread via GitHub
assignUser commented on PR #38677: URL: https://github.com/apache/arrow/pull/38677#issuecomment-1806613089 We build Arrow C++ on 10.13 for the R package, because CRAN still builds on that (hopefully only until next April...). We do this successfully by disabling availability checks by addin

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2023-11-10 Thread via GitHub
lidavidm commented on code in PR #38385: URL: https://github.com/apache/arrow/pull/38385#discussion_r1390085868 ## format/FlightSql.proto: ## @@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate { bytes prepared_statement_handle = 1; } +/* + * Represents a bulk in

Re: [PR] Convert Null to Int32(None) for MakeArray at type coercion step [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 commented on code in PR #7995: URL: https://github.com/apache/arrow-datafusion/pull/7995#discussion_r1390053054 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -579,26 +580,51 @@ fn coerce_arguments_for_fun( .collect::<Result<Vec<_>>>()?; } +//

Re: [I] All array functions should represent `NULL` as an element [arrow-datafusion]

2023-11-10 Thread via GitHub
jayzhan211 commented on issue #7142: URL: https://github.com/apache/arrow-datafusion/issues/7142#issuecomment-1806616312 @alamb Should we introduce another Expr::ArrayFunction that differs from other Expr::ScalarFunctions? I think null handling is unique to ArrayFunction, instead of patt

Re: [PR] GH-38255: [Go][C++] Implement Flight SQL Bulk Ingestion [arrow]

2023-11-10 Thread via GitHub
joellubi commented on code in PR #38385: URL: https://github.com/apache/arrow/pull/38385#discussion_r1390066091 ## format/FlightSql.proto: ## @@ -1778,6 +1794,47 @@ message CommandPreparedStatementUpdate { bytes prepared_statement_handle = 1; } +/* + * Represents a bulk in

Re: [PR] Parquet: read/write f16 for Arrow [arrow-rs]

2023-11-10 Thread via GitHub
Jefffrey commented on code in PR #5003: URL: https://github.com/apache/arrow-rs/pull/5003#discussion_r1390063433 ## parquet/src/column/writer/mod.rs: ## @@ -1170,6 +1188,7 @@ fn increment_utf8(mut data: Vec<u8>) -> Option<Vec<u8>> { mod tests { use crate::{file::properties::DEFAULT_CO

Re: [PR] GH-38653: Raise the minimum macOS version to 10.15 catalina to allow using new APIs in C++17 [arrow]

2023-11-10 Thread via GitHub
niyue commented on PR #38677: URL: https://github.com/apache/arrow/pull/38677#issuecomment-1806590562 I am not very familiar with the CI scripts in the project. I only searched for "10.14" and replaced the possible usages of "10.14" with "10.15", and if there is some usage missing in the PR, p

Re: [I] adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory [arrow-adbc]

2023-11-10 Thread via GitHub
lidavidm commented on issue #1283: URL: https://github.com/apache/arrow-adbc/issues/1283#issuecomment-1806576961 Ok. The Go traceback indicates an actual bug that needs fixing. The dataset described doesn't sound big enough to be worthy of causing memory issues, either. (Unless it's v

Re: [I] [C++] Reading empty CSV file in parallel hangs [arrow]

2023-11-10 Thread via GitHub
WillAyd commented on issue #38676: URL: https://github.com/apache/arrow/issues/38676#issuecomment-1806575667 Here is some info from helgrind that might be of use: ```sh ==87545== Possible data race during write of size 4 at 0x642320 by thread #1 ==87545== Locks held: non

Re: [I] [Java][CI] Java Jars failing on MacOS [arrow]

2023-11-10 Thread via GitHub
niyue commented on issue #38653: URL: https://github.com/apache/arrow/issues/38653#issuecomment-1806575047 I submitted a PR trying to address this issue: https://github.com/apache/arrow/pull/38677 But it requires raising the minimum macOS deployment target from 10.14 to 10.15, and yo

Re: [PR] GH-38653: Raise the minimum macOS version to 10.15 catalina to allow using new APIs in C++17 [arrow]

2023-11-10 Thread via GitHub
github-actions[bot] commented on PR #38677: URL: https://github.com/apache/arrow/pull/38677#issuecomment-1806574024 :warning: GitHub issue #38653 **has been automatically assigned in GitHub** to PR creator.

[PR] GH-38653: Raise the minimum macOS version to 10.15 catalina to allow using new APIs in C++17 [arrow]

2023-11-10 Thread via GitHub
niyue opened a new pull request, #38677: URL: https://github.com/apache/arrow/pull/38677 ### Rationale for this change On macOS, some new APIs in C++17, e.g. `std::filesystem::path`, are only visible when the compilation target is >= macOS 10.15 (Catalina). But currently the minimum macOS t

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
andygrove commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1390040161 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result<Statistics> { let pr

Re: [PR] feat: show statistics in explain verbose [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #8113: URL: https://github.com/apache/arrow-datafusion/pull/8113#discussion_r1390015438 ## datafusion/proto/src/logical_plan/to_proto.rs: ## @@ -352,6 +352,11 @@ impl From<&StringifiedPlan> for protobuf::StringifiedPlan { PlanType::

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1390012764 ## cpp/src/arrow/util/list_util.h: ## @@ -0,0 +1,75 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
andygrove commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1390004784 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result<Statistics> { let pr

Re: [PR] Convert Null to Int32(None) for MakeArray at type coercion step [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #7995: URL: https://github.com/apache/arrow-datafusion/pull/7995#discussion_r1380730813 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -579,26 +580,51 @@ fn coerce_arguments_for_fun( .collect::<Result<Vec<_>>>()?; } +// Conve

[I] Make default filter selectivity estimate configurable [arrow-datafusion]

2023-11-10 Thread via GitHub
andygrove opened a new issue, #8133: URL: https://github.com/apache/arrow-datafusion/issues/8133 ### Is your feature request related to a problem or challenge? PR https://github.com/apache/arrow-datafusion/pull/8126 updates FilterExec statistics method so that it will assume a highly
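
As a rough illustration of what a configurable filter selectivity estimate could mean, the sketch below scales an input row count by a caller-supplied percentage. It is not DataFusion code: the function name `estimate_filter_output_rows` and the `selectivity_percent` parameter are hypothetical, and no particular default value is implied.

```python
# Hypothetical sketch: estimate how many rows survive a filter, given a
# configurable selectivity expressed as an integer percentage (0 to 100).

def estimate_filter_output_rows(input_rows: int, selectivity_percent: int) -> int:
    if input_rows < 0:
        raise ValueError("input_rows must be non-negative")
    if not 0 <= selectivity_percent <= 100:
        raise ValueError("selectivity_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return input_rows * selectivity_percent // 100
```

A planner that compares join inputs by estimated size would see a filtered input shrink or grow as this setting changes, which is why making it configurable can affect join order.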

Re: [PR] rewrite `array_append/array_prepend` to remove deplicate codes [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb merged PR #8108: URL: https://github.com/apache/arrow-datafusion/pull/8108

Re: [I] [Java][CI] Java Jars failing on MacOS [arrow]

2023-11-10 Thread via GitHub
niyue commented on issue #38653: URL: https://github.com/apache/arrow/issues/38653#issuecomment-1806526820 Sure. I took a brief look and the CI seems to complain `std::filesystem::path` cannot be used on macOS 14.0 aarch64, but this env is also what I use locally and I don't find such issue

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389988996 ## cpp/src/arrow/util/list_util.h: ## @@ -0,0 +1,75 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389979629 ## cpp/src/arrow/util/list_util.cc: ## @@ -0,0 +1,353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. S

Re: [PR] feat: support UDAF in substrait producer/consumer [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #8119: URL: https://github.com/apache/arrow-datafusion/pull/8119#discussion_r1389978492 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -674,23 +677,37 @@ pub async fn from_substrait_agg_func( args.push(arg_expr?.as_ref().clone(

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389975712 ## cpp/src/arrow/testing/random.cc: ## @@ -608,6 +609,218 @@ std::shared_ptr OffsetsFromLengthsArray(OffsetArrayType* lengths, std::make_shared(), size, buffe

Re: [I] [C++] Use the random array generators from testing/random.h in the concatenation tests [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on issue #38656: URL: https://github.com/apache/arrow/issues/38656#issuecomment-1806507591 A related issue: there are two algorithms for generating random list and list-like arrays in `testing/random.cc`. The two algorithms take different parameters and have different us

Re: [I] adbc_driver_manager.OperationalError: UNKNOWN: [Snowflake] arrow/ipc: unknown error while reading: cannot allocate memory [arrow-adbc]

2023-11-10 Thread via GitHub
bascheibler commented on issue #1283: URL: https://github.com/apache/arrow-adbc/issues/1283#issuecomment-1806505069 > "fails after fetching a few tables" > > The other thing I would suspect here is if we somehow are keeping a reference to the dataset when we should not be... >

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389970408 ## cpp/src/arrow/testing/random.cc: ## @@ -608,6 +609,218 @@ std::shared_ptr OffsetsFromLengthsArray(OffsetArrayType* lengths, std::make_shared(), size, buffe

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389968805 ## cpp/src/arrow/testing/random.cc: ## @@ -608,6 +609,218 @@ std::shared_ptr OffsetsFromLengthsArray(OffsetArrayType* lengths, std::make_shared(), size, buffe

Re: [PR] feat: show statistics in explain verbose [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on PR #8113: URL: https://github.com/apache/arrow-datafusion/pull/8113#issuecomment-1806498233 @alamb I have addressed your comments but I have a new question in the code

Re: [PR] feat: show statistics in explain verbose [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on code in PR #8113: URL: https://github.com/apache/arrow-datafusion/pull/8113#discussion_r1389965551 ## datafusion/proto/src/logical_plan/to_proto.rs: ## @@ -352,6 +352,11 @@ impl From<&StringifiedPlan> for protobuf::StringifiedPlan { PlanTyp

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
github-actions[bot] commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806494635 Revision: 0d46d1684d8bb3b9313feafd28ab5481a8a3b7e0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-581606b59c](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389966515 ## cpp/src/arrow/ipc/writer.cc: ## @@ -442,6 +502,37 @@ class RecordBatchSerializer { // Must also slice the values values = values->Slice(values_offset

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
sgilmore10 commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806492533 @github-actions crossbow submit matlab

Re: [PR] Add subtrait support for `IS NULL` and `IS NOT NULL` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #8093: URL: https://github.com/apache/arrow-datafusion/pull/8093#discussion_r1389951459 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -880,6 +886,42 @@ pub async fn from_substrait_rex( ScalarFunctionType::ILike => {

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
github-actions[bot] commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806477710 Revision: 0137b0b32ce9026ca089108d5d6423425d4b3f78 Submitted crossbow builds: [ursacomputing/crossbow @ actions-ce7ab243c8](https://github.com/ursacomputing/crossbow/bra

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
sgilmore10 commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806474509 @github-actions crossbow submit matlab

Re: [PR] docs: show creation of DFSchema [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8132: URL: https://github.com/apache/arrow-datafusion/pull/8132#issuecomment-1806468224 Thank you very much @wjones127

Re: [I] [EPIC] Unify Function Interface (remove `BuiltInScalarFunction`) [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8045: URL: https://github.com/apache/arrow-datafusion/issues/8045#issuecomment-1806462998 Thanks for driving this forward @2010YOUY01 -- it is very much appreciated. I am planning to merge https://github.com/apache/arrow-datafusion/pull/8079 in 4 more days

Re: [PR] GH-35344: [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats [arrow]

2023-11-10 Thread via GitHub
felipecrv commented on code in PR #35345: URL: https://github.com/apache/arrow/pull/35345#discussion_r1389942849 ## cpp/src/arrow/ipc/writer.cc: ## @@ -350,6 +350,67 @@ class RecordBatchSerializer { return Status::OK(); } + template + Status GetZeroBasedListViewOffs

Re: [PR] Minor: Cleanup BuiltinScalarFunction's phys-expr creation [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #8114: URL: https://github.com/apache/arrow-datafusion/pull/8114#discussion_r1389940188 ## datafusion/physical-expr/src/datetime_expressions.rs: ## @@ -17,6 +17,8 @@ //! DateTime expressions +use crate::datetime_expressions; Review Comment:

Re: [PR] Minor: Cleanup BuiltinScalarFunction's phys-expr creation [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb merged PR #8114: URL: https://github.com/apache/arrow-datafusion/pull/8114

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-10 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1389939153 ## cpp/src/arrow/flight/types.h: ## @@ -742,6 +746,164 @@ struct ARROW_FLIGHT_EXPORT CancelFlightInfoRequest { static arrow::Result Deserialize(std::string_view

Re: [I] Show Statistics in explain Verbose without enabling `show statistics` [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on issue #8111: URL: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-1806454366 Yeah, I will add 2 lines in this PR and implement `VERY VERBOSE` if we need it

Re: [I] Show Statistics in explain Verbose without enabling `show statistics` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8111: URL: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-1806452815 > @alamb : can we go with 2 InitialPhysicalPlan and FinalPhysicalPlan? I would like to make sure stats won't get lost during optimization rules I think the usecase of

Re: [PR] feat: show statistics in explain verbose [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on PR #8113: URL: https://github.com/apache/arrow-datafusion/pull/8113#issuecomment-1806450315 > Something like this perhaps: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-180663 Sounds good. I will add stats for initial and final physical

Re: [I] Show Statistics in explain Verbose without enabling `show statistics` [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on issue #8111: URL: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-1806446711 @alamb : can we go with 2 `InitialPhysicalPlan` and `FinalPhysicalPlan`? I would like to make sure stats won't get lost during optimization rules

Re: [PR] feat: show statistics in explain verbose [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8113: URL: https://github.com/apache/arrow-datafusion/pull/8113#issuecomment-1806444585 Something like this perhaps: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-180663

Re: [I] Show Statistics in explain Verbose without enabling `show statistics` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8111: URL: https://github.com/apache/arrow-datafusion/issues/8111#issuecomment-180663 I would like to propose a slightly different interface ``` ❯ explain verbose select * from t1 where time <= to_timestamp(350); +---+--

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-10 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1389926912 ## cpp/src/arrow/flight/sql/server_session_middleware.h: ## @@ -0,0 +1,87 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
ozankabak commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1389924893 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result<Statistics> { let pr

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-10 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1389924673 ## cpp/src/arrow/flight/sql/server_session_middleware.cc: ## @@ -0,0 +1,179 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
andygrove commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1389924479 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result<Statistics> { let pr

Re: [PR] Support remaining functions in protobuf serialization, add `expr_fn` for `StructFunction` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb merged PR #8100: URL: https://github.com/apache/arrow-datafusion/pull/8100

Re: [I] Unsupported functions in protobuf serializatin [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb closed issue #8098: Unsupported functions in protobuf serializatin URL: https://github.com/apache/arrow-datafusion/issues/8098

Re: [PR] Simplify ProjectionPushdown and make it more general [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb merged PR #8109: URL: https://github.com/apache/arrow-datafusion/pull/8109

Re: [PR] Minor: clean up the code regarding clippy [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8122: URL: https://github.com/apache/arrow-datafusion/pull/8122#issuecomment-1806433594 Thank you @comphead and @Weijun-H

Re: [PR] Minor: clean up the code regarding clippy [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb merged PR #8122: URL: https://github.com/apache/arrow-datafusion/pull/8122

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-10 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1389909857 ## cpp/src/arrow/flight/sql/server_session_middleware.cc: ## @@ -0,0 +1,179 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Queries that lost statistics or their statistics become inexact [arrow-datafusion]

2023-11-10 Thread via GitHub
NGA-TRAN commented on issue #8099: URL: https://github.com/apache/arrow-datafusion/issues/8099#issuecomment-1806419419 I used the change in this PR https://github.com/apache/arrow-datafusion/pull/8112/ to see stats of columns. Summary: 1. Lowest scan step: column only has stats `Nu

Re: [PR] Encapsulate `EquivalenceClass` into a struct [arrow-datafusion]

2023-11-10 Thread via GitHub
ozankabak commented on code in PR #8034: URL: https://github.com/apache/arrow-datafusion/pull/8034#discussion_r1389895645 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -20,26 +20,114 @@ use std::hash::Hash; use std::sync::Arc; use crate::expressions::Column; -use cra

[PR] docs: show creation of DFSchema [arrow-datafusion]

2023-11-10 Thread via GitHub
wjones127 opened a new pull request, #8132: URL: https://github.com/apache/arrow-datafusion/pull/8132 ## Which issue does this PR close? Closes #8131. ## Rationale for this change I got stuck figuring this out, and noticed there was very little documentation here.

Re: [PR] Make fields of `ScalarUDF` , `AggregateUDF` and `WindowUDF` non `pub` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8079: URL: https://github.com/apache/arrow-datafusion/pull/8079#issuecomment-1806408005 > When ScalarUDF fields needs to be extended, it should look like: Yes, that is a great idea! I am actually hoping it looks more like ```rust let udf = Scala

Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8129: URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806401554 Thanks @JacobOgle !

Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

2023-11-10 Thread via GitHub
ozankabak commented on issue #8129: URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806392675 We are fine with this change

Re: [PR] Fix: Do not try and preserve order when there is no order to preserve in RepartitionExec [arrow-datafusion]

2023-11-10 Thread via GitHub
ozankabak commented on code in PR #8127: URL: https://github.com/apache/arrow-datafusion/pull/8127#discussion_r1389878799 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -15,9 +15,9 @@ // specific language governing permissions and limitations // under the License.

Re: [I] Change display of `RepartitionExec` from `SortPreservingRepartitionExec to `RepartitionExec preserve_order=true` [arrow-datafusion]

2023-11-10 Thread via GitHub
JacobOgle commented on issue #8129: URL: https://github.com/apache/arrow-datafusion/issues/8129#issuecomment-1806388138 I wouldn't mind taking a look into this

Re: [I] Support udf `range` [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8028: URL: https://github.com/apache/arrow-datafusion/issues/8028#issuecomment-1806388013 sure -- thanks @Veeupup -- sorry for the delays in reviewing!

Re: [PR] rewrite `array_append/array_prepend` to remove deplicate codes [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8108: URL: https://github.com/apache/arrow-datafusion/pull/8108#issuecomment-1806387386 Very nice!

Re: [PR] Fix: Do not try and preserve order when there is no order to preserve in RepartitionExec [arrow-datafusion]

2023-11-10 Thread via GitHub
ozankabak commented on code in PR #8127: URL: https://github.com/apache/arrow-datafusion/pull/8127#discussion_r1389875831 ## datafusion/core/src/physical_optimizer/enforce_sorting.rs: ## @@ -1053,6 +1074,28 @@ mod tests { Ok(()) } +#[tokio::test] +async f

Re: [PR] Add support for utilization complex aggregate exprs inside group by [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on code in PR #8107: URL: https://github.com/apache/arrow-datafusion/pull/8107#discussion_r1389877302 ## datafusion/physical-expr/src/equivalence.rs: ## @@ -869,9 +986,27 @@ impl EquivalenceProperties { self.ordering_satisfy_requirement(&sort_requirement

Re: [PR] GH-34865: [C++][Flight RPC] Add Session management messages [arrow]

2023-11-10 Thread via GitHub
indigophox commented on code in PR #34817: URL: https://github.com/apache/arrow/pull/34817#discussion_r1389876419 ## cpp/src/arrow/flight/types.cc: ## @@ -463,6 +463,318 @@ arrow::Result CancelFlightInfoRequest::Deserialize( return out; } +std::ostream& operator<<(std::os

Re: [PR] Add support for utilization complex aggregate exprs inside group by [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #8107: URL: https://github.com/apache/arrow-datafusion/pull/8107#issuecomment-1806383909 > By the way, I want to stress that this PR is beneficial even if it doesn't solve the issue in https://github.com/apache/arrow-datafusion/issues/8043. Yes, I 100% agree --

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
github-actions[bot] commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806379303 Revision: dc256eafd14ecb7184d48a29fd63c7062cb07a3a Submitted crossbow builds: [ursacomputing/crossbow @ actions-ab37048e95](https://github.com/ursacomputing/crossbow/bra

Re: [PR] Implement `DISTINCT ON` from Postgres [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on PR #7981: URL: https://github.com/apache/arrow-datafusion/pull/7981#issuecomment-1806377857 Thank you @gruuya -- this is on my review queue

Re: [PR] GH-38659: [CI][MATLAB] Add MATLAB packaging and release scripts [arrow]

2023-11-10 Thread via GitHub
sgilmore10 commented on PR #38660: URL: https://github.com/apache/arrow/pull/38660#issuecomment-1806376638 @github-actions crossbow submit matlab

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
Dandandan commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r1389869661 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result { let pr

Re: [I] Introduce a way to represent constrained statistics / bounds on values in Statistics [arrow-datafusion]

2023-11-10 Thread via GitHub
alamb commented on issue #8078: URL: https://github.com/apache/arrow-datafusion/issues/8078#issuecomment-1806373051 It is unlikely I will have time to work on this today -- I will pick it up next week

Re: [PR] Fix join order for TPCH Q17 [arrow-datafusion]

2023-11-10 Thread via GitHub
Dandandan commented on code in PR #8126: URL: https://github.com/apache/arrow-datafusion/pull/8126#discussion_r138986 ## datafusion/physical-plan/src/filter.rs: ## @@ -194,11 +194,13 @@ impl ExecutionPlan for FilterExec { fn statistics(&self) -> Result { let pr
