Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1611638831 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1611638831 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] refactor: simplify converting List DataTypes to `ScalarValue` [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10675: URL: https://github.com/apache/datafusion/pull/10675#discussion_r1615025038 ## datafusion/common/src/scalar/mod.rs: ## @@ -3089,46 +3089,21 @@ impl TryFrom<> for ScalarValue { Box::new(value_type.as_ref().try_into()?),

[PR] refactor: simplify converting List DataTypes to `ScalarValue` [datafusion]

2024-05-25 Thread via GitHub
jonahgao opened a new pull request, #10675: URL: https://github.com/apache/datafusion/pull/10675 ## Which issue does this PR close? N/A ## Rationale for this change `ScalarValue::try_from(dt: )` is commonly used to create a NULL `ScalarValue` for a given DataType. The

Re: [I] Upgrade object_store to 0.10.1 [datafusion]

2024-05-25 Thread via GitHub
Samrose-Ahmed commented on issue #10673: URL: https://github.com/apache/datafusion/issues/10673#issuecomment-2132048170 I think `parquet` needs to be updated too, I tried it quickly locally and needed to update `pyo3` to 0.21 but get some weird type errors I couldn't understand. --

Re: [PR] Introduce Sum UDAF [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10651: URL: https://github.com/apache/datafusion/pull/10651#discussion_r1614978844 ## datafusion/physical-expr-common/src/aggregate/mod.rs: ## @@ -292,16 +307,38 @@ impl AggregateExpr for AggregateFunctionExpr { is_distinct:

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-25 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1614977230 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-25 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1614977230 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

[I] `test_sort_1k_mem` failed with too many open file error when I run locally [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new issue, #10674: URL: https://github.com/apache/datafusion/issues/10674 ### Describe the bug It has been annoyed me a while I get this error when run locally, and the failure **is not shown in CI** ``` failures:

[I] Upgrade object_store to 0.10.1 [datafusion]

2024-05-25 Thread via GitHub
Samrose-Ahmed opened a new issue, #10673: URL: https://github.com/apache/datafusion/issues/10673 ### Is your feature request related to a problem or challenge? object_store 0.10.1 includes some updates for better retrying, with multipart uplads ### Describe the solution you'd

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1614957716 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -871,6 +871,33 @@ select median(distinct c) from t; statement ok drop table t; +# optimize

Re: [I] ColumnReader.loadVector should initiate CometDictionary after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya closed issue #474: ColumnReader.loadVector should initiate CometDictionary after re-import arrays URL: https://github.com/apache/datafusion-comet/issues/474 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya commented on PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#issuecomment-2131456605 Merged. Thanks @andygrove @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya merged PR #473: URL: https://github.com/apache/datafusion-comet/pull/473 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
codecov-commenter commented on PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#issuecomment-2131417599 ##

Re: [PR] build(deps): bump mimalloc from 0.1.39 to 0.1.41 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] commented on PR #657: URL: https://github.com/apache/datafusion-python/pull/657#issuecomment-2131417589 Superseded by #719. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] build(deps): bump mimalloc from 0.1.39 to 0.1.41 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] closed pull request #657: build(deps): bump mimalloc from 0.1.39 to 0.1.41 URL: https://github.com/apache/datafusion-python/pull/657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[PR] build(deps): bump mimalloc from 0.1.39 to 0.1.42 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #719: URL: https://github.com/apache/datafusion-python/pull/719 Bumps [mimalloc](https://github.com/purpleprotocol/mimalloc_rust) from 0.1.39 to 0.1.42. Release notes Sourced from

[PR] build(deps): bump parking_lot from 0.12.2 to 0.12.3 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #718: URL: https://github.com/apache/datafusion-python/pull/718 Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.2 to 0.12.3. Changelog Sourced from

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-25 Thread via GitHub
comphead commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614830679 ## datafusion/core/src/datasource/physical_plan/parquet/schema_adapter.rs: ## @@ -20,35 +20,38 @@ use arrow_schema::{Schema, SchemaRef}; use std::fmt::Debug;

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya commented on code in PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#discussion_r1614819511 ## common/src/main/java/org/apache/comet/parquet/ColumnReader.java: ## @@ -230,15 +230,15 @@ public CometDecodedVector loadVector() { // return plain

Re: [PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya commented on PR #473: URL: https://github.com/apache/datafusion-comet/pull/473#issuecomment-2131403965 cc @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[I] ColumnReader.loadVector should initiate CometDictionary after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya opened a new issue, #474: URL: https://github.com/apache/datafusion-comet/issues/474 ### Describe the bug During debugging some test failures in #437, I found that `CometScanExec` returns empty dictionary values after first batch in failed queries. It is because in

[PR] fix: `ColumnReader.loadVector` should initiate `CometDictionary` after re-import arrays [datafusion-comet]

2024-05-25 Thread via GitHub
viirya opened a new pull request, #473: URL: https://github.com/apache/datafusion-comet/pull/473 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove merged PR #461: URL: https://github.com/apache/datafusion-comet/pull/461 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub
DDtKey commented on issue #10669: URL: https://github.com/apache/datafusion/issues/10669#issuecomment-2131388179 I think this might be labeled with `good first issue`, there are links to the code that needs to be changed and it is also possible to write `sqllogictest` similar to

Re: [I] ArrayList data is leaked to outside unnest logical plan from the inner unnest logical plan that comes before a Limit [datafusion]

2024-05-25 Thread via GitHub
duongcongtoai commented on issue #10672: URL: https://github.com/apache/datafusion/issues/10672#issuecomment-2131375997 I'm taking it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[I] ArrayList data is leaked to outside unnest logical plan from the inner unnest logical plan with limit applied [datafusion]

2024-05-25 Thread via GitHub
duongcongtoai opened a new issue, #10672: URL: https://github.com/apache/datafusion/issues/10672 ### Describe the bug Given this query ``` select unnest(struct_c1c0), unnest(list_c2c0) from ( select unnest(column1) as struct_c1c0, unnest(column2)['c0'] as

[PR] fix: pass `quote` parameter to CSV writer [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new pull request, #10671: URL: https://github.com/apache/datafusion/pull/10671 ## Which issue does this PR close? Closes #10670 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614789127 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614788921 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

[PR] feat: Add "Comet Fuzz" fuzz-testing utility [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove opened a new pull request, #472: URL: https://github.com/apache/datafusion-comet/pull/472 ## Which issue does this PR close? N/A ## Rationale for this change Comet Fuzz is a standalone project for generating random data and queries and

[I] CSV `quote` parameter is ignored [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new issue, #10670: URL: https://github.com/apache/datafusion/issues/10670 ### Describe the bug quote` parameter of CSV writer is not getting passed to `arrow-csv` writer:

[I] Add quote-style parameter for CSV options [datafusion]

2024-05-25 Thread via GitHub
DDtKey opened a new issue, #10669: URL: https://github.com/apache/datafusion/issues/10669 ### Is your feature request related to a problem or challenge? CSV writers usually supports configuration of quote style/mode with the following options: - `Always` - `Necessary` -

Re: [PR] Minor: Improve ObjectStoreUrl docs + examples [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on PR #10619: URL: https://github.com/apache/datafusion/pull/10619#issuecomment-2131302454 On a related note, I found that the usage flow of ObjectStore in DataFusion involves registering the source as a table and then querying the table. This approach makes sense.

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-05-25 Thread via GitHub
edmondop commented on issue #7306: URL: https://github.com/apache/datafusion/issues/7306#issuecomment-2131289288 > > What do you think we should do? I can add that paragraph, but maybe we also want to link the examples? What else do you think it's needed to close this > > Those both

[PR] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-25 Thread via GitHub
vaibhawvipul opened a new pull request, #471: URL: https://github.com/apache/datafusion-comet/pull/471 ## Which issue does this PR close? Closes #465 . ## Rationale for this change ## What changes are included in this PR? ## How are these

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614673020 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -685,6 +685,19 @@ async fn roundtrip_literal_struct() -> Result<()> { .await }

[I] Convert Variance Population to UDAF [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new issue, #10668: URL: https://github.com/apache/datafusion/issues/10668 ### Is your feature request related to a problem or challenge? Similar to #10667 but for Variance Population function ### Describe the solution you'd like _No response_

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614671030 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -685,6 +685,19 @@ async fn roundtrip_literal_struct() -> Result<()> { .await }

[I] Convert `Variance Sample` to UDAF [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new issue, #10667: URL: https://github.com/apache/datafusion/issues/10667 ### Is your feature request related to a problem or challenge? Part of #8708 1. Move variance aggregate expression in `datafusion/physical-expr/src/aggregate/variance.rs` to

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2131268529 Awesome! I plan to merge this PR tomorrow unless anyone else would like time to review. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1614666750 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -642,11 +714,22 @@ fn should_enable_page_index( .unwrap_or(false) } -///

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-25 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1614666495 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -512,6 +572,62 @@ pub fn to_substrait_rel( } } +fn to_substrait_named_struct(schema: ) ->

Re: [I] [Epic] Unify `AggregateFunction` Interface (remove built in list of `AggregateFunction` s), improve the system [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on issue #8708: URL: https://github.com/apache/datafusion/issues/8708#issuecomment-2131265763 > After #10648 and #10389 I think we have a pretty good set of examples of how to move aggregates out of the core (thanks to all the foundations layed by @jayzhan211 ) >

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614657172 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { } -fn children() -> Vec> { +

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614656536 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as

Re: [PR] Minor: Remove `GetFieldAccessSchema` [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10665: URL: https://github.com/apache/datafusion/pull/10665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[PR] Minor: Use slice in `ConcreteTreeNode` [datafusion]

2024-05-25 Thread via GitHub
peter-toth opened a new pull request, #10666: URL: https://github.com/apache/datafusion/pull/10666 The idea of using slices to return a node's children came up here: https://github.com/apache/datafusion/pull/10543#discussion_r1614057653. While it is not possible in all `TreeNode`

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2131254591 Thank you @tustvold and @Abdullahsab3 and @mhilton and @appletreeisyellow for the thoughts. From my perspective, the current (non timezone aware) `date_bin` function has

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614654486 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -1778,7 +1778,7 @@ AS VALUES ('BB', 6, 1); query TII -select col1, col2, coalesce(sum_col3, 0) as

Re: [PR] Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10662: URL: https://github.com/apache/datafusion/pull/10662#issuecomment-2131244635 > oops.. my mistake. :P Many thanks. No worries! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] commented on PR #706: URL: https://github.com/apache/datafusion-python/pull/706#issuecomment-2131243514 Superseded by #717. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] build(deps): bump syn from 2.0.63 to 2.0.66 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] opened a new pull request, #717: URL: https://github.com/apache/datafusion-python/pull/717 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.63 to 2.0.66. Release notes Sourced from https://github.com/dtolnay/syn/releases;>syn's releases. 2.0.66

Re: [PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-25 Thread via GitHub
dependabot[bot] closed pull request #706: build(deps): bump syn from 2.0.63 to 2.0.64 URL: https://github.com/apache/datafusion-python/pull/706 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1614640283 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -39,6 +39,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-schema = {

Re: [PR] change version to 38.0.1 [datafusion-python]

2024-05-25 Thread via GitHub
andygrove merged PR #716: URL: https://github.com/apache/datafusion-python/pull/716 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614639502 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-25 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1614638361 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

[PR] Remove `GetFieldAccessSchema` [datafusion]

2024-05-25 Thread via GitHub
jayzhan211 opened a new pull request, #10665: URL: https://github.com/apache/datafusion/pull/10665 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on PR #10661: URL: https://github.com/apache/datafusion/pull/10661#issuecomment-2131238078 Thanks, @alamb! Actually, I tried to add some tests for `MemoryFunctionRegistry` before I fixed it, but I found everything was fine. I guess the reason is that we don't

Re: [PR] Minor: Add tests showing aggregate behavior for NaNs [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10634: URL: https://github.com/apache/datafusion/pull/10634#discussion_r1614628009 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4374,6 +4374,42 @@ GROUP BY dummy text1, text1, text1 +# Tests for aggregating with NaN values

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10640: URL: https://github.com/apache/datafusion/pull/10640#issuecomment-2131235829 Thanks @Blizzara and @jonahgao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2131235504 > This shouldn't have passed checks. > > ``` > + cargo fmt --all -- --check > `cargo metadata` exited with an error: error: failed to load manifest for workspace member

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10650: URL: https://github.com/apache/datafusion/pull/10650#issuecomment-2131235103 > Maybe we can do the following: Instead of making every TableProviderFactory responsible for implementing this behavior in create, we can add some general mechanism at the trait

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #7306: URL: https://github.com/apache/datafusion/issues/7306#issuecomment-2131234075 > What do you think we should do? I can add that paragraph, but maybe we also want to link the examples? What else do you think it's needed to close this Those both sound

Re: [I] Selecting struct field within field produces unexpected results [datafusion-python]

2024-05-25 Thread via GitHub
timsaucer commented on issue #715: URL: https://github.com/apache/datafusion-python/issues/715#issuecomment-2131229316 My statement above about testing on rust side is likely incorrect. I ran the same test above but loading the dataframe from a parquet file instead of creating in memory

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-25 Thread via GitHub
Abdullahsab3 commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2131223669 Thanks for filing the ticket and for all the detailed explanations! very enriching I wonder whether the Postgres behavior is actually that bad. Though it looks

Re: [I] [EPIC] Improved support for nested / structured types (`Struct` , `List`, `ListArray`, and other Composite types) [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #2326: URL: https://github.com/apache/datafusion/issues/2326#issuecomment-2131214534 > I added an issue to support recursive unnest: #10660, i think it shoul belong to this epic Added -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10623: URL: https://github.com/apache/datafusion/pull/10623#issuecomment-2131213174 Thanks everyone for the reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10623: URL: https://github.com/apache/datafusion/pull/10623 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10652: URL: https://github.com/apache/datafusion/issues/10652#issuecomment-2131211386 > Is there any plan to support them? I think we should make plans to support them! I started collecting issues on https://github.com/apache/datafusion/issues/8661

[I] Suport unparsing `LogicalPlan::Window` to SQL [datafusion]

2024-05-25 Thread via GitHub
alamb opened a new issue, #10664: URL: https://github.com/apache/datafusion/issues/10664 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/9726 to complete the LogialPlan --> SQL conversion Converting

[I] Suport unparsing `LogicalPlan::Distinct` to `DISTINCT` [datafusion]

2024-05-25 Thread via GitHub
alamb opened a new issue, #10663: URL: https://github.com/apache/datafusion/issues/10663 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/9726 to complete the LogialPlan --> SQL conversion Converting

Re: [PR] Fix typo in Cargo.toml (unused manifest key: dependencies.regex.worksapce) [datafusion]

2024-05-25 Thread via GitHub
jonahgao merged PR #10662: URL: https://github.com/apache/datafusion/pull/10662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-25 Thread via GitHub
tustvold commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2131201943 IIRC SessionConfig is the static configuration used to create a SessionContext, which is an interior mutable wrapper around SessionState. The idea was a query is planned

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2131197268 > I see TaskContext is just part of SessionState. Could we just make TaskContext wrap SessionState? I can't remember why TaskContext doesn't wrap SessionState -- maybe

Re: [I] Implement protobuf serialization for LogicalPlan::Unnest [datafusion]

2024-05-25 Thread via GitHub
alamb commented on issue #10639: URL: https://github.com/apache/datafusion/issues/10639#issuecomment-2131196191 Thanks @jamesmcm -- this would be a good first issue I think for someone with Rust experience looking to help with DataFusion. The protobuf serialization code is a bit hairy I

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10661: URL: https://github.com/apache/datafusion/pull/10661#issuecomment-2131195735 Thank you @goldmedal  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614546056 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2860,6 +2863,57 @@ mod tests { Ok(()) } +#[tokio::test] +async fn

Re: [PR] Update substrait requirement from 0.33.3 to 0.34.0 [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10632: URL: https://github.com/apache/datafusion/pull/10632 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1614539590 ## datafusion/functions-aggregate/src/median.rs: ## @@ -15,71 +15,105 @@ // specific language governing permissions and limitations // under the License. -//! #

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539989 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2893,21 +2893,21 @@ mod tests { for function in all_default_functions() { let udf

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539989 ## datafusion/core/src/execution/context/mod.rs: ## @@ -2893,21 +2893,21 @@ mod tests { for function in all_default_functions() { let udf

Re: [PR] Remove duplicate function name in its aliases list [datafusion]

2024-05-25 Thread via GitHub
goldmedal commented on code in PR #10661: URL: https://github.com/apache/datafusion/pull/10661#discussion_r1614539494 ## datafusion/optimizer/src/push_down_limit.rs: ## @@ -65,7 +65,6 @@ impl OptimizerRule for PushDownLimit { }; let Limit { skip, fetch,

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on PR #10655: URL: https://github.com/apache/datafusion/pull/10655#issuecomment-2131186851 Thank you for the speedy review @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10655: URL: https://github.com/apache/datafusion/pull/10655 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10654: Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null URL: https://github.com/apache/datafusion/issues/10654 -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10655: URL: https://github.com/apache/datafusion/pull/10655#discussion_r1614536504 ## datafusion/functions/src/core/getfield.rs: ## @@ -106,6 +106,9 @@ impl ScalarUDFImpl for GetFieldFunc { }; let access_schema =

Re: [I] Incorrect statistics read for binary columns in parquet [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10605: Incorrect statistics read for binary columns in parquet URL: https://github.com/apache/datafusion/issues/10605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10645: URL: https://github.com/apache/datafusion/pull/10645 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10646: URL: https://github.com/apache/datafusion/pull/10646#discussion_r1614534288 ## datafusion/substrait/src/variation_const.rs: ## @@ -37,3 +38,58 @@ pub const DEFAULT_CONTAINER_TYPE_REF: u32 = 0; pub const LARGE_CONTAINER_TYPE_REF: u32 = 1;

Re: [PR] fix wrong type validation on unnest expr [datafusion]

2024-05-25 Thread via GitHub
alamb merged PR #10657: URL: https://github.com/apache/datafusion/pull/10657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-25 Thread via GitHub
alamb closed issue #10656: Wrong error thrown when unnesting a list of struct URL: https://github.com/apache/datafusion/issues/10656 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-25 Thread via GitHub
advancedxy commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1614532699 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -217,6 +325,25 @@ class CometExpressionCoverageSuite extends CometTestBase

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614522462 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { } -fn children() -> Vec> { +

Re: [PR] Add reference visitor `TreeNode` APIs, change `ExecutionPlan::children()` and `PhysicalExpr::children()` return references [datafusion]

2024-05-25 Thread via GitHub
peter-toth commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614503459 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { } -fn children() -> Vec> { +

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-25 Thread via GitHub
alamb commented on code in PR #10268: URL: https://github.com/apache/datafusion/pull/10268#discussion_r1614530863 ## datafusion/sqllogictest/test_files/select.slt: ## @@ -1473,7 +1473,7 @@ DROP TABLE t; # related to https://github.com/apache/datafusion/issues/8814 statement

Re: [PR] fix: substring with negative indices should produce correct result [datafusion-comet]

2024-05-25 Thread via GitHub
sonhmai commented on PR #470: URL: https://github.com/apache/datafusion-comet/pull/470#issuecomment-2131173613 @viirya @andygrove can you take a look. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

  1   2   >