Re: [PR] Moving min and max to new API and removing from protobuf [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2185750124 > Thanks. I guess I wasn't clear in my comment here [#11013 (comment)](https://github.com/apache/datafusion/pull/11013#issuecomment-2183027880) . How should that test failure be a

Re: [PR] Moving min and max to new API and removing from protobuf [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2185740296 > Thanks. I guess I wasn't clear in my comment here [#11013 (comment)](https://github.com/apache/datafusion/pull/11013#issuecomment-2183027880) . How should that test failure be a

Re: [PR] Change wildcard qualifier type from `String` to `TableReference` [datafusion]

2024-06-23 Thread via GitHub
linhr commented on PR #11073: URL: https://github.com/apache/datafusion/pull/11073#issuecomment-2185719725 Thank you @alamb ! I've updated the protobuf code as suggested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Improve filter predicates with `Utf8View` literals [datafusion]

2024-06-23 Thread via GitHub
XiangpengHao commented on code in PR #11043: URL: https://github.com/apache/datafusion/pull/11043#discussion_r1650374927 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -160,11 +160,47 @@ impl<'a> TypeCoercionRewriter<'a> { op: Operator, right: Exp

Re: [PR] Update sqlparser requirement from 0.41.0 to 0.43.1 [datafusion-ballista]

2024-06-23 Thread via GitHub
dependabot[bot] closed pull request #962: Update sqlparser requirement from 0.41.0 to 0.43.1 URL: https://github.com/apache/datafusion-ballista/pull/962

Re: [PR] Update sqlparser requirement from 0.41.0 to 0.43.1 [datafusion-ballista]

2024-06-23 Thread via GitHub
dependabot[bot] commented on PR #962: URL: https://github.com/apache/datafusion-ballista/pull/962#issuecomment-2185608449 Superseded by #1028.

Re: [PR] Update sqlparser requirement from 0.43.0 to 0.47.0 [datafusion-ballista]

2024-06-23 Thread via GitHub
dependabot[bot] commented on PR #1028: URL: https://github.com/apache/datafusion-ballista/pull/1028#issuecomment-2185608422 The following labels could not be found: `auto-dependencies`.

[PR] Update sqlparser requirement from 0.43.0 to 0.47.0 [datafusion-ballista]

2024-06-23 Thread via GitHub
dependabot[bot] opened a new pull request, #1028: URL: https://github.com/apache/datafusion-ballista/pull/1028 Updates the requirements on [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) to permit the latest version. Changelog Sourced from https://github.com/sqlparser-rs

[PR] Add ANSI support for Subtract #535 [datafusion-comet]

2024-06-23 Thread via GitHub
dharanad opened a new pull request, #593: URL: https://github.com/apache/datafusion-comet/pull/593 ## Which issue does this PR close? Closes #53 ## Rationale for this change Part of #313 ## What changes are included in this PR? ## How are these changes t

Re: [PR] decimal support for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #11092: URL: https://github.com/apache/datafusion/pull/11092#discussion_r1650324640 ## datafusion/sql/src/unparser/expr.rs: ## @@ -15,6 +15,9 @@ // specific language governing permissions and limitations // under the License. +use arrow

Re: [I] Add example for writing an `AnalyzerRule` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #10855: URL: https://github.com/apache/datafusion/issues/10855#issuecomment-2185526847 Here is my proposed analyzer rule example: https://github.com/apache/datafusion/pull/11089 It isn't quite as fancy as the wren example, but I think it is an understandable

Re: [PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #11082: URL: https://github.com/apache/datafusion/pull/11082#discussion_r1650294633 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #11082: URL: https://github.com/apache/datafusion/pull/11082#discussion_r1650294182 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #11082: URL: https://github.com/apache/datafusion/pull/11082#discussion_r1650294115 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Basic comparison for List [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on PR #11091: URL: https://github.com/apache/datafusion/pull/11091#issuecomment-2185498282 It seems we need to rewrite `array || element` to `array_append` first

Re: [PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #11082: URL: https://github.com/apache/datafusion/pull/11082#discussion_r1650279381 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11082: URL: https://github.com/apache/datafusion/pull/11082#discussion_r1650278660 ## datafusion/sql/src/unparser/rewrite.rs: ## @@ -0,0 +1,96 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] Overflow in negate operator [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11084: URL: https://github.com/apache/datafusion/pull/11084#discussion_r1650272446 ## datafusion/common/src/scalar/mod.rs: ## @@ -5494,6 +5566,69 @@ mod tests { Ok(()) } +#[test] +#[allow(arithmetic_overflow)] // we want to

Re: [PR] Support COPY TO Externally Defined File Formats, add FileType trait [datafusion]

2024-06-23 Thread via GitHub
devinjdangelo commented on code in PR #11060: URL: https://github.com/apache/datafusion/pull/11060#discussion_r1650270938 ## datafusion/common/src/config.rs: ## @@ -1116,6 +1116,16 @@ macro_rules! extensions_options { } } +/// These file types have special built in behav

Re: [I] Update ListingTable to use `StatisticsConverter` [datafusion]

2024-06-23 Thread via GitHub
xinlifoobar commented on issue #10923: URL: https://github.com/apache/datafusion/issues/10923#issuecomment-2185479282 I did not notice there was a PR already. Feel free to close mine.

Re: [I] Tracking issue: Overflow bugs in scalar math functions found by SQLancer [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #11078: URL: https://github.com/apache/datafusion/issues/11078#issuecomment-2185477868 Amazing

Re: [I] Implement statistics support for Substrait [datafusion]

2024-06-23 Thread via GitHub
xinlifoobar commented on issue #8698: URL: https://github.com/apache/datafusion/issues/8698#issuecomment-2185475570 > So in other words, can we simply handle importing the basic statistics and not try to handle column level statistics? Hi @alamb, as attached in #9347, the complete phy

Re: [PR] Minor: Examples cleanup + more docs in pruning example [datafusion]

2024-06-23 Thread via GitHub
alamb merged PR #11086: URL: https://github.com/apache/datafusion/pull/11086

Re: [PR] Minor: Examples cleanup + more docs in pruning example [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #11086: URL: https://github.com/apache/datafusion/pull/11086#issuecomment-2185472393 Thanks @lewiszlw !

Re: [PR] WIP: directly pass Vec instead of &[Expr] to avoid clone [datafusion]

2024-06-23 Thread via GitHub
github-actions[bot] commented on PR #9917: URL: https://github.com/apache/datafusion/pull/9917#issuecomment-2185439493 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Support multi-columns expr [datafusion]

2024-06-23 Thread via GitHub
github-actions[bot] commented on PR #10222: URL: https://github.com/apache/datafusion/pull/10222#issuecomment-2185439449 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Allow providing Arrow schema when scanning Parquet files [datafusion]

2024-06-23 Thread via GitHub
HawaiianSpork commented on issue #5950: URL: https://github.com/apache/datafusion/issues/5950#issuecomment-2185426233 This should be fixed now by #10515. You can now override the schema used in the file scanner.

Re: [I] Support comparison operators on nested data types (Struct, List, ..) [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on issue #10856: URL: https://github.com/apache/datafusion/issues/10856#issuecomment-2185421377 Note that the comparison for nested type is not supported in arrow-rs https://github.com/apache/arrow-rs/pull/5942, so we should implement them in datafusion. First attemp

[PR] Basic comparison for List [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 opened a new pull request, #11091: URL: https://github.com/apache/datafusion/pull/11091 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/9857 Other TODOs before closing #9857 1. Is Distinct 2. Nested List ## R

Re: [PR] Moving min and max to new API and removing from protobuf [datafusion]

2024-06-23 Thread via GitHub
edmondop commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2185375210 Thanks. I guess I wasn't clear in my comment here https://github.com/apache/datafusion/pull/11013#issuecomment-2183027880 . How should that test failure be addressed? It seems that

Re: [PR] Moving min and max to new API and removing from protobuf [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2185367844 > eliminate_distinct_from_min_expr You can take `single_distinct_groupby` as a reference; it uses `alias` to maintain schema equivalence. Also, I suggest we introduce this r

Re: [I] Add more support for ScalarValue::Float16 where Float32 and Float64 are supported [datafusion]

2024-06-23 Thread via GitHub
Lordworms commented on issue #11083: URL: https://github.com/apache/datafusion/issues/11083#issuecomment-2185348902 I'll take it since it is an issue related to me

Re: [PR] WIP Improve planning Examples [datafusion]

2024-06-23 Thread via GitHub
alamb closed pull request #10953: WIP Improve planning Examples URL: https://github.com/apache/datafusion/pull/10953

Re: [PR] WIP Improve planning Examples [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #10953: URL: https://github.com/apache/datafusion/pull/10953#issuecomment-2185344398 Folded into https://github.com/apache/datafusion/pull/11085 and linked PRs

Re: [PR] Add standalone example of using the SQL frontend [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #11088: URL: https://github.com/apache/datafusion/pull/11088#issuecomment-2185344164 @andygrove I wonder if you might have some time to review this PR

[PR] adding config to control Varchar behavior [datafusion]

2024-06-23 Thread via GitHub
Lordworms opened a new pull request, #11090: URL: https://github.com/apache/datafusion/pull/11090 ## Which issue does this PR close? Closes #10743 ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [PR] Moving min and max to new API and removing from protobuf [datafusion]

2024-06-23 Thread via GitHub
edmondop commented on PR #11013: URL: https://github.com/apache/datafusion/pull/11013#issuecomment-2185314780 @jayzhan211 I have started experimenting with an optimizer rule, but removing the distinct results in an error like this: ``` running 2 tests test eliminate_distinct::tests::e

[PR] Add standlone example AnalyzerRule [datafusion]

2024-06-23 Thread via GitHub
alamb opened a new pull request, #11089: URL: https://github.com/apache/datafusion/pull/11089 Draft until: - [ ] update the example to implement row-level access control via a filter ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/10855

Re: [PR] Add standalone example for `OptimizerRule` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11087: URL: https://github.com/apache/datafusion/pull/11087#discussion_r1650176954 ## datafusion/core/src/execution/session_state.rs: ## @@ -402,6 +402,16 @@ impl SessionState { self } +// the add_optimizer_rule takes an owned r

[PR] Minor: Examples cleanup + more docs in pruning example [datafusion]

2024-06-23 Thread via GitHub
alamb opened a new pull request, #11086: URL: https://github.com/apache/datafusion/pull/11086 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10855 ## Rationale for this change While working on https://github.com/apache/datafusion/i

[PR] Alamb/remove rewrite expr [datafusion]

2024-06-23 Thread via GitHub
alamb opened a new pull request, #11085: URL: https://github.com/apache/datafusion/pull/11085 Draft while we fix the other examples ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/10855 ## Rationale for this change `rewrit

Re: [I] Optimizer is slow: Avoid too many string cloning in the optimizer [datafusion]

2024-06-23 Thread via GitHub
alamb closed issue #5157: Optimizer is slow: Avoid too many string cloning in the optimizer URL: https://github.com/apache/datafusion/issues/5157

Re: [I] Optimizer is slow: Avoid too many string cloning in the optimizer [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #5157: URL: https://github.com/apache/datafusion/issues/5157#issuecomment-2185277343 > @alarmb Since there is an epic issue(#5637), this issue can be closed in favor of more specific issues. Thanks again @zeodtr BTW depending on the usecase we have mad

Re: [I] Order of Interval Addition Should Affect Final Output [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #11055: URL: https://github.com/apache/datafusion/issues/11055#issuecomment-2185275095 > Thanks so much. Should I fix it in sqlparser? I think that is probably the best way -- though it is likely pretty tricky (likely related to operator precedence). I think t

Re: [PR] Convert Correlation to UDAF [datafusion]

2024-06-23 Thread via GitHub
pingsutw commented on PR #11064: URL: https://github.com/apache/datafusion/pull/11064#issuecomment-2185264046 Thank you all!

Re: [I] Add example for writing a `FileFormat` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #11079: URL: https://github.com/apache/datafusion/issues/11079#issuecomment-2185259811 > I'll take this as I want to try it out. I do this for one of my libraries via the parser plus user defined nodes, so I want to see what this looks like as an alternative.

Re: [I] Add example for writing a `FileFormat` [datafusion]

2024-06-23 Thread via GitHub
tshauck commented on issue #11079: URL: https://github.com/apache/datafusion/issues/11079#issuecomment-2185256149 I'll take this as I want to try it out. I do this for one of my libraries via the parser plus user defined nodes, so I want to see what this looks like as an alternative.

Re: [I] Add example for writing a `FileFormat` [datafusion]

2024-06-23 Thread via GitHub
tshauck commented on issue #11079: URL: https://github.com/apache/datafusion/issues/11079#issuecomment-2185255734 take

Re: [PR] to_timestamp functions should preserve timezone [datafusion]

2024-06-23 Thread via GitHub
comphead commented on code in PR #11038: URL: https://github.com/apache/datafusion/pull/11038#discussion_r1650133844 ## datafusion/functions/src/datetime/to_timestamp.rs: ## @@ -240,8 +254,13 @@ impl ScalarUDFImpl for ToTimestampMillisFunc { &self.signature } -

Re: [PR] Migrate more code from `Expr::to_columns` to `Expr::column_refs` [datafusion]

2024-06-23 Thread via GitHub
comphead merged PR #11067: URL: https://github.com/apache/datafusion/pull/11067

Re: [I] Make modulos with negative float zero compat with other engines [datafusion]

2024-06-23 Thread via GitHub
comphead commented on issue #11051: URL: https://github.com/apache/datafusion/issues/11051#issuecomment-2185162530 This is a good point @jonahgao, so this gives me even more confidence to introduce a new DF param which controls whether DF should follow IEEE 754

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650123696 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -351,56 +460,108 @@ impl CommonSubexprEliminate { schema: orig_schema,

Re: [PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc closed pull request #10707: Strip table qualifiers from schema in `UNION ALL` URL: https://github.com/apache/datafusion/pull/10707

Re: [PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on code in PR #10707: URL: https://github.com/apache/datafusion/pull/10707#discussion_r1650118809 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1373,33 +1373,32 @@ pub fn union(left_plan: LogicalPlan, right_plan: LogicalPlan) -> Result

Re: [PR] Strip table qualifiers from schema in `UNION ALL` [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc commented on PR #10707: URL: https://github.com/apache/datafusion/pull/10707#issuecomment-2185117290 Closing in favor of #11082

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650118870 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -118,21 +137,86 @@ type CommonExprs = IndexMap; /// ProjectionExec(exprs=[extract (day from new_

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650118774 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -43,18 +45,35 @@ const CSE_PREFIX: &str = "__common_expr"; /// Identifier that represents a su

[PR] Strip table qualifiers from schema in `UNION ALL` for unparser [datafusion]

2024-06-23 Thread via GitHub
phillipleblanc opened a new pull request, #11082: URL: https://github.com/apache/datafusion/pull/11082 ## Which issue does this PR close? Closes #10706 ## Rationale for this change The schema that is the result of a UNION ALL should not have any table qualifiers, as the

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650117705 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -43,18 +45,35 @@ const CSE_PREFIX: &str = "__common_expr"; /// Identifier that represents a su

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650117524 ## datafusion/expr/src/expr.rs: ## @@ -1461,6 +1462,176 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +/// This method h

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650116827 ## datafusion/expr/src/expr.rs: ## @@ -1461,6 +1462,176 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +/// This method h

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650116199 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -351,56 +468,108 @@ impl CommonSubexprEliminate { schema: orig_schema,

Re: [PR] fix: modulo op with negative zero divisor produces Nan [datafusion-comet]

2024-06-23 Thread via GitHub
vaibhawvipul commented on PR #585: URL: https://github.com/apache/datafusion-comet/pull/585#issuecomment-2185093456 Can someone please re-run the CI? It seems a task failed because of a network error. cc @kazuyukitanimura @comphead

Re: [I] Overflow bug in POW scalar function (found by SQLancer) [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on issue #11075: URL: https://github.com/apache/datafusion/issues/11075#issuecomment-2185083331 > Now that only just a few types/functions have been fuzzed, I'm expecting more function-local bugs to appear. That's right, fixing GCD & LCM was very local, only

Re: [I] Make modulos with negative float zero compat with other engines [datafusion]

2024-06-23 Thread via GitHub
jonahgao commented on issue #11051: URL: https://github.com/apache/datafusion/issues/11051#issuecomment-2185068003 arrow-rs [follows](https://github.com/apache/arrow-rs/blob/a35214f92ad7c3bce19875bb091cb776447aa49e/arrow-arith/src/numeric.rs#L74) the rules of IEEE 754. If we intend to be co
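
A minimal sketch of the IEEE 754 behaviour mentioned in the comment above, using Rust's built-in `%` on `f64` rather than the arrow-rs kernel itself. The function name `float_rem` is hypothetical and only illustrates the rule that a remainder with a zero divisor (positive or negative) is NaN.

```rust
/// Remainder of two `f64` values using Rust's built-in `%` operator,
/// which follows IEEE 754 semantics for floating-point operands.
/// (Hypothetical helper for illustration; not part of arrow-rs or DataFusion.)
fn float_rem(dividend: f64, divisor: f64) -> f64 {
    // For floats, a zero divisor does not panic: the result is NaN,
    // regardless of whether the zero is +0.0 or -0.0.
    // For non-zero divisors the result takes the sign of the dividend.
    dividend % divisor
}
```

Calling `float_rem(1.0, -0.0)` yields NaN, which is the case the issue title refers to; whether DataFusion should keep or change that result is the open question in the thread.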

Re: [PR] Support COPY TO Externally Defined File Formats, add FileType trait [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11060: URL: https://github.com/apache/datafusion/pull/11060#discussion_r1650104711 ## datafusion/core/src/datasource/file_format/file_compression_type.rs: ## @@ -245,90 +250,16 @@ pub trait FileTypeExt { fn get_ext_with_compression(&self, c: F

Re: [I] Add `union_extract` function [datafusion]

2024-06-23 Thread via GitHub
gstvg commented on issue #11081: URL: https://github.com/apache/datafusion/issues/11081#issuecomment-2185066330 take

[I] Add `union_extract` function [datafusion]

2024-06-23 Thread via GitHub
gstvg opened a new issue, #11081: URL: https://github.com/apache/datafusion/issues/11081 ### Is your feature request related to a problem or challenge? Retrieve the value of the given union variant, or `NULL` if it's not currently selected ### Describe the solution you'd like
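
To make the requested semantics concrete, here is a rough sketch in plain Rust. It models a union as an ordinary enum and NULL as `None`; the type `IntOrText` and the functions `extract_int` and `extract_text` are hypothetical and are not the proposed DataFusion function or the Arrow union representation.

```rust
/// Hypothetical stand-in for a union value with two variants.
#[derive(Debug, Clone, PartialEq)]
enum IntOrText {
    Int(i64),
    Text(String),
}

/// Return the integer payload if the `Int` variant is currently selected,
/// otherwise `None` (playing the role of SQL `NULL`).
fn extract_int(value: &IntOrText) -> Option<i64> {
    match value {
        IntOrText::Int(v) => Some(*v),
        IntOrText::Text(_) => None,
    }
}

/// Return the text payload if the `Text` variant is currently selected,
/// otherwise `None`.
fn extract_text(value: &IntOrText) -> Option<&str> {
    match value {
        IntOrText::Text(s) => Some(s.as_str()),
        IntOrText::Int(_) => None,
    }
}
```

Extracting a variant that is not selected gives `None` rather than an error, which mirrors the "or `NULL` if it's not currently selected" wording in the issue.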

Re: [PR] Support COPY TO Externally Defined File Formats, add FileType trait [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11060: URL: https://github.com/apache/datafusion/pull/11060#discussion_r1650084547 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -41,12 +42,27 @@ use crate::error::Result; use crate::execution::context::SessionState; use crate::phys

[I] Add example for writing a `FileFormat` [datafusion]

2024-06-23 Thread via GitHub
alamb opened a new issue, #11079: URL: https://github.com/apache/datafusion/issues/11079 ### Is your feature request related to a problem or challenge? Now that @devinjdangelo has added better support for user defined file formats (see https://github.com/apache/datafusion/pull/11060)

Re: [PR] feat: Create datafusion-distributed crate with shuffle reader/writer [datafusion]

2024-06-23 Thread via GitHub
andygrove commented on PR #11070: URL: https://github.com/apache/datafusion/pull/11070#issuecomment-2185005357 > From my perspective if there is more than one user of this code (e.g more than Ballista) then it makes sense to put it in datafusion. If there is realistically only one user of t

Re: [PR] Resolve empty relation opt for join types [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on code in PR #11066: URL: https://github.com/apache/datafusion/pull/11066#discussion_r1650084165 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -142,13 +146,19 @@ impl OptimizerRule for PropagateEmptyRelation {

[I] Overflow bugs in scalar math functions found by SQLancer [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 opened a new issue, #11078: URL: https://github.com/apache/datafusion/issues/11078 SQLancer #11030 has found several overflow bugs in the math functions, this issue tracks those: - [x] overflow in GCD & LCM #11053 - [ ] Overflow bug in negate arithmetic operator #11076

Re: [PR] Resolve empty relation opt for join types [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11066: URL: https://github.com/apache/datafusion/pull/11066#discussion_r1650081511 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -142,13 +146,19 @@ impl OptimizerRule for PropagateEmptyRelation { schema

Re: [I] Overflow bug in negate arithmetic operator (found by SQLancer) [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on issue #11076: URL: https://github.com/apache/datafusion/issues/11076#issuecomment-2184999383 take

Re: [I] Overflow bug in FACTORIAL scalar function (found by SQLancer) [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on issue #11074: URL: https://github.com/apache/datafusion/issues/11074#issuecomment-2184999306 take

Re: [I] Overflow bug in POW scalar function (found by SQLancer) [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on issue #11075: URL: https://github.com/apache/datafusion/issues/11075#issuecomment-2184998761 This is the third scalar function that panics instead of returning an error; maybe it's a good idea to open up a separate tracking issue to find all of these cases?
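
As an illustration of "returning an error instead of panicking", the sketch below uses Rust's standard checked integer arithmetic. The names `try_negate` and `try_pow` and the `String` error type are assumptions made for this example; they are not the DataFusion functions or the fixes discussed in the linked issues.

```rust
/// Negate an `i64`, reporting overflow as an error instead of panicking.
/// (Hypothetical helper; `i64::MIN` has no positive counterpart.)
fn try_negate(value: i64) -> Result<i64, String> {
    value
        .checked_neg()
        .ok_or_else(|| format!("overflow negating {value}"))
}

/// Raise `base` to `exp`, reporting overflow as an error instead of panicking.
/// (Hypothetical helper built on the standard `checked_pow`.)
fn try_pow(base: i64, exp: u32) -> Result<i64, String> {
    base.checked_pow(exp)
        .ok_or_else(|| format!("overflow computing {base}^{exp}"))
}
```

With these helpers, `try_negate(i64::MIN)` and `try_pow(2, 63)` both return `Err` values that a caller can surface as a query error, while in-range inputs return `Ok`.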

Re: [I] Overflow bug in POW scalar function (found by SQLancer) [datafusion]

2024-06-23 Thread via GitHub
LorrensP-2158466 commented on issue #11075: URL: https://github.com/apache/datafusion/issues/11075#issuecomment-2184998506 take

Re: [PR] feat: Support duplicate column names in Joins in Substrait consumer [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11049: URL: https://github.com/apache/datafusion/pull/11049#discussion_r1650079670 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -317,16 +317,27 @@ fn rename_expressions( .into_iter() Review Comment: As `rename_expression

Re: [PR] feat: Implement more efficient version of xxhash64 [datafusion-comet]

2024-06-23 Thread via GitHub
andygrove commented on code in PR #575: URL: https://github.com/apache/datafusion-comet/pull/575#discussion_r1650079025 ## core/src/execution/datafusion/expressions/xxhash64.rs: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Improve `CommonSubexprEliminate` identifier management (10% faster planning) [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1650073589 ## datafusion/expr/src/expr.rs: ## @@ -1461,6 +1462,176 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +/// This method hashes

Re: [I] Release DataFusion `39.0.0` [datafusion]

2024-06-23 Thread via GitHub
alamb closed issue #10517: Release DataFusion `39.0.0` URL: https://github.com/apache/datafusion/issues/10517

Re: [I] Release DataFusion `39.0.0` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #10517: URL: https://github.com/apache/datafusion/issues/10517#issuecomment-2184983071 Released on June 10: https://crates.io/crates/datafusion/39.0.0 Ticket for 40: https://github.com/apache/datafusion/issues/11077

[I] Release DataFusion `40.0.0` [datafusion]

2024-06-23 Thread via GitHub
alamb opened a new issue, #11077: URL: https://github.com/apache/datafusion/issues/11077 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Better CSE identifier [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #10473: URL: https://github.com/apache/datafusion/pull/10473#issuecomment-2184968464 I ran the benchmarks again: Looks to me like this PR makes planning 10% faster ``` physical_plan_tpcds_all 1.00 1074.8±5.94ms? ?/sec

Re: [PR] Migrate more code from `Expr::to_columns` to `Expr::column_refs` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11067: URL: https://github.com/apache/datafusion/pull/11067#discussion_r1650060628 ## datafusion/optimizer/src/utils.rs: ## @@ -66,6 +66,16 @@ pub fn optimize_children( } } +/// Returns true if all columns in col_refs are in `schema_cols` +
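
The doc comment quoted in this review snippet describes a subset check. As a rough sketch only, and not the code under review in the PR, that check could look like the following in Rust. The function name `all_columns_in_schema` and the use of `HashSet<&str>` for both arguments are assumptions made for illustration; the truncated diff does not show the actual signature or types.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns true if every column name in `col_refs`
/// is also present in `schema_cols`. An empty `col_refs` trivially passes.
pub fn all_columns_in_schema(
    col_refs: &HashSet<&str>,
    schema_cols: &HashSet<&str>,
) -> bool {
    // `is_subset` checks that each element of `col_refs` is in `schema_cols`.
    col_refs.is_subset(schema_cols)
}
```

Using sets keeps each membership test at expected constant time, which matters when the check runs once per plan node during optimization.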

Re: [PR] Give `OptimizerRule::try_optimize` default implementation and cleanup duplicated custom implementations [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #11059: URL: https://github.com/apache/datafusion/pull/11059#issuecomment-2184957876 🎉

Re: [I] Data set which is much bigger than RAM [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #10897: URL: https://github.com/apache/datafusion/issues/10897#issuecomment-2184957285 Hi @Smotrov -- I agree that using 20-30 GB does not seem good. Perhaps there is something in DataFusion that is not accounting for memory correctly (perhaps it is the decoding of the

Re: [PR] feat: Create datafusion-distributed crate with shuffle reader/writer [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #11070: URL: https://github.com/apache/datafusion/pull/11070#issuecomment-2184956533 FWIW we (InfluxData) would likely (never) end up using the shuffle reader / writer, nor a distributed query planner (we would have our own). From my perspective if there is more

Re: [I] Implement statistics support for Substrait [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #8698: URL: https://github.com/apache/datafusion/issues/8698#issuecomment-2184955952 So in other words, can we simply handle importing the basic statistics and not try to handle column level statistics?

Re: [I] Implement statistics support for Substrait [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #8698: URL: https://github.com/apache/datafusion/issues/8698#issuecomment-2184955782 Hi @xinlifoobar -- I would personally suggest we don't try to encode statistics yet in Substrait because: 1. We may want to change Statistics in DataFusion in the future 2. I d

Re: [I] Order of Interval Addition Should Affect Final Output [datafusion]

2024-06-23 Thread via GitHub
alamb commented on issue #11055: URL: https://github.com/apache/datafusion/issues/11055#issuecomment-2184954245 I agree with @Lordworms that the root cause is related to the precedence rules in sqlparser which control if expressions like ```sl DATE '2019-02-28' + INTERVAL '1 Y

Re: [I] Convert `Correlation` to UDAF [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 closed issue #10884: Convert `Correlation` to UDAF URL: https://github.com/apache/datafusion/issues/10884

Re: [PR] Convert Correlation to UDAF [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 merged PR #11064: URL: https://github.com/apache/datafusion/pull/11064

Re: [PR] Convert Correlation to UDAF [datafusion]

2024-06-23 Thread via GitHub
jayzhan211 commented on PR #11064: URL: https://github.com/apache/datafusion/pull/11064#issuecomment-2184952046 Thanks @alamb and @pingsutw

Re: [PR] Change wildcard qualifier type from `String` to `TableReference` [datafusion]

2024-06-23 Thread via GitHub
alamb commented on code in PR #11073: URL: https://github.com/apache/datafusion/pull/11073#discussion_r1650041705 ## datafusion/proto/proto/datafusion.proto: ## @@ -369,7 +369,8 @@ message LogicalExprNode { } message Wildcard { - string qualifier = 1; + string qualifier =

Re: [PR] handle overflow in gcd and return this as an error [datafusion]

2024-06-23 Thread via GitHub
alamb merged PR #11057: URL: https://github.com/apache/datafusion/pull/11057

Re: [PR] handle overflow in gcd and return this as an error [datafusion]

2024-06-23 Thread via GitHub
alamb commented on PR #11057: URL: https://github.com/apache/datafusion/pull/11057#issuecomment-2184934361 Thanks again @LorrensP-2158466
