[I] Seperate out common types from Datafusion Proto [datafusion]

2024-05-12 Thread via GitHub
mustafasrepo opened a new issue, #10477: URL: https://github.com/apache/datafusion/issues/10477 ### Is your feature request related to a problem or challenge? Currently `datafusion-proto` crate, have serialization and deserialization support for - Common Core Types (`ColumnRelatio

Re: [PR] Support explicit type and name during table creation [datafusion]

2024-05-12 Thread via GitHub
duongcongtoai commented on PR #10273: URL: https://github.com/apache/datafusion/pull/10273#issuecomment-2106723240 waiting for sql-parser upgrade in this PR: https://github.com/apache/datafusion/pull/10392 -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] build: Switch back to released version of DataFusion and arrow-rs after Arrow Java 16 is released [datafusion-comet]

2024-05-12 Thread via GitHub
viirya commented on PR #403: URL: https://github.com/apache/datafusion-comet/pull/403#issuecomment-2106677456 I fixed the issue at arrow-rs in the PR https://github.com/apache/arrow-rs/issues/5756. We can continue this PR after the fix is released. Keep this as draft for now. -

[PR] fix: parsing timestamp with date format [datafusion]

2024-05-12 Thread via GitHub
shanretoo opened a new pull request, #10476: URL: https://github.com/apache/datafusion/pull/10476 ## Which issue does this PR close? Closes #10471. ## Rationale for this change Fallback to `Parsed::to_naive_date` when `Parsed::to_naive_datetime_with_offset` failed if the

Re: [I] Implement `LogicalPlanBuilder::from` for `Arc` [datafusion]

2024-05-12 Thread via GitHub
ClSlaid commented on issue #10465: URL: https://github.com/apache/datafusion/issues/10465#issuecomment-2106599908 > So perhaps you are suggesting `LogicalPlanBuilder` has an `Arc` as its internal state (rather than a `LogicalPlan`) This is surely the most straight forward way. -

Re: [PR] Count distinct support multiple expressions [datafusion]

2024-05-12 Thread via GitHub
github-actions[bot] commented on PR #5939: URL: https://github.com/apache/datafusion/pull/5939#issuecomment-2106491000 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] add EliminateSubqueryAliases [datafusion]

2024-05-12 Thread via GitHub
github-actions[bot] closed pull request #: add EliminateSubqueryAliases URL: https://github.com/apache/datafusion/pull/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-12 Thread via GitHub
erratic-pattern commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2106390379 Nice! Those results look a lot better than what I found on my laptop. Very hard to get consistent benchmark results on a personal computer when there's so much process schedu

[PR] refactor: reduce string allocations with Cow [datafusion]

2024-05-12 Thread via GitHub
erratic-pattern opened a new pull request, #10475: URL: https://github.com/apache/datafusion/pull/10475 This is an experiment to see if reducing string allocations with `Cow` has any performance improvement. My local benchmarks show mixed results. My guess is that there is a small ad

Re: [I] Implement `LogicalPlanBuilder::from` for `Arc` [datafusion]

2024-05-12 Thread via GitHub
alamb commented on issue #10465: URL: https://github.com/apache/datafusion/issues/10465#issuecomment-2106377308 That is an interesting point @ClSlaid So perhaps you are suggesting `LogicalPlanBuilder` has an `Arc` as its internal state (rather than a `LogicalPlan`) -- This is an a

Re: [PR] Add `Expr::try_as_col`, deprecate `Expr::try_into_col` [datafusion]

2024-05-12 Thread via GitHub
alamb commented on PR #10448: URL: https://github.com/apache/datafusion/pull/10448#issuecomment-2106376912 Thanks @sadboy 🙏 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] fix `null_count` on `compute_record_batch_statistics` [datafusion]

2024-05-12 Thread via GitHub
samuelcolvin commented on code in PR #10468: URL: https://github.com/apache/datafusion/pull/10468#discussion_r1597704992 ## datafusion/physical-plan/src/common.rs: ## @@ -153,16 +153,23 @@ pub fn compute_record_batch_statistics( }) .sum(); -let mut column

Re: [I] bug: `CAST()` causes internal error [datafusion]

2024-05-12 Thread via GitHub
viirya commented on issue #10464: URL: https://github.com/apache/datafusion/issues/10464#issuecomment-2106346214 I did a quick test using sqllogictest. The two queries are running okay with current `main` branch. ``` query ? SELECT CAST(MAKE_ARRAY(1, 2, 3) AS VARCHAR[])

[PR] Add cast array test to sqllogictest [datafusion]

2024-05-12 Thread via GitHub
viirya opened a new pull request, #10474: URL: https://github.com/apache/datafusion/pull/10474 ## Which issue does this PR close? Closes #10464. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-12 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2106334918 I've opened a draft PR: https://github.com/apache/datafusion/pull/10473 and will try to wrap it up in the following days. -- This is an automated message from the Apache G

[I] Docker CLI build fails in WSL2 - "Ubuntu 22.04.4 LTS" [datafusion]

2024-05-12 Thread via GitHub
Omega359 opened a new issue, #10472: URL: https://github.com/apache/datafusion/issues/10472 ### Describe the bug ``` 55.32 The following warnings were emitted during compilation: 55.33 55.33 warning: In file included from c_src/mimalloc/src/alloc.c:14, 55.33 warning:

[I] to_date with a date string and format fails with error parsing timestamp [datafusion]

2024-05-12 Thread via GitHub
Omega359 opened a new issue, #10471: URL: https://github.com/apache/datafusion/issues/10471 ### Describe the bug to_date parsing of a date string with a format fails with an error parsing timestamp. ### To Reproduce ``` ❯ docker run -it -v /tmp:/data datafusion-cli

Re: [I] Implement Nicer / DuckDB style explain plans [datafusion]

2024-05-12 Thread via GitHub
tinfoil-knight commented on issue #9371: URL: https://github.com/apache/datafusion/issues/9371#issuecomment-2106289301 I'm un-assigning myself since I didn't get the time to plan and implement this feature. For other contributors, these links might help: - https://github.com/du

[PR] fix: avoid compressed json files repartitioning [datafusion]

2024-05-12 Thread via GitHub
korowa opened a new pull request, #10470: URL: https://github.com/apache/datafusion/pull/10470 ## Which issue does this PR close? Closes #10435. ## Rationale for this change Turning off repartitioning for compressed JSON files allows not to fail on readin

Re: [I] `array_slice` panics with `stride=1` [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on issue #10425: URL: https://github.com/apache/datafusion/issues/10425#issuecomment-2106277761 I plan to work on this after #10424 is resolved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [I] `array_slice` panics with `stride=1` [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on issue #10425: URL: https://github.com/apache/datafusion/issues/10425#issuecomment-2106275257 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2106272710 I submitted a formal PR #10469 to address this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] feat: allow `array_slice` to take an optional stride parameter [datafusion]

2024-05-12 Thread via GitHub
jonahgao opened a new pull request, #10469: URL: https://github.com/apache/datafusion/pull/10469 ## Which issue does this PR close? Closes #10424. ## Rationale for this change Based on the draft PR #10450, and reuse the existing `make_udf_function` macro as much as p

Re: [PR] Add examples from TPC-H [datafusion-python]

2024-05-12 Thread via GitHub
Michael-J-Ward commented on code in PR #666: URL: https://github.com/apache/datafusion-python/pull/666#discussion_r1597650461 ## examples/tpch/q05_local_supplier_volume.py: ## @@ -0,0 +1,102 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597639097 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -210,7 +210,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> { let sql_results = ctx

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597638606 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1154,12 +1154,12 @@ mod test { let table_scan = test_table_scan()?; let plan =

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597639097 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -210,7 +210,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> { let sql_results = ctx

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597638606 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1154,12 +1154,12 @@ mod test { let table_scan = test_table_scan()?; let plan =

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597638606 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -1154,12 +1154,12 @@ mod test { let table_scan = test_table_scan()?; let plan =

Re: [PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-12 Thread via GitHub
jonahgao commented on code in PR #10459: URL: https://github.com/apache/datafusion/pull/10459#discussion_r1597637521 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -280,11 +261,7 @@ impl OptimizerRule for SingleDistinctToGroupBy { outer

[PR] fix `null_count` on `compute_record_batch_statistics` [datafusion]

2024-05-12 Thread via GitHub
samuelcolvin opened a new pull request, #10468: URL: https://github.com/apache/datafusion/pull/10468 ## Rationale for this change Maybe I'm missing something or being dumb, but while reading `datafusion::physical_plan::common::compute_record_batch_statistics` I noticed that the `null

Re: [PR] Add support for reading CSV files with comments [datafusion]

2024-05-12 Thread via GitHub
bbannier commented on PR #10467: URL: https://github.com/apache/datafusion/pull/10467#issuecomment-2106229273 This is currently a sketch for a possible implementation for #10262. The approach taken push interpretation of comment lines into `arrow-csv` apache/arrow-rs#5759 adding support for

Re: [PR] Standardize the separator in name [datafusion]

2024-05-12 Thread via GitHub
Weijun-H commented on code in PR #10363: URL: https://github.com/apache/datafusion/pull/10363#discussion_r1597623051 ## datafusion/common/src/test_util.rs: ## @@ -52,6 +52,22 @@ macro_rules! assert_batches_eq { }; } +#[macro_export] +macro_rules! assert_snapshot { +(

Re: [I] Move optimizer rule that has aggregate function out of core [datafusion]

2024-05-12 Thread via GitHub
jayzhan211 commented on issue #10455: URL: https://github.com/apache/datafusion/issues/10455#issuecomment-2106220976 let me close the issue first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] Move optimizer rule that has aggregate function out of core [datafusion]

2024-05-12 Thread via GitHub
jayzhan211 closed issue #10455: Move optimizer rule that has aggregate function out of core URL: https://github.com/apache/datafusion/issues/10455 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] Move optimizer rule that has aggregate function out of core [datafusion]

2024-05-12 Thread via GitHub
jayzhan211 commented on issue #10455: URL: https://github.com/apache/datafusion/issues/10455#issuecomment-2106220210 I thought I need to import `functions-aggregate` to create new aggregate function expression, but I found out that I could just modify AggregateFunction directly, since the f

[PR] Add support for reading CSV files with comments [datafusion]

2024-05-12 Thread via GitHub
bbannier opened a new pull request, #10467: URL: https://github.com/apache/datafusion/pull/10467 This PR adds support for parsing CSV files containing comment lines. Closes #10262. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[PR] implement From> for LogicalPlanBuilder [datafusion]

2024-05-12 Thread via GitHub
AbrarNitk opened a new pull request, #10466: URL: https://github.com/apache/datafusion/pull/10466 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/10465 ## Rationale for this change ## What changes are included in this PR?

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-12 Thread via GitHub
alamb merged PR #10454: URL: https://github.com/apache/datafusion/pull/10454 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-12 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2106192280 🚀 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [I] make some datasource listing helper functions public? [datafusion]

2024-05-12 Thread via GitHub
alamb commented on issue #10449: URL: https://github.com/apache/datafusion/issues/10449#issuecomment-2106191715 I think making the functions public would be fine. In general I think it would be super valuable to pull the ListingTable code out of the core crate (into `datafusion-listi

Re: [I] Move optimizer rule that has aggregate function out of core [datafusion]

2024-05-12 Thread via GitHub
alamb commented on issue #10455: URL: https://github.com/apache/datafusion/issues/10455#issuecomment-2106190817 What part of the rule was aggregate specific? I think the single distinct aggregate removal rule is general to all aggreagetes (it isn't aggregate specific) including user d

Re: [I] Apply guarantee rewriter to sql workflow [datafusion]

2024-05-12 Thread via GitHub
alamb commented on issue #10456: URL: https://github.com/apache/datafusion/issues/10456#issuecomment-2106189621 I think this may be another example of what @samuelcolvin was suggesting on https://github.com/apache/datafusion/issues/10400 I think we could use [ExecutionPlan::statisti

Re: [I] bug: `CAST()` causes internal error [datafusion]

2024-05-12 Thread via GitHub
alamb commented on issue #10464: URL: https://github.com/apache/datafusion/issues/10464#issuecomment-2106188918 definitely looks like a bug -- thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-12 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597601176 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

[I] Ah, I think you are asking about getting `LogicalPlan` from `Arc` [datafusion]

2024-05-12 Thread via GitHub
alamb opened a new issue, #10465: URL: https://github.com/apache/datafusion/issues/10465 Ah, I think you are asking about getting `LogicalPlan` from `Arc` One way to do it today is using `unwrap_arc` (source below) which will only copy when necessary ```rust

Re: [PR] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-12 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597600173 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [I] Support skipping comments in `CsvReader` [datafusion]

2024-05-12 Thread via GitHub
bbannier commented on issue #10262: URL: https://github.com/apache/datafusion/issues/10262#issuecomment-2106180515 I opened https://github.com/apache/arrow-rs/pull/5759 to add comment support to Arrow's CSV reader. With that the work here is mostly around passing that flag from user code to

Re: [PR] Standardize the separator in name [datafusion]

2024-05-12 Thread via GitHub
jayzhan211 commented on code in PR #10363: URL: https://github.com/apache/datafusion/pull/10363#discussion_r1597583440 ## datafusion/expr/src/expr.rs: ## @@ -1859,7 +1859,7 @@ pub(crate) fn create_name(e: &Expr) -> Result { AggregateFunctionDefinition::UDF(..) =

[I] bug: `CAST()` causes internal error [datafusion]

2024-05-12 Thread via GitHub
NickCrews opened a new issue, #10464: URL: https://github.com/apache/datafusion/issues/10464 ### Describe the bug The SQL `SELECT CAST(MAKE_ARRAY(1, 2, 3)` works just fine. But if you add a cast in there, eg `SELECT CAST(MAKE_ARRAY(1, 2, 3) AS VARCHAR[])` or `SELECT CAST(MAKE_ARRAY