Re: [PR] Sort out testcases in `aggregation.slt` [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on code in PR #14301: URL: https://github.com/apache/datafusion/pull/14301#discussion_r1929703073 ## datafusion/sqllogictest/test_files/aggregate/string.slt: ## Review Comment: Seems switching to use data type to split rather than function? ##

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -344,6 +344,16 @@ SELECT ascii('💯a') 128175 +query I +SELECT ascii('222') + +50 + +query I +

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2614324472 > We have type coercion in logical plan now, consider the case where we want to separate logical types and physical types, should we add another type coercion layer in physical op

Re: [PR] Sort out testcases in `aggregation.slt` [datafusion]

2025-01-26 Thread via GitHub
logan-keede commented on code in PR #14301: URL: https://github.com/apache/datafusion/pull/14301#discussion_r1929718734 ## datafusion/sqllogictest/test_files/aggregate/string.slt: ## Review Comment: function name is `STRING_AGG` maybe we should switch that. I think it might

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2614325170 > For instance, with the built-in UDFs that DataFusion offers, it would be powerful if users could customize various components of a UDF. That is an interesting idea @shehab

Re: [I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-01-26 Thread via GitHub
Spaarsh commented on issue #14303: URL: https://github.com/apache/datafusion/issues/14303#issuecomment-2614438236 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[PR] fix: pass scale to DF round in spark_round [datafusion-comet]

2025-01-26 Thread via GitHub
cht42 opened a new pull request, #1341: URL: https://github.com/apache/datafusion-comet/pull/1341 ## Which issue does this PR close? Closes #1340. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-01-26 Thread via GitHub
Spaarsh commented on issue #14303: URL: https://github.com/apache/datafusion/issues/14303#issuecomment-2614438204 I have analyzed the code in [unwrap_cast_in_comparison.rs](https://github.com/apache/datafusion/blob/7c07948358eac81c4b297fa2400cba3c9ca55dc2/datafusion/optimizer/src/unwrap_cast

[I] Scale argument is not passed to DF `round` in `spark_round` [datafusion-comet]

2025-01-26 Thread via GitHub
cht42 opened a new issue, #1340: URL: https://github.com/apache/datafusion-comet/issues/1340 ### Describe the bug Scale argument is not passed to DF `round` in `spark_round` ### Steps to reproduce _No response_ ### Expected behavior _No response_ ###

[PR] test: attempt to analyze boundaries for select columns [datafusion]

2025-01-26 Thread via GitHub
hiltontj opened a new pull request, #14308: URL: https://github.com/apache/datafusion/pull/14308 ## Which issue does this PR close? This does not close any issues. It is a reproducer (issue incoming). ## Rationale for this change ## What changes are includ

Re: [I] Allow for bounds analysis on selective columns in a schema [datafusion]

2025-01-26 Thread via GitHub
hiltontj commented on issue #14309: URL: https://github.com/apache/datafusion/issues/14309#issuecomment-2614450399 Another thing worth noting is that from what I've seen, DataFusion splits filter expressions on `AND` conjunctions. So in practice, _I think_ this is less of an issue, because

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-26 Thread via GitHub
timsaucer commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1929789647 ## datafusion/ffitest/src/async_provider.rs: ## @@ -0,0 +1,272 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] bench: add array_agg benchmark [datafusion]

2025-01-26 Thread via GitHub
rluvaton commented on PR #14302: URL: https://github.com/apache/datafusion/pull/14302#issuecomment-2614449109 I'll see if I can help later this week. If you ran the benchmark, I would appreciate posting it on the improvement PR and not here, of course

Re: [I] [DISCUSSION]: Unified approach for joins to output batches close to `batch_size` [datafusion]

2025-01-26 Thread via GitHub
korowa commented on issue #14238: URL: https://github.com/apache/datafusion/issues/14238#issuecomment-2614450912 Simple embedding of coalescer into filter ([branch](https://github.com/korowa/arrow-datafusion/tree/coalesce-filter) [commit](https://github.com/korowa/arrow-datafusion/commit/ee

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-26 Thread via GitHub
timsaucer commented on PR #13937: URL: https://github.com/apache/datafusion/pull/13937#issuecomment-2614455411 CI repaired after moving the crate out. @kevinjqliu would you be able to review?

Re: [I] Why uuid is only assigned for create_dataframe, not assigned for read_xxx [datafusion-python]

2025-01-26 Thread via GitHub
timsaucer commented on issue #996: URL: https://github.com/apache/datafusion-python/issues/996#issuecomment-2614453761 I believe this PR will resolve it, but it's been waiting for code review: https://github.com/apache/datafusion-python/pull/964

Re: [PR] Remove core/physical_optimizer (merge after #14298) [datafusion]

2025-01-26 Thread via GitHub
logan-keede commented on PR #14300: URL: https://github.com/apache/datafusion/pull/14300#issuecomment-2614275635 Closes https://github.com/apache/datafusion/issues/11502 Related to https://github.com/apache/datafusion/issues/13814

Re: [PR] refactor aggregate [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614315029 The main thing I am worried about is that this PR seems too large; it seems hard to ensure all existing test cases are moved correctly. Maybe we can push it forward more incrementally? I ha

Re: [PR] refactor aggregate [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614315536 > The main thing I am worried about is that this PR seems too large; it seems hard to ensure all existing test cases are moved correctly. Maybe we can push it forward more incrementally? I

Re: [PR] Move All Physical Optimizer Tests to core/tests and Remove functions-aggregate Dependency [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14298: URL: https://github.com/apache/datafusion/pull/14298#issuecomment-2614337401 🚀

Re: [PR] Move All Physical Optimizer Tests to core/tests and Remove functions-aggregate Dependency [datafusion]

2025-01-26 Thread via GitHub
alamb merged PR #14298: URL: https://github.com/apache/datafusion/pull/14298

Re: [PR] Move All Physical Optimizer Tests to core/tests and Remove functions-aggregate Dependency [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14298: URL: https://github.com/apache/datafusion/pull/14298#discussion_r1929725541 ## datafusion-cli/Cargo.lock: ## @@ -1593,7 +1593,6 @@ dependencies = [ "datafusion-execution", "datafusion-expr", "datafusion-expr-common", - "datafusion-func

Re: [PR] Remove core/physical_optimizer (merge after #14298) [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14300: URL: https://github.com/apache/datafusion/pull/14300#discussion_r1929726700 ## datafusion-examples/Cargo.toml: ## @@ -66,6 +66,7 @@ datafusion-expr = { workspace = true } datafusion-functions-window-common = { workspace = true } datafusion

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1929724090 ## datafusion/sqllogictest/test_files/tpch/plans/q6.slt.part: ## @@ -31,13 +31,13 @@ logical_plan 01)Projection: sum(lineitem.l_extendedprice * lineitem.l_discount)

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14273: URL: https://github.com/apache/datafusion/pull/14273#discussion_r1929723752 ## datafusion/core/tests/parquet/mod.rs: ## @@ -184,7 +184,13 @@ impl TestOutput { /// and the appropriate scenario impl ContextWithParquet { async fn new(sce

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929723820 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -584,23 +544,7 @@ fn get_valid_types( match target_type_class {

Re: [PR] bench: add array_agg benchmark [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14302: URL: https://github.com/apache/datafusion/pull/14302#discussion_r1929725153 ## datafusion/functions-aggregate/benches/array_agg.rs: ## @@ -0,0 +1,186 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor l

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-26 Thread via GitHub
shehabgamin commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2614337663 > I do want to point out that users can customize the entire UDF, by implementing their own version of `ascii` (including coercion rules via [`ScalarUDFImpl::coerce_types`](

Re: [I] Remove dependency on physical-optimizer on functions-aggregates [datafusion]

2025-01-26 Thread via GitHub
alamb closed issue #14243: Remove dependency on physical-optimizer on functions-aggregates URL: https://github.com/apache/datafusion/issues/14243

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14270: URL: https://github.com/apache/datafusion/pull/14270#issuecomment-2614340218 As this is entirely tests and demonstrates the actual behavior I am going to merge it as is -- I am happy to add more tests if we discover additional ones to add

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-26 Thread via GitHub
alamb merged PR #14270: URL: https://github.com/apache/datafusion/pull/14270

Re: [PR] Add more tests showing coercing behavior with literals [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14270: URL: https://github.com/apache/datafusion/pull/14270#issuecomment-2614340265 Thanks again @jonahgao

Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2614342453 FWIW we have now completed migrating all functions to User Defined Functions and I think there is growing interest in BTW I think there are many people interested in spark co

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2614342687 > @alamb Right, for UDFs that overlap between Spark and DataFusion, Sail already customizes several versions that differ only slightly from the DataFusion implementation, simply b

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-26 Thread via GitHub
korowa commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929797759 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: &LexOrdering) -> Vec Result<()>

Re: [PR] start refactoring process by setting up base + init [datafusion]

2025-01-26 Thread via GitHub
logan-keede commented on code in PR #14306: URL: https://github.com/apache/datafusion/pull/14306#discussion_r1929840951 ## datafusion/sqllogictest/test_files/aggregate/complete_aggregate.slt: ## Review Comment: > It seems the tests will be executed twice, how about we just

Re: [PR] Minor: Update documentation about crate organization [datafusion]

2025-01-26 Thread via GitHub
comphead commented on code in PR #14304: URL: https://github.com/apache/datafusion/pull/14304#discussion_r1929840682 ## datafusion/core/src/lib.rs: ## @@ -624,19 +624,41 @@ //! //! ## Crate Organization //! -//! DataFusion is organized into multiple crates to enforce modulari

Re: [PR] Add relation to alias expr in schema display [datafusion]

2025-01-26 Thread via GitHub
phisn commented on PR #14311: URL: https://github.com/apache/datafusion/pull/14311#issuecomment-2614616485 Needs rework, see https://github.com/apache/datafusion/issues/14310#issuecomment-2614616137.

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-26 Thread via GitHub
jayzhan211 commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929924628 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -701,9 +713,98 @@ fn convert_to_sort_cols(arrs: &[ArrayRef], sort_exprs: &LexOrdering) -> Vec Result

Re: [PR] [not improved] Filter coalesce [datafusion]

2025-01-26 Thread via GitHub
github-actions[bot] closed pull request #13450: [not improved] Filter coalesce URL: https://github.com/apache/datafusion/pull/13450

Re: [PR] Support custom field metadata in UDF [datafusion]

2025-01-26 Thread via GitHub
github-actions[bot] commented on PR #13458: URL: https://github.com/apache/datafusion/pull/13458#issuecomment-2614719631 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2025-01-26 Thread via GitHub
github-actions[bot] closed pull request #12523: Add hook for sharing join state in distributed execution URL: https://github.com/apache/datafusion/pull/12523

[I] `UnwrapCastInComparison` produces incorrect results [datafusion]

2025-01-26 Thread via GitHub
jonahgao opened a new issue, #14303: URL: https://github.com/apache/datafusion/issues/14303 ### Describe the bug I found that `UnwrapCastInComparison` always assumes the cast operation can succeed, but when it cannot, it results in incorrect optimization results. ### To Reprodu

Re: [PR] Add support for mysql table hints [datafusion-sqlparser-rs]

2025-01-26 Thread via GitHub
AvivDavid-Satori commented on code in PR #1675: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1675#discussion_r1929695871 ## tests/sqlparser_mysql.rs: ## @@ -2898,6 +2900,21 @@ fn parse_lock_tables() { mysql().verified_stmt("UNLOCK TABLES"); } +#[test] +fn

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-26 Thread via GitHub
iffyio commented on code in PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#discussion_r1929721586 ## src/tokenizer.rs: ## @@ -1147,7 +1147,11 @@ impl<'a> Tokenizer<'a> { s.push('.'); chars.next();

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub
shehabgamin commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929721662 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32')); -1.2-1.2-1.2 -query error DataF

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-26 Thread via GitHub
iffyio commented on code in PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677#discussion_r1929721523 ## tests/sqlparser_clickhouse.rs: ## @@ -1646,6 +1646,16 @@ fn parse_table_sample() { clickhouse().verified_stmt("SELECT * FROM tbl SAMPLE 1 / 10 O

Re: [PR] Feature: Monotonic Sets [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1929720287 ## datafusion/expr/src/udaf.rs: ## @@ -39,6 +39,26 @@ use crate::utils::AggregateOrderSensitivity; use crate::{Accumulator, Expr}; use crate::{Documentation, Signa

Re: [I] Move `ProjectionPushdown` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #14184: URL: https://github.com/apache/datafusion/issues/14184#issuecomment-2614429582 - Done In https://github.com/apache/datafusion/pull/14300

Re: [I] [EPIC] Extract remaining physical optimizer out of core [datafusion]

2025-01-26 Thread via GitHub
alamb closed issue #11502: [EPIC] Extract remaining physical optimizer out of core URL: https://github.com/apache/datafusion/issues/11502

Re: [I] Move `ProjectionPushdown` into `datafusion-physical-optimizer` crate [datafusion]

2025-01-26 Thread via GitHub
alamb closed issue #14184: Move `ProjectionPushdown` into `datafusion-physical-optimizer` crate URL: https://github.com/apache/datafusion/issues/14184

Re: [I] [EPIC] Extract remaining physical optimizer out of core [datafusion]

2025-01-26 Thread via GitHub
alamb commented on issue #11502: URL: https://github.com/apache/datafusion/issues/11502#issuecomment-2614429804 We did it! Thank you @buraksenn @berkaysynnada and @logan-keede for pushing it the last little bit

Re: [PR] minor: add unit tests for monotonicity.rs [datafusion]

2025-01-26 Thread via GitHub
buraksenn commented on PR #14307: URL: https://github.com/apache/datafusion/pull/14307#issuecomment-2614433107 The tests pass on my local env but fail in CI; not sure why. I'll try to find the root cause and fix it in a few hours

Re: [PR] Fix Float and Decimal coercion [datafusion]

2025-01-26 Thread via GitHub
ozankabak commented on PR #14273: URL: https://github.com/apache/datafusion/pull/14273#issuecomment-2614434455 We definitely shouldn't merge anything that might introduce regressions without going through the very reasonable process you suggested. It didn't seem to me this would cause regre

Re: [PR] Support specific `GroupsAccumulator` for `median` [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1929782170 ## datafusion/functions-aggregate/src/median.rs: ## @@ -230,6 +276,201 @@ impl Accumulator for MedianAccumulator { } } +/// The median groups accumulato

Re: [I] Build time regression [datafusion]

2025-01-26 Thread via GitHub
waynexia commented on issue #14256: URL: https://github.com/apache/datafusion/issues/14256#issuecomment-2614436290 After removing the `WildcardOptions` (by replacing it with an empty structure) I can see the build time drops. Removing the rule itself and the change in `core` doesn't help. I

[I] Ballista 44.0.0 Release [datafusion-ballista]

2025-01-26 Thread via GitHub
milenkovicm opened a new issue, #1172: URL: https://github.com/apache/datafusion-ballista/issues/1172 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Release ballista 44 with support for datafusion 44 to track datafusion relea

Re: [I] Jan 18, 2025: This week(s) in DataFusion [datafusion]

2025-01-26 Thread via GitHub
adriangb commented on issue #14179: URL: https://github.com/apache/datafusion/issues/14179#issuecomment-2614529181 > * [@nuno-faria](https://github.com/nuno-faria) extended filter pushdown to cover `PARTITION BY` window clauses [feat(optimizer): Enable filter pushdown on window functions #1

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-26 Thread via GitHub
adriangb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2614524613 I've successfully made a PR to integrate this internally. It was pretty straightforward. We'll have to scrutinize a bit to see if we can tell if anything is missing (this is very er

Re: [PR] Fix regression in CASE expression [datafusion]

2025-01-26 Thread via GitHub
andygrove commented on PR #14283: URL: https://github.com/apache/datafusion/pull/14283#issuecomment-2614533302 @jayzhan211 Thanks, I will look into using `get_coerce_type_for_case_expression`. @aweltsch Adding type checks in `try_new` makes sense to me. Tomorrow, I will try upd

Re: [I] SchemaDisplay for Alias Expr does currently not take TableReference in account [datafusion]

2025-01-26 Thread via GitHub
edmondop commented on issue #14310: URL: https://github.com/apache/datafusion/issues/14310#issuecomment-2614533444 Happy to work on this one

Re: [I] SchemaDisplay for Alias Expr does currently not take TableReference in account [datafusion]

2025-01-26 Thread via GitHub
edmondop commented on issue #14310: URL: https://github.com/apache/datafusion/issues/14310#issuecomment-2614533566 /take

Re: [PR] Fix regression in CASE expression [datafusion]

2025-01-26 Thread via GitHub
andygrove commented on PR #14283: URL: https://github.com/apache/datafusion/pull/14283#issuecomment-2614533646 Ideally, invoking a physical optimization rule to apply coercion as needed would be nice.

Re: [PR] Minor: Update documentation about crate organization [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14304: URL: https://github.com/apache/datafusion/pull/14304#issuecomment-2614599624 Thanks for the review and fix @comphead

Re: [PR] perf(array-agg): add fast path for array agg for `merge_batch` [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14299: URL: https://github.com/apache/datafusion/pull/14299#issuecomment-2614600265 I ran the newly added `array_agg` benchmark and got some pretty serious performance improvements: ``` group

Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-26 Thread via GitHub
HectorPascual commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2614607006 > [@HectorPascual](https://github.com/HectorPascual) is this the output that you expect? > > ``` > > SELECT > xxhash32(column1) AS xxhash32_result > FR

Re: [PR] perf(array-agg): add fast path for array agg for `merge_batch` [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14299: URL: https://github.com/apache/datafusion/pull/14299#issuecomment-2614619310 Here is my entire benchmark run ``` ++ critcmp main improve-performance-for-array-agg-merge-batch impr

Re: [I] SchemaDisplay for Alias Expr does currently not take TableReference in account [datafusion]

2025-01-26 Thread via GitHub
phisn commented on issue #14310: URL: https://github.com/apache/datafusion/issues/14310#issuecomment-2614616137 Note: I tried to just adjust the SchemaDisplay but that doesn't work because it is also used in [`create_project_physical_exec`](https://github.com/apache/datafusion/blob/7c079483

Re: [PR] perf(array-agg): add fast path for array agg for `merge_batch` [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14299: URL: https://github.com/apache/datafusion/pull/14299#issuecomment-2614619333 Nice work @rluvaton

Re: [PR] Feat/ffi enter tokio runtime [datafusion]

2025-01-26 Thread via GitHub
timsaucer commented on code in PR #13937: URL: https://github.com/apache/datafusion/pull/13937#discussion_r1929889441 ## .github/workflows/rust.yml: ## @@ -417,10 +417,10 @@ jobs: - name: Run tests (excluding doctests) shell: bash run: | - cargo

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-26 Thread via GitHub
jayzhan211 commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929917707 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -569,6 +573,13 @@ impl LastValueAccumulator { }) .collect::>(); +

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-26 Thread via GitHub
jayzhan211 commented on code in PR #14232: URL: https://github.com/apache/datafusion/pull/14232#discussion_r1929914465 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -569,6 +573,13 @@ impl LastValueAccumulator { }) .collect::>(); +

Re: [PR] Last Accumulator `update_batch` doesn't take the last value if the order by value are equals [datafusion]

2025-01-26 Thread via GitHub
jayzhan211 closed pull request #14232: Last Accumulator `update_batch` doesn't take the last value if the order by value are equals URL: https://github.com/apache/datafusion/pull/14232

Re: [I] Implement xxhash algorithms as part of the expression API [datafusion]

2025-01-26 Thread via GitHub
Spaarsh commented on issue #14044: URL: https://github.com/apache/datafusion/issues/14044#issuecomment-2614584812 @HectorPascual is this the output that you expect? ```xxhash32 > SELECT xxhash32(column1) AS xxhash32_result FROM ( SELECT UNNEST(ARRAY[1, 2, 3, 4, 5])

[PR] fix: LogicalPlan::get_parameter_types fails to return all placeholders [datafusion]

2025-01-26 Thread via GitHub
dhegberg opened a new pull request, #14312: URL: https://github.com/apache/datafusion/pull/14312 ## Which issue does this PR close? Closes #13678. ## Rationale for this change Placeholders should be returned when calling `LogicalPlan::get_parameter_types()`

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-26 Thread via GitHub
jkosh44 commented on code in PR #14289: URL: https://github.com/apache/datafusion/pull/14289#discussion_r1929903548 ## datafusion/functions-nested/src/extract.rs: ## @@ -330,7 +330,8 @@ pub(super) struct ArraySlice { impl ArraySlice { pub fn new() -> Self { Self {

Re: [PR] add tests to check precision loss fix [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14284: URL: https://github.com/apache/datafusion/pull/14284#discussion_r1929762107 ## datafusion/physical-expr/src/expressions/cast.rs: ## @@ -399,6 +399,50 @@ mod tests { Ok(()) } +#[test] +fn test_cast_decimal_to_decimal_o

Re: [PR] bench: add array_agg benchmark [datafusion]

2025-01-26 Thread via GitHub
rluvaton commented on PR #14302: URL: https://github.com/apache/datafusion/pull/14302#issuecomment-2614392671 @alamb can you please merge this and run the benchmark on my performance improvement PR?

Re: [PR] chore(deps): update sqlparser requirement from 0.53.0 to 0.54.0 [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14269: URL: https://github.com/apache/datafusion/pull/14269#issuecomment-2614400106 - Superseded by https://github.com/apache/datafusion/pull/14255

Re: [PR] chore(deps): update sqlparser requirement from 0.53.0 to 0.54.0 [datafusion]

2025-01-26 Thread via GitHub
alamb closed pull request #14269: chore(deps): update sqlparser requirement from 0.53.0 to 0.54.0 URL: https://github.com/apache/datafusion/pull/14269

Re: [PR] chore(deps): update sqlparser requirement from 0.53.0 to 0.54.0 [datafusion]

2025-01-26 Thread via GitHub
dependabot[bot] commented on PR #14269: URL: https://github.com/apache/datafusion/pull/14269#issuecomment-2614400126 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] Sort out testcases in `aggregate.slt` [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614410525 > > I think an alternative to this can be to divide this PR into increments and keep a aggregate_supplement.slt file and initially it will be same as aggregate.slt but as w

Re: [PR] Sort out testcases in `aggregate.slt` [datafusion]

2025-01-26 Thread via GitHub
logan-keede commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614410968 Thanks for the review. > It seems workable too, but actually a bit inconvenient to check if large changes in one pr? I meant to divide it into multiple PRs not commi

Re: [PR] Sort out testcases in `aggregate.slt` [datafusion]

2025-01-26 Thread via GitHub
logan-keede closed pull request #14301: Sort out testcases in `aggregate.slt` URL: https://github.com/apache/datafusion/pull/14301

Re: [PR] Sort out testcases in `aggregate.slt` [datafusion]

2025-01-26 Thread via GitHub
Rachelint commented on PR #14301: URL: https://github.com/apache/datafusion/pull/14301#issuecomment-2614412807 > Thanks for the review. > > > It seems workable too, but actually a bit inconvenient to check if large changes in one pr? > > I meant to divide it into multiple PRs n

Re: [PR] bench: add array_agg benchmark [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14302: URL: https://github.com/apache/datafusion/pull/14302#issuecomment-2614419246 Done @rluvaton -- thanks! BTW this is the script I use for benchmarking: https://github.com/alamb/datafusion-benchmarking/blob/main/compare_branch.sh I'll run it on your

Re: [I] Expand Test Coverage for ScalarUDF's [datafusion]

2025-01-26 Thread via GitHub
buraksenn commented on issue #10595: URL: https://github.com/apache/datafusion/issues/10595#issuecomment-2614427639 take

Re: [PR] Complete moving PhysicalOptimizer into `datafusion-physical-optimizer` [datafusion]

2025-01-26 Thread via GitHub
alamb commented on code in PR #14300: URL: https://github.com/apache/datafusion/pull/14300#discussion_r1929778965 ## datafusion-examples/Cargo.toml: ## @@ -66,6 +66,7 @@ datafusion-expr = { workspace = true } datafusion-functions-window-common = { workspace = true } datafusion

Re: [PR] Complete moving PhysicalOptimizer into `datafusion-physical-optimizer` [datafusion]

2025-01-26 Thread via GitHub
alamb merged PR #14300: URL: https://github.com/apache/datafusion/pull/14300

Re: [PR] Complete moving PhysicalOptimizer into `datafusion-physical-optimizer` [datafusion]

2025-01-26 Thread via GitHub
alamb commented on PR #14300: URL: https://github.com/apache/datafusion/pull/14300#issuecomment-2614429347 Let's do this. Thanks again @berkaysynnada -- very nice

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-26 Thread via GitHub
adriangb commented on code in PR #14286: URL: https://github.com/apache/datafusion/pull/14286#discussion_r1929824035 ## datafusion-examples/examples/thread_pools_lib/dedicated_executor.rs: ## @@ -0,0 +1,1778 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// o

Re: [I] Unnest struct expression can't be aliased [datafusion]

2025-01-26 Thread via GitHub
duongcongtoai commented on issue #12794: URL: https://github.com/apache/datafusion/issues/12794#issuecomment-2614783715 If the alias does not do anything, it may indicate that the user is misusing the `UNNEST` operator. In such cases, is notifying the user with an appropriate error message

Re: [I] Error aliasing on double unnest on List[Struct] [datafusion]

2025-01-26 Thread via GitHub
duongcongtoai commented on issue #12162: URL: https://github.com/apache/datafusion/issues/12162#issuecomment-2614787632 I think this can be similar to an operation of aliasing an unnest on struct column, and this operation does not make much sense. It's clearer to see if in the example, t

[PR] Expose sqllogictest Error && `convert_schema_to_types` [datafusion]

2025-01-26 Thread via GitHub
xudong963 opened a new pull request, #14313: URL: https://github.com/apache/datafusion/pull/14313 ## Which issue does this PR close? Follow up #14233, hopefully this will be the last PR of the series. ## Rationale for this change ## What changes are includ
