Re: [I] Add `DataFrame::map` utility .map function for DataFrame for modifying internal LogicalPlan [datafusion]

2025-01-28 Thread via GitHub
Omega359 commented on issue #14317: URL: https://github.com/apache/datafusion/issues/14317#issuecomment-2619180799 I personally haven't had the need to go into a LogicalPlan from a dataframe and back again but I could see it being useful. -- This is an automated message from the A

Re: [I] Variant on `AnalysisContext` to represent empty-set [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada closed issue #14226: Variant on `AnalysisContext` to represent empty-set URL: https://github.com/apache/datafusion/issues/14226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] support for `array_repeat` array function [datafusion-comet]

2025-01-28 Thread via GitHub
jatin510 commented on PR #1205: URL: https://github.com/apache/datafusion-comet/pull/1205#issuecomment-2619369218 > @jatin510 builds are failing: > > ``` > error[E0308]: mismatched types >--> core/src/execution/planner.rs:787:48 > | > 787 | matc

Re: [PR] fix: Fall back to Spark when hashing decimals with precision > 18 [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on PR #1325: URL: https://github.com/apache/datafusion-comet/pull/1325#issuecomment-2619399645 @parthchandra @wForget fyi

Re: [PR] fix: FULL OUTER JOIN and LIMIT produces wrong results [datafusion]

2025-01-28 Thread via GitHub
xudong963 commented on code in PR #14338: URL: https://github.com/apache/datafusion/pull/14338#discussion_r1932426376 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4274,16 +4270,155 @@ EXPLAIN SELECT * FROM t0 FULL JOIN t1 ON t0.c1 = t1.c1 AND t0.c2 >= t1.c2 LIMIT lo

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-28 Thread via GitHub
xudong963 commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1932431727 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -5182,3 +5177,32 @@ async fn register_non_parquet_file() { "1.json' does not match the expected extensi

[PR] refactor: switch `BooleanBufferBuilder` to `NullBufferBuilder` in binary_map [datafusion]

2025-01-28 Thread via GitHub
Chen-Yuan-Lai opened a new pull request, #14341: URL: https://github.com/apache/datafusion/pull/14341 ## Which issue does this PR close? Closes #14115 . ## Rationale for this change As mentioned in #14115 , several examples in DataFusion codebase still using

Re: [PR] Feat: support array_except function [datafusion-comet]

2025-01-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1343: URL: https://github.com/apache/datafusion-comet/pull/1343#discussion_r1932434647 ## native/core/src/execution/planner.rs: ## @@ -818,6 +819,22 @@ impl PhysicalPlanner { )); Ok(array_join_expr)

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-28 Thread via GitHub
xudong963 commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1932436336 ## datafusion/physical-optimizer/src/limit_pushdown.rs: ## @@ -247,7 +246,15 @@ pub fn pushdown_limit_helper( } } else { // Ad

Re: [PR] Support underscore separators in numbers for Clickhouse. Fixes #1659 [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio merged PR #1677: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1677

[PR] Make numeric literal underscore test dialect agnostic [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio opened a new pull request, #1685: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1685 Minor follow-up from #1677. Support was added for both Clickhouse and Postgres, but the test only covered Clickhouse. This moves the test to common and uses the `all_dialects_where` f

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-28 Thread via GitHub
zhuqi-lucas commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1932211238 ## datafusion/physical-optimizer/src/limit_pushdown.rs: ## @@ -247,7 +246,14 @@ pub fn pushdown_limit_helper( } } else { //

Re: [I] Suggestion: Move to SBT for faster development cycle? [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on issue #1344: URL: https://github.com/apache/datafusion-comet/issues/1344#issuecomment-2619316628 Thanks for raising this, @EmilyFlarionIO. The build times can be frustrating. If we were to start by adopting step 1, I assume we would not need to change any of the Mave

Re: [I] Release Comet 0.5.0 (Jan 2025) [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on issue #1233: URL: https://github.com/apache/datafusion-comet/issues/1233#issuecomment-2619321231 This was completed

Re: [I] Investigate implications of arrow-rs dropping support for tracking dictionary ids in schema [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove closed issue #1333: Investigate implications of arrow-rs dropping support for tracking dictionary ids in schema URL: https://github.com/apache/datafusion-comet/issues/1333

Re: [I] Investigate implications of arrow-rs dropping support for tracking dictionary ids in schema [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on issue #1333: URL: https://github.com/apache/datafusion-comet/issues/1333#issuecomment-2619318857 Closing this since there seems to be no impact

Re: [I] Release Comet 0.5.0 (Jan 2025) [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove closed issue #1233: Release Comet 0.5.0 (Jan 2025) URL: https://github.com/apache/datafusion-comet/issues/1233

Re: [PR] fix: Fall back to Spark when hashing decimals with precision > 18 [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on PR #1325: URL: https://github.com/apache/datafusion-comet/pull/1325#issuecomment-2619596347 > Does this implicitly affect any data read that originated as `uint64`? I believe it gets converted to `DECIMAL(20,0)`. Yes, in the context of a user calling the `hash`
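
A minimal sketch of the arithmetic behind the numbers in this thread, not Comet's actual code: every decimal with precision 18 or less has an unscaled value that fits in an `i64`, while `u64::MAX` has 20 decimal digits, which is consistent with the `DECIMAL(20,0)` mentioned above. Whether this is the exact reason for the PR's cutoff is an assumption; the helper names `decimal_digits`, `max_unscaled`, and `fits_in_i64` are hypothetical.

```rust
/// Number of decimal digits needed to write `v`.
fn decimal_digits(v: u128) -> u32 {
    v.to_string().len() as u32
}

/// Largest unscaled value of a decimal with the given precision: 10^p - 1.
fn max_unscaled(precision: u32) -> u128 {
    10u128.pow(precision) - 1
}

/// True when every unscaled value of that precision fits in an i64.
fn fits_in_i64(precision: u32) -> bool {
    max_unscaled(precision) <= i64::MAX as u128
}

fn main() {
    // i64::MAX has 19 digits, so all 18-digit values fit but not all 19-digit ones.
    println!("digits in i64::MAX = {}", decimal_digits(i64::MAX as u128));
    // u64::MAX has 20 digits, hence a uint64 column needs precision 20.
    println!("digits in u64::MAX = {}", decimal_digits(u64::MAX as u128));
    for p in [18, 19, 20] {
        println!("precision {}: fits in i64 = {}", p, fits_in_i64(p));
    }
}
```

Under this reading, a column that originated as `uint64` would land above the precision-18 boundary and take the fallback path.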

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1932557587 ## src/ast/value.rs: ## @@ -97,6 +97,32 @@ pub enum Value { Placeholder(String), } +impl Into for Value { +fn into(self) -> String { +

Re: [PR] Customize window frame support for dialect [datafusion]

2025-01-28 Thread via GitHub
comphead merged PR #14288: URL: https://github.com/apache/datafusion/pull/14288

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1932661673 ## tests/sqlparser_bigquery.rs: ## @@ -2214,6 +2214,30 @@ fn test_select_as_value() { assert_eq!(Some(ValueTableMode::AsValue), select.value_table_m

Re: [PR] feat: remove DataFusion pyarrow feat [datafusion-python]

2025-01-28 Thread via GitHub
kylebarron commented on code in PR #1000: URL: https://github.com/apache/datafusion-python/pull/1000#discussion_r1932699499 ## src/config.rs: ## @@ -40,7 +42,7 @@ impl PyConfig { #[staticmethod] pub fn from_env() -> PyResult { Review Comment: Instead of always cal

Re: [PR] chore: Prepare for DataFusion 45 (bump to DataFusion rev 5592834 + Arrow 54.0.0) [datafusion-comet]

2025-01-28 Thread via GitHub
andygrove commented on PR #1332: URL: https://github.com/apache/datafusion-comet/pull/1332#issuecomment-2619823980 > I think the PR is good in general but what concerns me is really lots of code added just to do the migration. I'm wondering was there breaking changes in DF or Arrow, as loo

Re: [PR] Improve speed of `median` by implementing special `GroupsAccumulator` [datafusion]

2025-01-28 Thread via GitHub
korowa commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1932677883 ## datafusion/functions-aggregate/src/median.rs: ## @@ -230,6 +276,212 @@ impl Accumulator for MedianAccumulator { } } +/// The median groups accumulator a

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1932685294 ## src/ast/value.rs: ## @@ -97,6 +97,32 @@ pub enum Value { Placeholder(String), } +impl Into for Value { +fn into(self) -> String { +

Re: [PR] Make TypedString contain Value instead of String to support and preserve other quote styles [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
graup commented on code in PR #1679: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1679#discussion_r1932686299 ## tests/sqlparser_bigquery.rs: ## @@ -39,7 +39,7 @@ fn parse_literal_string() { r#"'''triple-single'unescaped''', "#, r#""double\"esca

[I] Prepared physical plan reusage [datafusion]

2025-01-28 Thread via GitHub
askalt opened a new issue, #14342: URL: https://github.com/apache/datafusion/issues/14342 ## Problem ## While implementing the saving of prepared statements in our storage based on the DataFusion, we encountered the following issue: It is inefficient to save the logical plan and r

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-01-28 Thread via GitHub
JanKaul commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2619873876 For your information, I was able to reproduce the IO stall error with a very [simple example](https://github.com/JanKaul/cpu-io-executor/blob/main/src/single_runtime.rs). In case an

Re: [I] Prepared physical plan reusage [datafusion]

2025-01-28 Thread via GitHub
askalt commented on issue #14342: URL: https://github.com/apache/datafusion/issues/14342#issuecomment-2619881519 About physical placeholders challenges. For example, postgresql uses the following heuristic to choose "use generic plan" or "rebuild plan with inlined parameters when they becom

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub
gatesn commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619891822 Any other blockers @alamb ? Thanks for hustling this through

Re: [PR] Add `ColumnStatistics::Sum` [datafusion]

2025-01-28 Thread via GitHub
ozankabak commented on PR #14074: URL: https://github.com/apache/datafusion/pull/14074#issuecomment-2619923329 LGTM

Re: [PR] BigQuery: Fix column identifier reserved keywords list [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio commented on code in PR #1678: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1678#discussion_r1931794118 ## src/dialect/mod.rs: ## @@ -821,11 +821,24 @@ pub trait Dialect: Debug + Any { false } +/// Returns reserved keywords when looking

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931798459 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -68,11 +70,57 @@ impl Signature { } } -/// Returns a [`Signature`] for applying `op` to

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931797862 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -68,11 +70,57 @@ impl Signature { } } -/// Returns a [`Signature`] for applying `op` to

Re: [PR] BigQuery: Add support for select expr star [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio merged PR #1680: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1680

Re: [PR] try to add manual trigger for extended tests in PRs [datafusion]

2025-01-28 Thread via GitHub
Omega359 commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2618612893 I took a look at the updates, I think they should work.

Re: [PR] Improve speed of `median` by implementing special `GroupsAccumulator` [datafusion]

2025-01-28 Thread via GitHub
alamb commented on code in PR #13681: URL: https://github.com/apache/datafusion/pull/13681#discussion_r1931936141 ## datafusion/functions-aggregate/src/median.rs: ## @@ -230,6 +276,201 @@ impl Accumulator for MedianAccumulator { } } +/// The median groups accumulator ac

Re: [PR] fix: pass scale to DF round in spark_round [datafusion-comet]

2025-01-28 Thread via GitHub
cht42 commented on code in PR #1341: URL: https://github.com/apache/datafusion-comet/pull/1341#discussion_r1931944408 ## native/spark-expr/src/math_funcs/round.rs: ## @@ -85,9 +85,10 @@ pub fn spark_round( let (precision, scale) = get_precision_scale(data_type);

Re: [I] Type Coercion fails for List with inner type struct which has large/view types [datafusion]

2025-01-28 Thread via GitHub
ion-elgreco commented on issue #14154: URL: https://github.com/apache/datafusion/issues/14154#issuecomment-2618645919 @alamb yes this indeed worked fine on DF43. I tested it against deltalake v0.23.3

Re: [PR] fix: pass scale to DF round in spark_round [datafusion-comet]

2025-01-28 Thread via GitHub
cht42 commented on code in PR #1341: URL: https://github.com/apache/datafusion-comet/pull/1341#discussion_r1931942453 ## native/spark-expr/src/math_funcs/round.rs: ## @@ -85,9 +85,10 @@ pub fn spark_round( let (precision, scale) = get_precision_scale(data_type);

Re: [PR] Add support for mysql table hints [datafusion-sqlparser-rs]

2025-01-28 Thread via GitHub
iffyio merged PR #1675: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1675

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931890521 ## datafusion/common/src/column.rs: ## @@ -18,10 +18,11 @@ //! Column use arrow_schema::{Field, FieldRef}; +use sqlparser::tokenizer::Span; Review Comme

Re: [PR] Feature: Monotonic Sets [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1932018364 ## datafusion/expr/src/udaf.rs: ## @@ -39,6 +39,26 @@ use crate::utils::AggregateOrderSensitivity; use crate::{Accumulator, Expr}; use crate::{Documentatio

Re: [PR] Feature: Monotonic Sets [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1932023820 ## datafusion/expr/src/udaf.rs: ## @@ -635,6 +655,14 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { fn documentation(&self) -> Option<&Documentati

Re: [PR] feat: remove DataFusion pyarrow feat [datafusion-python]

2025-01-28 Thread via GitHub
matko commented on code in PR #1000: URL: https://github.com/apache/datafusion-python/pull/1000#discussion_r1932009641 ## src/pyarrow_util.rs: ## Review Comment: it would be good to have the conversion functions in this file also available directly on a `ScalarValue`, whi

Re: [PR] Feature: Monotonic Sets [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada commented on code in PR #14271: URL: https://github.com/apache/datafusion/pull/14271#discussion_r1932037809 ## datafusion/expr/src/udaf.rs: ## @@ -635,6 +655,14 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { fn documentation(&self) -> Option<&Documentati

Re: [I] support: Date +/plus Int or date_add function [datafusion]

2025-01-28 Thread via GitHub
DanCodedThis commented on issue #6876: URL: https://github.com/apache/datafusion/issues/6876#issuecomment-2618807796 Hello, I have implemented `date_add` (also `date_diff`) akin to Snowflake spec. My company would like me to contribute to Datafusion (if it's needed). Do I need to open a dif

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-01-28 Thread via GitHub
alamb commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2620061423 I plan to review this PR later today or tomorrow as it is on my "45 blockers" list Thank you for your patience @xudong963

Re: [PR] try to add manual trigger for extended tests in PRs [datafusion]

2025-01-28 Thread via GitHub
buraksenn commented on PR #14331: URL: https://github.com/apache/datafusion/pull/14331#issuecomment-2620086827 > Thanks @buraksenn > > When testing such PRs I normally put them on the `main` branch in my own fork and then try it out there Thanks @alamb. As you've said I've star

Re: [PR] perf(array-agg): add fast path for array agg for `merge_batch` [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on code in PR #14299: URL: https://github.com/apache/datafusion/pull/14299#discussion_r1932880307 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -177,6 +177,67 @@ impl ArrayAggAccumulator { datatype: datatype.clone(), }) }

Re: [PR] Add regexp_extract func [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on PR #14282: URL: https://github.com/apache/datafusion/pull/14282#issuecomment-2620107324 FYI Spark regex is not the same as Rust regex and can have different results

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub
alamb commented on code in PR #14268: URL: https://github.com/apache/datafusion/pull/14268#discussion_r1932866009 ## datafusion/functions/src/string/bit_length.rs: ## @@ -55,7 +58,10 @@ impl Default for BitLengthFunc { impl BitLengthFunc { pub fn new() -> Self { S

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2620098993 I don't like the name strict as it can mean different things (like fail on parsing invalid strings), I think it should be an enum on null handling

Re: [PR] Add regexp_extract func [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1932904240 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add regexp_extract func [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1932901931 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1919,6 +1920,31 @@ SELECT regexp_replace('aBc', '(b|d)', 'Ab\\1a', 'i'); Additional examples can be f

Re: [PR] Fix build "missing field `sum_value` in initializer of `ColumnStatistics`" [datafusion]

2025-01-28 Thread via GitHub
Omega359 commented on PR #14345: URL: https://github.com/apache/datafusion/pull/14345#issuecomment-2620125142 I actually saw this in the extended tests ... I am watching those :)

Re: [PR] bug: Fix NULL handling in array_slice [datafusion]

2025-01-28 Thread via GitHub
jkosh44 commented on PR #14289: URL: https://github.com/apache/datafusion/pull/14289#issuecomment-2620139202 > I don't like the name strict as it can mean different things (like fail on parsing invalid strings), I think it should be an enum on null handling How about something like

Re: [PR] Add regexp_extract func [datafusion]

2025-01-28 Thread via GitHub
rluvaton commented on code in PR #14282: URL: https://github.com/apache/datafusion/pull/14282#discussion_r1932896635 ## datafusion/functions/src/regex/regexpextract.rs: ## @@ -0,0 +1,289 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1932058436 ## datafusion/common/src/diagnostic.rs: ## @@ -0,0 +1,112 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618837699 I wrote some tests for this here: - https://github.com/apache/datafusion/pull/14336

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618831985 I think @zhuqi-lucas found the issue. This line is almost certainly wrong: https://github.com/apache/datafusion/blob/e3db3592a846cb4d5ce175b624c1aecc70441981/datafusion/op

Re: [I] Equivalence class projection does not find new equivalent classes correctly [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14326: URL: https://github.com/apache/datafusion/issues/14326#issuecomment-2618849932 Thanks @askalt -- we'll check it out FYI @ozankabak and @berkaysynnada

[I] Implement `fetch` limit for MemoryExec [datafusion]

2025-01-28 Thread via GitHub
alamb opened a new issue, #14337: URL: https://github.com/apache/datafusion/issues/14337 ### Is your feature request related to a problem or challenge? We rely on [`MemoryExec`](https://github.com/alamb/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/physical-plan/

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618877800 I also filed this ticket to track making it easier to test for this kind of thing - https://github.com/apache/datafusion/issues/14337

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1932086369 ## datafusion/sqllogictest/test_files/create_external_table.slt: ## @@ -33,23 +33,23 @@ statement error DataFusion error: SQL error: ParserError\("Missing L

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1932071772 ## datafusion/common/src/diagnostic.rs: ## @@ -0,0 +1,112 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1932071276 ## datafusion/sqllogictest/test_files/unnest.slt: ## @@ -899,7 +899,7 @@ logical_plan 07)Unnest: lists[__unnest_placeholder(outer_ref(u.column1))|de

[I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
alamb opened a new issue, #14335: URL: https://github.com/apache/datafusion/issues/14335 ### Describe the bug `LIMIT`s are incorrectly pushed through `FULL OUTER` Joins ### To Reproduce ```sql COPY (values (1), (2), (3), (4), (5)) TO '/tmp/t1.csv' STORED AS CSV; --
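
To illustrate why pushing a `LIMIT` below both inputs of a `FULL OUTER JOIN` can change the answer, here is a self-contained toy model. It is a sketch under stated assumptions, not DataFusion's implementation: the join is a naive nested loop over integer keys, and the `t2` values are hypothetical because the reproduction in the snippet above is truncated.

```rust
/// One output row of the toy join: (left key, right key), None meaning NULL.
type Row = (Option<i32>, Option<i32>);

/// Naive FULL OUTER JOIN on equality of the single key column.
fn full_outer_join(left: &[i32], right: &[i32]) -> Vec<Row> {
    let mut matched_right = vec![false; right.len()];
    let mut out = Vec::new();
    for &l in left {
        let mut found = false;
        for (i, &r) in right.iter().enumerate() {
            if l == r {
                out.push((Some(l), Some(r)));
                matched_right[i] = true;
                found = true;
            }
        }
        if !found {
            // Unmatched left rows are padded with NULL on the right.
            out.push((Some(l), None));
        }
    }
    for (i, &r) in right.iter().enumerate() {
        if !matched_right[i] {
            // Unmatched right rows are padded with NULL on the left.
            out.push((None, Some(r)));
        }
    }
    out
}

/// LIMIT n: keep the first n rows.
fn limit<T: Clone>(rows: &[T], n: usize) -> Vec<T> {
    rows.iter().take(n).cloned().collect()
}

fn main() {
    let t1 = [1, 2, 3, 4, 5];
    let t2 = [5, 4, 3, 2, 1]; // hypothetical data, same keys in another order

    // Correct: join first, then limit.
    let correct = limit(&full_outer_join(&t1, &t2), 2);
    // Incorrect: limit each input first, then join, then limit.
    let pushed = limit(&full_outer_join(&limit(&t1, 2), &limit(&t2, 2)), 2);

    println!("join then limit:   {:?}", correct);
    println!("limit pushed down: {:?}", pushed);
}
```

In this toy run the pushed-down variant emits NULL-padded rows such as `(Some(1), None)` that do not appear anywhere in the full join result, because the matching right-hand rows were cut off before the join could see them.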

[PR] Add test case for incorrect FULL OUTER JOIN + LIMIT pushdown @alamb [datafusion]

2025-01-28 Thread via GitHub
alamb opened a new pull request, #14336: URL: https://github.com/apache/datafusion/pull/14336 ## Which issue does this PR close? - Related to https://github.com/apache/datafusion/issues/14335 ## Rationale for this change I wrote these test cases finding the bug, so I want

Re: [PR] fix: LimitPushdown rule uncorrect remove some GlobalLimitExec [datafusion]

2025-01-28 Thread via GitHub
alamb commented on code in PR #14245: URL: https://github.com/apache/datafusion/pull/14245#discussion_r1932060065 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -4247,8 +4247,10 @@ logical_plan physical_plan 01)CoalesceBatchesExec: target_batch_size=3, fetch=2 02)--Ha

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1932058075 ## datafusion/common/src/column.rs: ## @@ -254,6 +288,19 @@ impl Column { .collect(), }) } + +pub fn spans(&self) -> &Span

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618842149 I don't think this is a regression (it has been like this for a while)

Re: [PR] Optimization: support push down limit when full join [datafusion]

2025-01-28 Thread via GitHub
alamb commented on PR #12963: URL: https://github.com/apache/datafusion/pull/12963#issuecomment-2618846767 I think this optimization produces incorrect results. See - https://github.com/apache/datafusion/pull/12963

Re: [I] Utility .map function for DataFrame for modifying internal LogicalPlan [datafusion]

2025-01-28 Thread via GitHub
alamb commented on issue #14317: URL: https://github.com/apache/datafusion/issues/14317#issuecomment-2618856619 This seems like a good idea to me As I understand it, it would allow things like ```rust let df = ctx.sql("SELECT * from foo"); let df = df.map(my_awesome_rewri
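
The shape of the proposed utility can be sketched with toy stand-in types. Everything below is hypothetical: `LogicalPlan` and `DataFrame` here are simplified local types, not DataFusion's, and the `map` signature and the `add_limit` rewrite are assumptions made for illustration, since the issue only proposes the idea.

```rust
/// Toy stand-in for a logical plan; NOT DataFusion's real `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan { table: String },
    Limit { fetch: usize, input: Box<LogicalPlan> },
}

/// Toy stand-in for a DataFrame that just wraps a plan.
#[derive(Debug, Clone, PartialEq)]
struct DataFrame {
    plan: LogicalPlan,
}

impl DataFrame {
    /// Hypothetical `map`: hand the inner plan to a rewrite function and
    /// wrap whatever plan comes back in a new DataFrame.
    fn map<F>(self, f: F) -> Result<DataFrame, String>
    where
        F: FnOnce(LogicalPlan) -> Result<LogicalPlan, String>,
    {
        Ok(DataFrame { plan: f(self.plan)? })
    }
}

/// Example rewrite (hypothetical): wrap the plan in a LIMIT 10.
fn add_limit(plan: LogicalPlan) -> Result<LogicalPlan, String> {
    Ok(LogicalPlan::Limit { fetch: 10, input: Box::new(plan) })
}

fn main() {
    let df = DataFrame {
        plan: LogicalPlan::Scan { table: "foo".to_string() },
    };
    // The caller never unpacks and repacks the DataFrame by hand.
    let df = df.map(add_limit).expect("rewrite failed");
    println!("{:?}", df.plan);
}
```

The point of the design is that the rewrite stays a plain function over plans, so it can be chained with other DataFrame calls without leaving the fluent style.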

Re: [PR] Feat/parameterized sql queries [datafusion-python]

2025-01-28 Thread via GitHub
matko commented on code in PR #964: URL: https://github.com/apache/datafusion-python/pull/964#discussion_r1932064792 ## python/datafusion/context.py: ## @@ -534,12 +543,20 @@ def sql(self, query: str, options: SQLOptions | None = None) -> DataFrame: Args:

Re: [I] Add `DataFrame::map` utility .map function for DataFrame for modifying internal LogicalPlan [datafusion]

2025-01-28 Thread via GitHub
timsaucer commented on issue #14317: URL: https://github.com/apache/datafusion/issues/14317#issuecomment-2618893233 We have something similar in `datafusion-python` [here](https://github.com/apache/datafusion-python/blob/main/python/datafusion/dataframe.py#L835). It lets you do something li

Re: [PR] equivalence classes: use normalized mapping for projection [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada commented on PR #14327: URL: https://github.com/apache/datafusion/pull/14327#issuecomment-2618903320 Thank you @askalt. I'll review this in detail. One initial suggestion: instead of adding a unit test and introducing additional testing components, I believe the same log

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
zhuqi-lucas commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618913801 take

Re: [I] `FULL OUTER JOIN` and `LIMIT` produces wrong results [datafusion]

2025-01-28 Thread via GitHub
zhuqi-lucas commented on issue #14335: URL: https://github.com/apache/datafusion/issues/14335#issuecomment-2618916414 I want to take this task. As a first step, I think we should disable limit pushdown for full outer joins. For further improvement, we can investigate if we have some

Re: [PR] minor: add unit tests for monotonicity.rs [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada commented on code in PR #14307: URL: https://github.com/apache/datafusion/pull/14307#discussion_r1931714859 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -558,3 +558,405 @@ pub fn get_tanh_doc() -> &'static Documentation { .build() }) } + +

Re: [I] Expand Test Coverage for ScalarUDF's [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada closed issue #10595: Expand Test Coverage for ScalarUDF's URL: https://github.com/apache/datafusion/issues/10595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931818796 ## datafusion/expr/src/expr.rs: ## @@ -1663,6 +1667,13 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn spans(&s

[PR] chore(deps): update rand_distr requirement from 0.4.3 to 0.5.0 [datafusion]

2025-01-28 Thread via GitHub
dependabot[bot] opened a new pull request, #14334: URL: https://github.com/apache/datafusion/pull/14334 Updates the requirements on [rand_distr](https://github.com/rust-random/rand) to permit the latest version. Changelog Sourced from https://github.com/rust-random/rand/blob/master

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931814763 ## datafusion/expr/src/expr_rewriter/mod.rs: ## @@ -181,10 +178,14 @@ pub fn create_col_from_scalar_expr( Some::(subqry_alias.into()),

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931822190 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -68,11 +70,57 @@ impl Signature { } } -/// Returns a [`Signature`] for applying `op` to

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931825255 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -68,11 +70,57 @@ impl Signature { } } -/// Returns a [`Signature`] for applying `op` to

Re: [PR] moving memory.rs out of datafusion/core [datafusion]

2025-01-28 Thread via GitHub
logan-keede commented on code in PR #14332: URL: https://github.com/apache/datafusion/pull/14332#discussion_r1931760587 ## datafusion/catalog/src/lib.rs: ## @@ -15,6 +15,16 @@ // specific language governing permissions and limitations // under the License. +//! Interfaces an

Re: [I] User Defined Coercion Rules [datafusion]

2025-01-28 Thread via GitHub
jayzhan211 commented on issue #14296: URL: https://github.com/apache/datafusion/issues/14296#issuecomment-2618344472 > > We have type coercion in logical plan now, consider the case where we want to separate logical types and physical types, should we add another type coercion layer in phys

Re: [PR] moving memory.rs out of datafusion/core [datafusion]

2025-01-28 Thread via GitHub
logan-keede commented on PR #14332: URL: https://github.com/apache/datafusion/pull/14332#issuecomment-2618361269 @alamb can this please get a review? Thanks, Logan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931793970 ## datafusion/common/src/diagnostic.rs: ## @@ -0,0 +1,112 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931802942 ## datafusion/sql/src/set_expr.rs: ## @@ -36,8 +40,17 @@ impl SqlToRel<'_, S> { right, set_quantifier, } => {

Re: [PR] minor: add unit tests for monotonicity.rs [datafusion]

2025-01-28 Thread via GitHub
berkaysynnada merged PR #14307: URL: https://github.com/apache/datafusion/pull/14307 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[PR] chore(deps): update rand requirement from 0.8 to 0.9 [datafusion]

2025-01-28 Thread via GitHub
dependabot[bot] opened a new pull request, #14333: URL: https://github.com/apache/datafusion/pull/14333 Updates the requirements on [rand](https://github.com/rust-random/rand) to permit the latest version. Changelog Sourced from https://github.com/rust-random/rand/blob/master/CHANG

Re: [I] Ensure `to_timestamp` behaves consistently with PostgreSQL [datafusion]

2025-01-28 Thread via GitHub
Omega359 commented on issue #13351: URL: https://github.com/apache/datafusion/issues/13351#issuecomment-2618686024 I think throwing an error on invalid format may be useful. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Add related source code locations to errors [datafusion]

2025-01-28 Thread via GitHub
eliaperantoni commented on code in PR #13664: URL: https://github.com/apache/datafusion/pull/13664#discussion_r1931896684 ## datafusion/expr-common/Cargo.toml: ## @@ -41,3 +41,4 @@ arrow = { workspace = true } datafusion-common = { workspace = true } itertools = { workspace =
