Re: [PR] add catalog as part of the table path in plan_to_sql [datafusion]

2024-05-21 Thread via GitHub
phillipleblanc commented on code in PR #10612: URL: https://github.com/apache/datafusion/pull/10612#discussion_r1609299261 ## datafusion/sql/src/unparser/plan.rs: ## @@ -502,3 +505,35 @@ impl From for DataFusionError { DataFusionError::External(Box::new(e)) } } +

[PR] add catalog as part of the table path in plan_to_sql [datafusion]

2024-05-21 Thread via GitHub
y-f-u opened a new pull request, #10612: URL: https://github.com/apache/datafusion/pull/10612 ## Which issue does this PR close? TableReference has the catalog information but it's not used in `plan_to_sql` ## Rationale for this change It's common for DBMS to have pattern

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1609280746 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -15,19 +15,33 @@ // specific language governing permissions and limitations // under the License. -/// Diale

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
phillipleblanc commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1609277646 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -15,19 +15,33 @@ // specific language governing permissions and limitations // under the License. -///

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2123883816 > Thank you @goldmedal -- I think this looks really nice > > Thank you for the reviews @comphead > > I left some suggestions for improvement but I think they could be d

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
goldmedal commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1609270235 ## datafusion/sql/src/unparser/dialect.rs: ## @@ -15,19 +15,30 @@ // specific language governing permissions and limitations // under the License. +use regex

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub
appletreeisyellow commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123829586 > I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔 @alamb I'm happy to coordinate with our performance team and run an in

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1609228014 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,26 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#issuecomment-2123824670 @kazuyukitanimura @advancedxy ... re-requesting reviews from you two, please. I've updated the code to support dictionaries and removed some of the finer int types. Barring somethi

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
advancedxy commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1609204351 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_co

Re: [PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-21 Thread via GitHub
jayzhan211 merged PR #10574: URL: https://github.com/apache/datafusion/pull/10574 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-21 Thread via GitHub
jayzhan211 commented on PR #10574: URL: https://github.com/apache/datafusion/pull/10574#issuecomment-2123785496 Thanks @alamb

[PR] Minor: Move median test [datafusion]

2024-05-21 Thread via GitHub
jayzhan211 opened a new pull request, #10611: URL: https://github.com/apache/datafusion/pull/10611 ## Which issue does this PR close? Part of #10384 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1609182533 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -505,7 +507,35 @@ pub async fn from_substrait_rel( _ => Ok(t), }

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1609180143 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-21 Thread via GitHub
advancedxy commented on issue #389: URL: https://github.com/apache/datafusion-comet/issues/389#issuecomment-2123752077 There's a quota limit per repo for GitHub runners (might not be a problem for the Apache project, but it is for forks) and we thought it would be sufficient to cover Java8(old

Re: [PR] test: show stats in explain of two representative queries [datafusion]

2024-05-21 Thread via GitHub
github-actions[bot] commented on PR #8173: URL: https://github.com/apache/datafusion/pull/8173#issuecomment-2123722891 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Draft: Optimize to_timestamp [datafusion]

2024-05-21 Thread via GitHub
github-actions[bot] commented on PR #9694: URL: https://github.com/apache/datafusion/pull/9694#issuecomment-2123722853 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

[PR] adding benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-21 Thread via GitHub
Lordworms opened a new pull request, #10610: URL: https://github.com/apache/datafusion/pull/10610 ## Which issue does this PR close? cargo bench --bench parquet_statistic Closes #10606 ## Rationale for this change ## What changes are included in this PR

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-21 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2123697634 You can check it in the unit test: [`test_fn_lead`](https://github.com/shanretoo/datafusion/blob/65a8895d948152845dda1934b404399919ebe23c/datafusion/core/tests/dataframe/datafram

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2123694427 I'm hoping to get to an API similar to `ListingTable`. `ListingTable` => `ListingTableConfig` => `FileFormat` Where `ListingTable` and `ListingTableConfig` are provi

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2123673946 Thanks @jayzhan211 -- I will plan to review this tomorrow.

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao commented on PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#issuecomment-2123660352 cc @andygrove @viirya

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub
jayzhan211 commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123600422 I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual

Re: [I] Implement a benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-21 Thread via GitHub
Lordworms commented on issue #10606: URL: https://github.com/apache/datafusion/issues/10606#issuecomment-2123596547 Take this one

Re: [I] PR build for Linux Java 11 with Spark 3.4 is not running [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on issue #389: URL: https://github.com/apache/datafusion-comet/issues/389#issuecomment-2123578608 Thanks @advancedxy. Do you know the original reason why this specific combination was excluded, by any chance?

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-21 Thread via GitHub
erratic-pattern commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2123560931 @alamb I will try to update this today or tomorrow. I've been putting this off a bit.

Re: [PR] refactor: reduce allocations in push down filter [datafusion]

2024-05-21 Thread via GitHub
erratic-pattern commented on PR #10567: URL: https://github.com/apache/datafusion/pull/10567#issuecomment-2123559213 Yes the `Arc::clone` is not a performance improvement, but was just a way for me to keep track of which clones were "zero copy" while working on this. It is a recommended con
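The convention mentioned in the comment above can be illustrated with a minimal sketch. This is not code from PR #10567; `PlanNode`, `sample`, `share`, and `deep_copy` are hypothetical names introduced only for illustration. The point is that `Arc::clone(&x)` makes it visible at the call site that only a reference count is bumped, whereas cloning the inner value allocates a new copy.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a plan node that is expensive to deep-copy.
#[derive(Debug, Clone, PartialEq)]
struct PlanNode {
    name: String,
    columns: Vec<String>,
}

/// Builds a small shared node for the example.
fn sample() -> Arc<PlanNode> {
    Arc::new(PlanNode {
        name: "Filter".to_string(),
        columns: vec!["a".to_string(), "b".to_string()],
    })
}

/// "Zero copy": increments the reference count and returns a second
/// handle to the same heap allocation.
fn share(node: &Arc<PlanNode>) -> Arc<PlanNode> {
    Arc::clone(node)
}

/// Deep copy: clones the inner value, allocating a new String and Vec.
fn deep_copy(node: &Arc<PlanNode>) -> PlanNode {
    node.as_ref().clone()
}
```

Writing `Arc::clone(node)` instead of `node.clone()` does not change what the program does; it only helps a reader tell cheap handle clones apart from deep copies, which matches the comment's remark that it is not a performance improvement in itself.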

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-21 Thread via GitHub
ozankabak commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2123531925 I agree with @alamb. Maybe we can do a quick survey on how different systems take per-column schema metadata and try to elucidate the best DF syntax from that

Re: [I] Excessive memory consumption on sorting [datafusion]

2024-05-21 Thread via GitHub
samuelcolvin commented on issue #10511: URL: https://github.com/apache/datafusion/issues/10511#issuecomment-2123523912 Sorry for the delay, here we go: ### `logical_plan` ``` Projection: records_store.span_name Limit: skip=0, fetch=20 Sort: bit_length(records_sto

Re: [PR] Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10597: URL: https://github.com/apache/datafusion/pull/10597

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#discussion_r1608978203 ## spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala: ## @@ -454,12 +458,13 @@ class CometShuffleWritePro

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2123477465 @alamb I would be happy to add an example - for this PR it would likely just be copying from https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fifo.rs. I wil

Re: [PR] Test for reading read statistics from parquet files without statistics and boolean & struct data type [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on code in PR #10608: URL: https://github.com/apache/datafusion/pull/10608#discussion_r1608970404 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -73,6 +73,22 @@ pub fn parquet_file_one_column( no_null_values_start: i64, no_null_values_en

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2123465859 @alamb Another bug: https://github.com/apache/datafusion/issues/10609

[I] Incorrect statistics read for struct array in parquet [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new issue, #10609: URL: https://github.com/apache/datafusion/issues/10609 ### Describe the bug I found this while adding tests in https://github.com/apache/datafusion/pull/10608. The statistics of a struct array return nothing ### To Reproduce See `test_st

Re: [PR] feat: Add eliminate group by constant optimizer rule [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10591: URL: https://github.com/apache/datafusion/pull/10591#discussion_r1608953945 ## datafusion/optimizer/src/eliminate_group_by_constant.rs: ## @@ -0,0 +1,318 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] Test for reading read statistics from parquet files without statistics [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10608: URL: https://github.com/apache/datafusion/pull/10608#issuecomment-2123435559 This PR also appears to have a conflict now

Re: [PR] Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10597: URL: https://github.com/apache/datafusion/pull/10597#issuecomment-2123433357 I took the liberty of merging this branch up from main to resolve a conflict.

Re: [PR] Improve `UserDefinedLogicalNodeCore::from_template` API to return Result [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10597: URL: https://github.com/apache/datafusion/pull/10597#discussion_r1608941760 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -248,23 +248,27 @@ pub trait UserDefinedLogicalNodeCore: /// For example: `TopK: k=10` fn fmt_for_e

Re: [PR] feat: extend `unnest` to support Struct datatype [datafusion]

2024-05-21 Thread via GitHub
duongcongtoai commented on code in PR #10429: URL: https://github.com/apache/datafusion/pull/10429#discussion_r1608935540 ## datafusion/expr/src/expr_schema.rs: ## @@ -123,7 +123,8 @@ impl ExprSchemable for Expr { Ok(field.data_type().clone())

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1608928216 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -52,7 +52,7 @@ fn simple_expr_to_sql_demo() -> Result<()> { let expr = col("a").lt(lit(5)).or(col("a").eq(

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2123403077 Is this PR ready for the next round of review @erratic-pattern? Or do you plan to make further changes to it?

Re: [PR] feat: extend unnest to support Struct datatype [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10429: URL: https://github.com/apache/datafusion/pull/10429#discussion_r1608917787 ## datafusion/sqllogictest/test_files/unnest.slt: ## @@ -288,6 +308,18 @@ select unnest(array_remove(column1, 12)) from unnest_table; 5 6 +## unnest struct-typed

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123378506 Basically I worry this is just changing behavior rather than fixing a bug and will result in churn for no benefit downstream. I may be misunderstanding the change and rationale howeve

Re: [PR] Fix `Coalesce` casting logic to follows what Postgres and DuckDB do. Introduce signature that do non-comparison coercion [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10268: URL: https://github.com/apache/datafusion/pull/10268#issuecomment-2123377440 In general I am concerned about the potential downstream effects of this change. I don't fully understand them. What I would ideally like to do is to run the influxdb_iox regress

Re: [PR] feat: support `grouping` aggregate function [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10208: URL: https://github.com/apache/datafusion/pull/10208#discussion_r1608905520 ## datafusion/physical-expr/src/aggregate/grouping.rs: ## @@ -96,8 +113,172 @@ impl PartialEq for Grouping { self.name == x.name

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-21 Thread via GitHub
appletreeisyellow commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2123367235 I'd like to work on this issue 🙋‍♀️

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1608882278 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1212,6 +1212,157 @@ class CometAggregateSuite extends CometTestBase with Adapt

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1608882001 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -547,6 +547,26 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

Re: [PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10574: URL: https://github.com/apache/datafusion/pull/10574#discussion_r1608876810 ## datafusion/physical-expr-common/src/aggregate/groups_accumulator/mod.rs: ## @@ -0,0 +1,20 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [I] docs.rs build fails for datafusion-proto `37.0.0` [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10163: URL: https://github.com/apache/datafusion/issues/10163#issuecomment-2123330179 FWIW it seems to be working on 38.0.0: https://docs.rs/datafusion-proto/latest/datafusion_proto/ ![Screenshot 2024-05-21 at 3 47 44  PM](https://github.com/apache/datafusion

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608872166 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,26 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [PR] Test for reading read statistics from parquet files without statistics [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10608: URL: https://github.com/apache/datafusion/pull/10608#discussion_r1608873185 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -73,6 +73,22 @@ pub fn parquet_file_one_column( no_null_values_start: i64, no_null_values_end:

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608871374 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSpa

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on code in PR #10592: URL: https://github.com/apache/datafusion/pull/10592#discussion_r1608870254 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -624,20 +624,281 @@ async fn test_dates_64_diff_rg_sizes() { .run("date64"); } +// BUG: +// ht

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10600: URL: https://github.com/apache/datafusion/pull/10600#discussion_r1608866960 ## datafusion/core/src/datasource/stream.rs: ## @@ -103,19 +105,29 @@ impl FromStr for StreamEncoding { } } -/// The configuration for a [`StreamTable`] +pub

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#discussion_r1608866121 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -547,6 +547,26 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSe

Re: [PR] Implement Unparser for `UNION ALL` [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10603: URL: https://github.com/apache/datafusion/pull/10603

Re: [PR] Implement Unparser for `UNION ALL` [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10603: URL: https://github.com/apache/datafusion/pull/10603#discussion_r1608860177 ## datafusion/sql/src/unparser/plan.rs: ## @@ -347,8 +373,33 @@ impl Unparser<'_> { Ok(()) } -LogicalPlan::Union(_union)

Re: [PR] Rename monotonicity as output_ordering in ScalarUDF's [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10596: URL: https://github.com/apache/datafusion/pull/10596

Re: [PR] Add to_date function to scalar functions doc [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10601: URL: https://github.com/apache/datafusion/pull/10601

Re: [I] Add to_date function to scalar functions doc [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #10461: Add to_date function to scalar functions doc URL: https://github.com/apache/datafusion/issues/10461

Re: [I] error: this arithmetic operation will overflow (on i386) [datafusion]

2024-05-21 Thread via GitHub
alamb closed issue #10552: error: this arithmetic operation will overflow (on i386) URL: https://github.com/apache/datafusion/issues/10552

Re: [PR] Fix compilation of datafusion-cli on 32bit targets [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10594: URL: https://github.com/apache/datafusion/pull/10594

Re: [PR] Fix compilation of datafusion-cli on 32bit targets [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10594: URL: https://github.com/apache/datafusion/pull/10594#issuecomment-2123297797 Thank you @nathaniel-daniel and @jonahgao

Re: [PR] refactor: reduce allocations in push down filter [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10567: URL: https://github.com/apache/datafusion/pull/10567#issuecomment-2123296914 Thanks again @erratic-pattern

Re: [PR] refactor: reduce allocations in push down filter [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10567: URL: https://github.com/apache/datafusion/pull/10567

[PR] read statistics from parquet without statistics [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN opened a new pull request, #10608: URL: https://github.com/apache/datafusion/pull/10608 ## Which issue does this PR close? More tests for https://github.com/apache/datafusion/issues/10453 ## Rationale for this change ## What changes are included i

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
alamb merged PR #10592: URL: https://github.com/apache/datafusion/pull/10592

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2123281857 Since this PR is just tests, merging it in

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10592: URL: https://github.com/apache/datafusion/pull/10592#discussion_r1608840230 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -624,20 +624,281 @@ async fn test_dates_64_diff_rg_sizes() { .run("date64"); } +// BUG: +// https

Re: [PR] fix: Specify schema when converting TPC-H csv to parquet [datafusion-benchmarks]

2024-05-21 Thread via GitHub
andygrove merged PR #3: URL: https://github.com/apache/datafusion-benchmarks/pull/3

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10593: URL: https://github.com/apache/datafusion/pull/10593#discussion_r1608838018 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -593,10 +593,16 @@ async fn test_dates_32_diff_rg_sizes() { Test { reader, -// mi

Re: [PR] Fixes bug expect `Date32Array` but returns Int32Array [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10593: URL: https://github.com/apache/datafusion/pull/10593#issuecomment-2123268926 > Not a PMC or a full-time committer, my review is not very valuable, but looks good to me. Your review is valuable in my opinion -- thank you @edmondop

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1608821292 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -556,32 +557,36 @@ impl FileOpener for ParquetOpener { }; }; -

Re: [PR] feat: Add HashJoin support for BuildRight [datafusion-comet]

2024-05-21 Thread via GitHub
viirya commented on code in PR #437: URL: https://github.com/apache/datafusion-comet/pull/437#discussion_r1608831021 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2417,11 +2417,12 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wit

[I] Comet doesn't support Spark BroadcastHashJoinExec if it is null-aware anti-join [datafusion-comet]

2024-05-21 Thread via GitHub
viirya opened a new issue, #457: URL: https://github.com/apache/datafusion-comet/issues/457 ### What is the problem the feature request solves? DataFusion HashJoin LeftAnti doesn't support null-aware anti join. See https://github.com/apache/datafusion/issues/10583 ### Des

[PR] fix: Specify schema when converting TPC-H csv to parquet [datafusion-benchmarks]

2024-05-21 Thread via GitHub
andygrove opened a new pull request, #3: URL: https://github.com/apache/datafusion-benchmarks/pull/3 (no comment)

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
codecov-commenter commented on PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#issuecomment-2123243529 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/456?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[PR] Refactor parquet row group pruning into a struct [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new pull request, #10607: URL: https://github.com/apache/datafusion/pull/10607 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-21 Thread via GitHub
alamb commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1608748477 ## datafusion/functions/src/math/monotonicity.rs: ## @@ -0,0 +1,241 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2123183351 > I'd be happy to contribute some docs / examples if you point me at something similar. Thanks @aditanase 🙏 I would recommend two things: # Suggestion 1: Change

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-21 Thread via GitHub
alamb commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2123173647 > @alamb I have created 2 more bug tickets but I cannot edit the description to add them in the subtasks. Can you help with that? Done

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
alamb commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2123171863 > What I still cannot understand is this a regression test for the bug we missed earlier? My strong suspicion is that the bugs @NGA-TRAN is finding would manifest themselves as

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-21 Thread via GitHub
huaxingao closed pull request #456: feat: correlation support URL: https://github.com/apache/datafusion-comet/pull/456

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608714060 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,26 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608699758 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [I] Ensure examples stay updated in CI. [datafusion-python]

2024-05-21 Thread via GitHub
andygrove closed issue #696: Ensure examples stay updated in CI. URL: https://github.com/apache/datafusion-python/issues/696

Re: [I] Regression in `substr` performance from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-21 Thread via GitHub
andygrove closed issue #712: Regression in `substr` performance from 37.1.0 to 38.0.0 URL: https://github.com/apache/datafusion-python/issues/712

Re: [PR] tsaucer/run TPC-H examples in CI [datafusion-python]

2024-05-21 Thread via GitHub
andygrove merged PR #711: URL: https://github.com/apache/datafusion-python/pull/711

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-21 Thread via GitHub
timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2123031753 Oh, great. Have you been able to run the [example code above](https://github.com/apache/datafusion/issues/6747#issuecomment-2090260284) using the new easy interface?

Re: [PR] test: add more tests for statistics reading [datafusion]

2024-05-21 Thread via GitHub
NGA-TRAN commented on PR #10592: URL: https://github.com/apache/datafusion/pull/10592#issuecomment-2123026061 @comphead > What I still cannot understand is this a regression test for the bug we missed earlier? I am working on new arrow statistics API https://github.com/apache

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-21 Thread via GitHub
advancedxy commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1608635628 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,191 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or m

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-21 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1608634043 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

[I] Implement a benchmark for extracting arrow statistics from parquet [datafusion]

2024-05-21 Thread via GitHub
alamb opened a new issue, #10606: URL: https://github.com/apache/datafusion/issues/10606 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10453 Part of https://github.com/apache/datafusion/issues/10453 is to "ef

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608623644 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-21 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1608622303 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -505,7 +507,35 @@ pub async fn from_substrait_rel( _ => Ok(t), }
