Re: [PR] Minor: Add getter for logical optimizer rules [datafusion]

2024-09-07 Thread via GitHub
Weijun-H commented on PR #12379: URL: https://github.com/apache/datafusion/pull/12379#issuecomment-2336534360 Thanks @maronavenue for contribution and @Dandandan for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] Minor: Add getter for logical optimizer rules [datafusion]

2024-09-07 Thread via GitHub
Weijun-H merged PR #12379: URL: https://github.com/apache/datafusion/pull/12379 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Minor: Add getter for logical optimizer rules [datafusion]

2024-09-07 Thread via GitHub
maronavenue commented on code in PR #12379: URL: https://github.com/apache/datafusion/pull/12379#discussion_r1749050920 ## datafusion/core/src/execution/session_state.rs: ## @@ -1922,4 +1929,32 @@ mod tests { assert!(new_state.catalog_list().catalog(&default_catalog).is

Re: [PR] Array agg groups accumulator [datafusion]

2024-09-07 Thread via GitHub
github-actions[bot] closed pull request #10149: Array agg groups accumulator URL: https://github.com/apache/datafusion/pull/10149 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] Change Expr `PartialOrd` to not rely on comparing hash values [datafusion]

2024-09-07 Thread via GitHub
ngli-me commented on issue #8932: URL: https://github.com/apache/datafusion/issues/8932#issuecomment-2336510088 Hi, I'm new to this project and was poking through the issues, can I try working this one out? It looks like, to remove the hash we would need to first compare by enum var

Re: [PR] Fix cast timestamp test #468 [datafusion-comet]

2024-09-07 Thread via GitHub
codecov-commenter commented on PR #923: URL: https://github.com/apache/datafusion-comet/pull/923#issuecomment-2336473324 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/923?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] Can't convert date_bin aggregated with count(*) to arrow if some windows contain null data [datafusion-python]

2024-09-07 Thread via GitHub
rickspencer3 commented on issue #862: URL: https://github.com/apache/datafusion-python/issues/862#issuecomment-2336472790 > Oh, I just realized that might not work since `functions.when` was recently exposed. You would need to do something like `F.case(col("count") != lit(0)).when(lit(True

Re: [PR] Fix cast timestamp test #468 [datafusion-comet]

2024-09-07 Thread via GitHub
comphead commented on PR #923: URL: https://github.com/apache/datafusion-comet/pull/923#issuecomment-2336466004 Thanks @himadripal for your contribution, triggering the workflow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [I] Can't convert date_bin aggregated with count(*) to arrow if some windows contain null data [datafusion-python]

2024-09-07 Thread via GitHub
timsaucer commented on issue #862: URL: https://github.com/apache/datafusion-python/issues/862#issuecomment-2336458296 Oh, I just realized that might not work since `functions.when` was recently exposed. You would need to do something like `F.case(col("count") != lit(0)).when(lit(True), co

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jcsherin commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1748972478 ## datafusion/expr/src/expr_schema.rs: ## @@ -352,7 +358,18 @@ impl ExprSchemable for Expr { } } WindowFunc

Re: [PR] Fix cast timestamp test #468 [datafusion-comet]

2024-09-07 Thread via GitHub
himadripal commented on PR #923: URL: https://github.com/apache/datafusion-comet/pull/923#issuecomment-2336454885 @viirya @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Can't convert date_bin aggregated with count(*) to arrow if some windows contain null data [datafusion-python]

2024-09-07 Thread via GitHub
timsaucer commented on issue #862: URL: https://github.com/apache/datafusion-python/issues/862#issuecomment-2336452591 It's super hacky: ``` df = df.select("time_window", F.when(col("count") != lit(0), col("count")).end().alias("count")) ``` -- This is an automated message f

Re: [I] Can't convert date_bin aggregated with count(*) to arrow if some windows contain null data [datafusion-python]

2024-09-07 Thread via GitHub
timsaucer commented on issue #862: URL: https://github.com/apache/datafusion-python/issues/862#issuecomment-2336437952 Also, when I coerced the `count` field to be nullable it was able to work, so I've got a strong suspicion about that mixture problem. -- This is an automated message fro

Re: [I] Real-time streaming support [datafusion]

2024-09-07 Thread via GitHub
maronavenue commented on issue #10895: URL: https://github.com/apache/datafusion/issues/10895#issuecomment-2336432075 @alamb - Took my first stab at https://github.com/apache/datafusion/pull/12379. I have started looking at the docs setup. Hoping to contribute meaningfully on that front soo

Re: [I] Can't convert date_bin aggregated with count(*) to arrow if some windows contain null data [datafusion-python]

2024-09-07 Thread via GitHub
rickspencer3 commented on issue #862: URL: https://github.com/apache/datafusion-python/issues/862#issuecomment-2336431893 When looking at the dataframe under the debugger, it looks like it skipped the null values ok: ``` df DataFrame() +-+---+ | ti

[PR] Minor: Add getter for logical optimizer rules [datafusion]

2024-09-07 Thread via GitHub
maronavenue opened a new pull request, #12379: URL: https://github.com/apache/datafusion/pull/12379 ## Which issue does this PR close? N/A ## Rationale for this change My team and I have been exploring Datafusion's potential to federate our custom sources

Re: [PR] docs: Add documentation for architecture of Comet Parquet support [datafusion-comet]

2024-09-07 Thread via GitHub
comphead commented on code in PR #922: URL: https://github.com/apache/datafusion-comet/pull/922#discussion_r1748909897 ## docs/source/contributor-guide/plugin_overview.md: ## @@ -57,3 +57,50 @@ and this serialized plan is passed into the native code by `CometExecIterator`. In

[PR] docs: Add documentation for architecture of Comet Parquet support [datafusion-comet]

2024-09-07 Thread via GitHub
andygrove opened a new pull request, #922: URL: https://github.com/apache/datafusion-comet/pull/922 ## Which issue does this PR close? N/A ## Rationale for this change Add documentation to make it easier for new contributors to understand the architecture

[I] Consider using the FileFormat::get_ext when creating ListingOptions [datafusion]

2024-09-07 Thread via GitHub
waruto210 opened a new issue, #12378: URL: https://github.com/apache/datafusion/issues/12378 ### Is your feature request related to a problem or challenge? When I use the following code to attempt creating external tables using SQL and Rust API respectively, a "Corrupt footer" error o

[I] Implement grouping aggregate function [datafusion]

2024-09-07 Thread via GitHub
timsaucer opened a new issue, #12377: URL: https://github.com/apache/datafusion/issues/12377 ### Is your feature request related to a problem or challenge? This function has a placeholder udaf but is not yet implemented. ### Describe the solution you'd like Implement per

Re: [PR] Add documentation about performance PRs, add (TBD) section on feature criteria [datafusion]

2024-09-07 Thread via GitHub
comphead commented on code in PR #12372: URL: https://github.com/apache/datafusion/pull/12372#discussion_r1748609618 ## docs/source/contributor-guide/index.md: ## @@ -88,35 +106,61 @@ committer who approved your PR to help remind them to merge it. ## Creating Pull Requests

Re: [PR] Introduce the `DynamicFileCatalog` in `datafusion-catalog` [datafusion]

2024-09-07 Thread via GitHub
goldmedal commented on PR #11035: URL: https://github.com/apache/datafusion/pull/11035#issuecomment-2335668800 > I plan to leave this PR open until Monday so anyone else who is interested can take a look at it prior to merge. šŸ‘ > I left some suggestions on how to potentially im

Re: [PR] Enable datafusion.optimizer.filter_null_join_keys by default [datafusion]

2024-09-07 Thread via GitHub
Dandandan commented on code in PR #12369: URL: https://github.com/apache/datafusion/pull/12369#discussion_r1748404550 ## datafusion/optimizer/tests/optimizer_integration.rs: ## @@ -281,11 +278,9 @@ fn test_same_name_but_not_ambiguous() { let expected = "LeftSemi Join: t1.co

Re: [PR] Introduce the `DynamicFileCatalog` in `datafusion-catalog` [datafusion]

2024-09-07 Thread via GitHub
goldmedal commented on code in PR #11035: URL: https://github.com/apache/datafusion/pull/11035#discussion_r1748371500 ## datafusion/catalog/src/dynamic_file/catalog.rs: ## @@ -0,0 +1,184 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[I] first_value and last_value should have identical signatures [datafusion]

2024-09-07 Thread via GitHub
timsaucer opened a new issue, #12376: URL: https://github.com/apache/datafusion/issues/12376 ### Describe the bug Less of a bug per se, but it would be nice to have identical function signatures between first_value and last_value ### To Reproduce _No response_ ###

[I] Enable datafusion.optimizer.filter_null_join_keys by default [datafusion]

2024-09-07 Thread via GitHub
Dandandan opened a new issue, #12375: URL: https://github.com/apache/datafusion/issues/12375 ### Is your feature request related to a problem or challenge? It probably is better to have it enabled by default, as pushing a filter down below a join is generally faster than the overhead

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jayzhan211 commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1748097339 ## datafusion/expr/src/expr_schema.rs: ## @@ -352,7 +358,18 @@ impl ExprSchemable for Expr { } } WindowFu

Re: [PR] Use `filtered_null_mask` in `CountGroupsAccumulator ` and `PrimitiveGroupsAccumulator` [datafusion]

2024-09-07 Thread via GitHub
Rachelint commented on PR #11825: URL: https://github.com/apache/datafusion/pull/11825#issuecomment-2335198529 > Here are some numebrs I got > > ``` > > Benchmark clickbench_1.json > > ā”ā”ā”ā”³ā”³ā”ā”

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jcsherin commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1748094081 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -73,7 +74,15 @@ pub fn schema_add_window_field( .iter() .map(|e| Arc::clone(e).as_ref().nu

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jcsherin commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1748092364 ## datafusion/expr/src/expr_schema.rs: ## @@ -352,7 +358,18 @@ impl ExprSchemable for Expr { } } WindowFunc

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jcsherin commented on code in PR #12374: URL: https://github.com/apache/datafusion/pull/12374#discussion_r1748091147 ## datafusion/expr/src/expr_schema.rs: ## @@ -352,7 +358,18 @@ impl ExprSchemable for Expr { } } WindowFunc

[I] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-07 Thread via GitHub
jcsherin opened a new issue, #12373: URL: https://github.com/apache/datafusion/issues/12373 ### Is your feature request related to a problem or challenge? > Instead of return_type + nullable. I think `field` is a better choice. Given windowudf is pretty new so not widely used yet (?).

Re: [PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-09-07 Thread via GitHub
goldmedal commented on PR #12158: URL: https://github.com/apache/datafusion/pull/12158#issuecomment-2335162233 Thanks @devinjdangelo @alamb @phillipleblanc for the review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] Minor: Add tests for using FilterExec when parquet was pushed down [datafusion]

2024-09-07 Thread via GitHub
alamb merged PR #12362: URL: https://github.com/apache/datafusion/pull/12362 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] add guidelines on separating python and rust code [datafusion-python]

2024-09-07 Thread via GitHub
timsaucer merged PR #860: URL: https://github.com/apache/datafusion-python/pull/860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@d

Re: [I] Support protobuf serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` [datafusion]

2024-09-07 Thread via GitHub
alamb closed issue #12117: Support protobuf serialization for `ScalarValue::Utf8View` and `ScalarValue::BinaryView` URL: https://github.com/apache/datafusion/issues/12117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-09-07 Thread via GitHub
allinux commented on code in PR #857: URL: https://github.com/apache/datafusion-python/pull/857#discussion_r1748042239 ## python/datafusion/dataframe.py: ## @@ -409,37 +409,62 @@ def except_all(self, other: DataFrame) -> DataFrame: """ return DataFrame(self.df.

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-09-07 Thread via GitHub
allinux commented on code in PR #857: URL: https://github.com/apache/datafusion-python/pull/857#discussion_r1748042148 ## python/datafusion/dataframe.py: ## @@ -409,37 +409,62 @@ def except_all(self, other: DataFrame) -> DataFrame: """ return DataFrame(self.df.

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-09-07 Thread via GitHub
allinux commented on code in PR #857: URL: https://github.com/apache/datafusion-python/pull/857#discussion_r1748041887 ## python/datafusion/dataframe.py: ## @@ -409,37 +409,62 @@ def except_all(self, other: DataFrame) -> DataFrame: """ return DataFrame(self.df.

Re: [PR] feat: scalar regex match physical expr [datafusion]

2024-09-07 Thread via GitHub
alamb commented on PR #12270: URL: https://github.com/apache/datafusion/pull/12270#issuecomment-2335147328 Thank you for this PR @zhuliquan . Have you run any benchmarks that show this approach is noticeably faster than the existing approach? It makes sense that it would be faster as it do

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-09-07 Thread via GitHub
allinux commented on code in PR #857: URL: https://github.com/apache/datafusion-python/pull/857#discussion_r1748041401 ## python/datafusion/dataframe.py: ## @@ -409,37 +409,62 @@ def except_all(self, other: DataFrame) -> DataFrame: """ return DataFrame(self.df.

Re: [PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-09-07 Thread via GitHub
alamb merged PR #12158: URL: https://github.com/apache/datafusion/pull/12158 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[I] LargeList and List type coercion not working in `CASE WHEN` [datafusion]

2024-09-07 Thread via GitHub
ion-elgreco opened a new issue, #12370: URL: https://github.com/apache/datafusion/issues/12370 ### Describe the bug Recently we made a change in delta-rs to make the schema's agnostic of large/normal/view types. However datafusion can't properly coerce `List` and `LargeList` yet

Re: [PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-09-07 Thread via GitHub
phillipleblanc commented on code in PR #12158: URL: https://github.com/apache/datafusion/pull/12158#discussion_r1747999070 ## datafusion/sql/src/unparser/plan.rs: ## @@ -532,6 +548,73 @@ impl Unparser<'_> { } } +fn unparse_table_scan_pushdown( +plan:

Re: [PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-09-07 Thread via GitHub
goldmedal commented on code in PR #12158: URL: https://github.com/apache/datafusion/pull/12158#discussion_r1747985197 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -607,6 +609,151 @@ fn sql_round_trip(query: &str, expect: &str) { assert_eq!(roundtrip_statement.to_stri