Re: [PR] fix(datafusion-proto): support serializing/deserializing ArrowFormat tables [datafusion]

2025-07-25 Thread via GitHub
xudong963 merged PR #16875: URL: https://github.com/apache/datafusion/pull/16875 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Unable to serialize and deserialize scans using ArrowFormat [datafusion]

2025-07-25 Thread via GitHub
xudong963 closed issue #16874: Unable to serialize and deserialize scans using ArrowFormat URL: https://github.com/apache/datafusion/issues/16874

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-25 Thread via GitHub
petern48 commented on PR #16780: URL: https://github.com/apache/datafusion/pull/16780#issuecomment-3121353753 Nice, I resolved the conflicts

Re: [PR] WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 2X faster) [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 commented on PR #16889: URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3121297298 Re: https://github.com/apache/datafusion/pull/16889#issuecomment-3121216474 I think the reason is: `datafusion-cli` won't buffer the final output, and now the NLJ bench will

Re: [PR] WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 2X faster) [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 commented on PR #16889: URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3121293562 Run extended tests

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 commented on PR #16780: URL: https://github.com/apache/datafusion/pull/16780#issuecomment-3121286514 Thanks! I think it's ready to go after the merge conflict is resolved.

Re: [I] [datafusion-spark] Implement Spark `datetime` function `last_day` [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 closed issue #16774: [datafusion-spark] Implement Spark `datetime` function `last_day` URL: https://github.com/apache/datafusion/issues/16774

Re: [PR] feat(spark): implement Spark datetime function last_day [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 merged PR #16828: URL: https://github.com/apache/datafusion/pull/16828
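
Spark's `last_day(date)` returns the last day of the month that the date belongs to; the Spark SQL documentation gives `last_day('2009-01-12')` as `2009-01-31`. A minimal std-only sketch of the calendar rule involved, assuming a hypothetical `(year, month, day)` tuple as the date representation; this is not the code merged in #16828:

```rust
/// Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400.
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Hypothetical helper: last day of the month containing (year, month, day).
/// The day component does not affect the result, only year and month do.
fn last_day(year: i32, month: u32, _day: u32) -> (i32, u32, u32) {
    let last = match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => panic!("month out of range: {month}"),
    };
    (year, month, last)
}
```

February is the only month whose length depends on the year, which is why the leap-year check is the one piece of real logic here; for example, `last_day(2024, 2, 10)` is `(2024, 2, 29)` while `last_day(1900, 2, 1)` is `(1900, 2, 28)`.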

Re: [I] feature flag to statically link liblzma [datafusion]

2025-07-25 Thread via GitHub
rphlo commented on issue #9256: URL: https://github.com/apache/datafusion/issues/9256#issuecomment-3121255053 This needs re-opening, as the same issue now arises with the liblzma crate

Re: [I] Add a "Gentle Introduction to Arrow / Record Batches" [datafusion]

2025-07-25 Thread via GitHub
Adez017 commented on issue #11336: URL: https://github.com/apache/datafusion/issues/11336#issuecomment-3121218014 > Thank you [@Adez017](https://github.com/Adez017) -- I would personally suggest starting by porting some of the contents of https://jorgecarleitao.github.io/arrow2/main/guide/

Re: [PR] WIP: Rewrite NestedLoopJoin to limit intermediate size (up to 2X faster) [datafusion]

2025-07-25 Thread via GitHub
UBarney commented on PR #16889: URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3121216474 > > I'll need some time to read through the code, but reduced memory usage is also impressive 👍 > > > > * v1 (old) > > > > ``` > > Query Time (ms) Peak RSS P

[PR] Blog on Extending SQL to create own SQL Dialects [datafusion-site]

2025-07-25 Thread via GitHub
Adez017 opened a new pull request, #97: URL: https://github.com/apache/datafusion-site/pull/97 Hi @alamb @goldmedal, I have drafted the blog on the topic and need you to review it for suggestions.

[I] Make the max temp directory size (for spills) configurable through configuration API [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 opened a new issue, #16922: URL: https://github.com/apache/datafusion/issues/16922 ### Is your feature request related to a problem or challenge? See the rationales in https://github.com/apache/datafusion/issues/16921 for a similar config option This issue is for ano

Re: [PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-25 Thread via GitHub
EeshanBembi commented on PR #16905: URL: https://github.com/apache/datafusion/pull/16905#issuecomment-3121158741 I have done the review changes and updated PR description accordingly!

[I] Make the temporary directory (for spills) configurable through configuration API [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 opened a new issue, #16921: URL: https://github.com/apache/datafusion/issues/16921 ### Is your feature request related to a problem or challenge? For external executions, the query might write temporary data to disk and clean it up before the query finishes. Now t

Re: [PR] WIP: Limit intermediate data size for NestedLoopJoin (up to 2X faster) [datafusion]

2025-07-25 Thread via GitHub
2010YOUY01 commented on PR #16889: URL: https://github.com/apache/datafusion/pull/16889#issuecomment-3121126110 > I'll need some time to read through the code, but reduced memory usage is also impressive 👍 > > * v1 (old) > > ``` > Query Time (ms) Peak RSS Peak Commi

Re: [PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-25 Thread via GitHub
EeshanBembi commented on PR #16905: URL: https://github.com/apache/datafusion/pull/16905#issuecomment-3121116387 > `EnforceSorting` rule already handles the conversion to PartialSort, so there shouldn't be any additional work during logical to physical mapping. If you remove the following c

Re: [I] CI: Check broken links in src doc comments [datafusion]

2025-07-25 Thread via GitHub
Adez017 commented on issue #16840: URL: https://github.com/apache/datafusion/issues/16840#issuecomment-3121115039 > hey [@jcsherin](https://github.com/jcsherin) , i think we got a big work to do as of your suggestion i ran the following command : `$fd -e html -e txt -e md . target/doc | xar

Re: [PR] Fix infinite loop in replace_with_partial_sort function (#16899) [datafusion]

2025-07-25 Thread via GitHub
robertream closed pull request #16920: Fix infinite loop in replace_with_partial_sort function (#16899) URL: https://github.com/apache/datafusion/pull/16920

[PR] fix: update DuckDB and ClickHouse documentation links [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
IndexSeek opened a new pull request, #1978: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1978 Closes #1963

[PR] Fix infinite loop in replace_with_partial_sort function (#16899) [datafusion]

2025-07-25 Thread via GitHub
robertream opened a new pull request, #16920: URL: https://github.com/apache/datafusion/pull/16920 ## Summary Fixes issue #16899 - "Entire input is resorted when the data is partially sorted (not using `PartialSortExec`)" This PR addresses a bug in the `replace_with_partial_sort` f

Re: [PR] feat: support multi value column unpivot & alias in unpivot [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
chenkovsky commented on code in PR #1969: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1969#discussion_r2232315079 ## tests/sqlparser_common.rs: ## @@ -11022,6 +11022,67 @@ fn parse_unpivot_table() { verified_stmt(sql_unpivot_include_nulls).to_string(),

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-25 Thread via GitHub
parthchandra commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3120675532 Oh well. For Maps, you might want to look at the mapping of Spark's MapType to a Parquet Schema here: https://github.com/apache/datafusion-comet/blob/320ce55eec

Re: [I] the python udaf example cannot print the result [datafusion-python]

2025-07-25 Thread via GitHub
l1t1 commented on issue #1190: URL: https://github.com/apache/datafusion-python/issues/1190#issuecomment-3120671550 thanks, I found that your key modifications are: > def merge(self, states: list[pa.Array]) -> None: > # not nice since pyarrow scalars can't be summed yet. This breaks

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-25 Thread via GitHub
rishvin commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3120621876 > > managed to scan the map-type by setting `CometConf.COMET_NATIVE_SCAN_IMPL.key -> native_datafusion `. Added `map_sort` UDF with return type as `Map`. > > Right.

Re: [I] arrays_overlap inconsistent behaviour on two arrays with NULL values [datafusion-comet]

2025-07-25 Thread via GitHub
parthchandra commented on issue #2036: URL: https://github.com/apache/datafusion-comet/issues/2036#issuecomment-3120617010 @SparkApplicationMaster There is a `datafusion-spark` [crate](https://github.com/apache/datafusion/tree/main/datafusion/spark) explicitly for the purpose of creating d

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-25 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files I created a GitHub issue with relevant details summarized. See: `Streaming Aggregate operator not being used in deduplication of pre-sorted Parquet files`

[I] Streaming Aggregate operator not being used in deduplication of pre-sorted Parquet files [datafusion]

2025-07-25 Thread via GitHub
zheniasigayev opened a new issue, #16919: URL: https://github.com/apache/datafusion/issues/16919 ### Describe the bug See discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files #16776 After investigating an optimal approach to perform dedupli
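
The premise of this issue is that when input is already sorted on the deduplication key, equal keys arrive next to each other, so duplicates can be dropped in a single streaming pass that remembers only the previous key instead of building a hash table over the whole input. A minimal std-only Rust sketch of that idea, assuming rows are plain comparable values; `dedup_sorted` is a hypothetical name and this is not DataFusion's aggregate operator:

```rust
/// Streaming deduplication over input that is already sorted on the key.
/// Because equal keys are adjacent, only the previous key is kept, so the
/// extra state is a single value regardless of input size.
fn dedup_sorted<I, T>(input: I) -> impl Iterator<Item = T>
where
    I: IntoIterator<Item = T>,
    T: PartialEq + Clone,
{
    let mut last: Option<T> = None;
    input.into_iter().filter(move |item| {
        if last.as_ref() == Some(item) {
            false // same key as the previous row: a duplicate, drop it
        } else {
            last = Some(item.clone()); // new key: remember it and emit the row
            true
        }
    })
}
```

The sortedness assumption is load-bearing: `dedup_sorted(vec![1, 1, 2, 3, 3])` yields `1, 2, 3`, but unsorted input such as `vec![1, 2, 1]` passes through unchanged, which is why a planner may only pick a streaming strategy when it knows the input ordering.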

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-25 Thread via GitHub
Standing-Man commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3120460061 > > Hi [@alamb](https://github.com/alamb), I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion?

Re: [PR] feat: monotonically_increasing_id and spark_partition_id implementation [datafusion-comet]

2025-07-25 Thread via GitHub
parthchandra commented on code in PR #2037: URL: https://github.com/apache/datafusion-comet/pull/2037#discussion_r2232007462 ## spark/src/main/scala/org/apache/comet/serde/nondetermenistic.scala: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] fix: `TrivialValueAccumulators` to ignore null states for `ignore nulls` [datafusion]

2025-07-25 Thread via GitHub
mbutrovich commented on PR #16918: URL: https://github.com/apache/datafusion/pull/16918#issuecomment-3120341095 I'm trying to put together a concise unit test based on the batches I see going through `update_batch` and `merge_batch` in the failing Comet test, but can't make sense of what I'

Re: [PR] fix: `TrivialValueAccumulators` to ignore null states for `ignore nulls` [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16918: URL: https://github.com/apache/datafusion/pull/16918#issuecomment-3120300842 > Existing tests, as I cannot reproduce it in DataFusion I'm not able to add a new test I don't think it is covered by existing tests as they all pass with and without this code

[PR] fix: TrivialValueAccumulators to ignore nulls for `ignore nulls` [datafusion]

2025-07-25 Thread via GitHub
comphead opened a new pull request, #16918: URL: https://github.com/apache/datafusion/pull/16918 ## Which issue does this PR close? - Closes #. Related to https://github.com/apache/datafusion/issues/16235 and https://github.com/apache/datafusion-comet/pull/2040 ## Ra
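
The null-handling problem named in the PR title can be illustrated with a simplified `FIRST_VALUE(...) IGNORE NULLS` accumulator. The split between `update_batch` (raw input) and `merge_batch` (partial states) mirrors the two accumulator phases mentioned in the thread, but the `FirstValue` struct, its `Option<i64>` state, and its methods are hypothetical and are not DataFusion's `TrivialValueAccumulators`. The sketch shows why the null-skipping rule has to apply to merged states as well as to raw input:

```rust
/// Hypothetical simplified FIRST_VALUE accumulator over nullable i64 values.
struct FirstValue {
    ignore_nulls: bool,
    value: Option<i64>,
    is_set: bool,
}

impl FirstValue {
    fn new(ignore_nulls: bool) -> Self {
        Self { ignore_nulls, value: None, is_set: false }
    }

    /// Update phase: consume a batch of raw input values.
    fn update_batch(&mut self, values: &[Option<i64>]) {
        self.take_first(values);
    }

    /// Merge phase: combine partial states produced by other accumulators.
    /// The same IGNORE NULLS rule is applied here, so a NULL state from a
    /// partition that saw only NULLs cannot win the merge.
    fn merge_batch(&mut self, states: &[Option<i64>]) {
        self.take_first(states);
    }

    /// Keep the first value seen, skipping NULLs when `ignore_nulls` is set.
    fn take_first(&mut self, values: &[Option<i64>]) {
        if self.is_set {
            return;
        }
        for v in values {
            if self.ignore_nulls && v.is_none() {
                continue;
            }
            self.value = *v;
            self.is_set = true;
            return;
        }
    }

    fn evaluate(&self) -> Option<i64> {
        self.value
    }
}
```

With `ignore_nulls` set, merging the states `[None, Some(7), Some(9)]` evaluates to `Some(7)`; without the check in `take_first`, the leading `None` state would be selected even though a non-null value exists.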

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3120134550 > Hi [@alamb](https://github.com/alamb), I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion? @St

Re: [PR] Add support for Float16 type in substrait [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16793: URL: https://github.com/apache/datafusion/pull/16793#issuecomment-3120123644 I am not quite sure what the next steps for this PR are. @gabotechs do you think there need to be changes, or is https://github.com/apache/datafusion/pull/16793#pullrequestreview-30323

Re: [PR] Fix integration tests not running [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16835: URL: https://github.com/apache/datafusion/pull/16835#discussion_r2231878327 ## datafusion/core/tests/parquet/schema_adapter.rs: ## @@ -370,3 +378,321 @@ async fn test_custom_schema_adapter_and_custom_expression_adapter() { ]; asse

Re: [I] Integration tests are not being run [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16801: URL: https://github.com/apache/datafusion/issues/16801#issuecomment-3120115929 [image: https://github.com/user-attachments/assets/d79a7148-65d6-4b7b-8fd1-08ff44c6d909] I think the core problem is that the schema_adapter tests are in a folder that isn't co

Re: [PR] Implement Helpers for ScopedTimerGuard and Time Structs [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16911: URL: https://github.com/apache/datafusion/pull/16911#discussion_r2231870480 ## datafusion/physical-plan/src/metrics/value.rs: ## @@ -331,6 +341,20 @@ impl ScopedTimerGuard<'_> { pub fn done(mut self) { self.stop() } + +

Re: [PR] Mutable Join Unwind [datafusion]

2025-07-25 Thread via GitHub
alamb merged PR #16883: URL: https://github.com/apache/datafusion/pull/16883

Re: [I] Ensure Substrait consumer can handle expressions in VirtualTable [datafusion]

2025-07-25 Thread via GitHub
alamb closed issue #16363: Ensure Substrait consumer can handle expressions in VirtualTable URL: https://github.com/apache/datafusion/issues/16363

Re: [PR] Ensure Substrait consumer can handle expressions in VirtualTable [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16857: URL: https://github.com/apache/datafusion/pull/16857#issuecomment-3120083386 Thanks @lorenarosati and @vbarua

Re: [PR] Ensure Substrait consumer can handle expressions in VirtualTable [datafusion]

2025-07-25 Thread via GitHub
alamb merged PR #16857: URL: https://github.com/apache/datafusion/pull/16857

Re: [PR] disallow pushdown of volatile PhysicalExprs [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16861: URL: https://github.com/apache/datafusion/pull/16861#discussion_r2231863904 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -485,21 +497,32 @@ fn push_down_filters( // currently. `self_filters` are the predicates whic

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-25 Thread via GitHub
findepi merged PR #16842: URL: https://github.com/apache/datafusion/pull/16842

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-25 Thread via GitHub
findepi commented on PR #16842: URL: https://github.com/apache/datafusion/pull/16842#issuecomment-3120067570 Thank you @alamb @kosiew for review!

Re: [I] Derive UDF (`ScalarUDFImpl`) equality from PartialEq, Hash [datafusion]

2025-07-25 Thread via GitHub
findepi closed issue #16865: Derive UDF (`ScalarUDFImpl`) equality from PartialEq, Hash URL: https://github.com/apache/datafusion/issues/16865

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3120013976 ✋ ✋

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2231803462 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-25 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3119858703 From https://issues.apache.org/jira/browse/INFRA-27070, Apache infra pointed to this example to add CSP https://privacy.apache.org/examples/youtube-html/with-youtube-api.html

[I] Consider deprecate or remove some physical expr helper functions [datafusion]

2025-07-25 Thread via GitHub
waynexia opened a new issue, #16917: URL: https://github.com/apache/datafusion/issues/16917 ### Is your feature request related to a problem or challenge? I find some old helper functions are no longer needed or recommended for use. For example, `physical_exprs_contains` only ha

[PR] We have now the CI ensure all doc strings remain formatted [datafusion]

2025-07-25 Thread via GitHub
yazanmashal03 opened a new pull request, #16916: URL: https://github.com/apache/datafusion/pull/16916 ## Which issue does this PR close? - Closes #16915. ## Rationale for this change This change makes the CI process ensure that all doc strings remain formatted.

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-25 Thread via GitHub
findepi commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2231669050 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` me

Re: [PR] DataFusion `49.0.0` release post [datafusion-site]

2025-07-25 Thread via GitHub
alamb commented on code in PR #91: URL: https://github.com/apache/datafusion-site/pull/91#discussion_r2231626685 ## content/blog/2025-07-28-datafusion-49.0.0.md: ## @@ -0,0 +1,424 @@ +--- +layout: post +title: Apache DataFusion 49.0.0 Released +date: 2025-07-28 +author: pmc +cat

Re: [I] Chore: format documentation examples [datafusion]

2025-07-25 Thread via GitHub
yazanmashal03 commented on issue #16915: URL: https://github.com/apache/datafusion/issues/16915#issuecomment-3119515327 unassign me

Re: [I] Chore: format documentation examples [datafusion]

2025-07-25 Thread via GitHub
yazanmashal03 commented on issue #16915: URL: https://github.com/apache/datafusion/issues/16915#issuecomment-3119440772 take

Re: [PR] Blog post on async user defined functions [datafusion-site]

2025-07-25 Thread via GitHub
Adez017 commented on PR #96: URL: https://github.com/apache/datafusion-site/pull/96#issuecomment-3119350385 Hi @alamb @goldmedal I have made a lot of changes and think it's ready

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-25 Thread via GitHub
petern48 commented on code in PR #16780: URL: https://github.com/apache/datafusion/pull/16780#discussion_r2231556908 ## datafusion/sqllogictest/test_files/spark/datetime/next_day.slt: ## @@ -23,5 +23,17 @@ ## Original Query: SELECT next_day('2015-01-14', 'TU'); ## PySpark 3.
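
The test query above is `next_day('2015-01-14', 'TU')`; the Spark SQL documentation gives its result as `2015-01-20`, the first Tuesday strictly after the start date. A std-only sketch of that date arithmetic, assuming dates are represented as days since 1970-01-01 (as Arrow's `Date32` does). All function names are hypothetical, the accepted day-name spellings are an assumption to check against the crate, and this is not the `datafusion-spark` implementation:

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (month + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Monday = 0 ... Sunday = 6. Day 0 (1970-01-01) was a Thursday.
fn weekday(days: i64) -> i64 {
    (days + 3).rem_euclid(7)
}

/// Parse a day name such as "TU", "TUE" or "TUESDAY" (case-insensitive).
/// The accepted spellings are an assumption of this sketch.
fn parse_day_of_week(name: &str) -> Option<i64> {
    let idx = match name.to_ascii_uppercase().as_str() {
        "MO" | "MON" | "MONDAY" => 0,
        "TU" | "TUE" | "TUESDAY" => 1,
        "WE" | "WED" | "WEDNESDAY" => 2,
        "TH" | "THU" | "THURSDAY" => 3,
        "FR" | "FRI" | "FRIDAY" => 4,
        "SA" | "SAT" | "SATURDAY" => 5,
        "SU" | "SUN" | "SUNDAY" => 6,
        _ => return None,
    };
    Some(idx)
}

/// First date strictly after `start` (days since epoch) that falls on `day_name`.
fn next_day(start: i64, day_name: &str) -> Option<i64> {
    let target = parse_day_of_week(day_name)?;
    // Distance in 1..=7: a start date already on the target day moves a full week.
    let delta = (target - weekday(start) + 6).rem_euclid(7) + 1;
    Some(start + delta)
}
```

2015-01-14 is a Wednesday, so asking for `"TU"` moves six days forward to 2015-01-20, while asking for `"wednesday"` moves a full week to 2015-01-21 because the result must be strictly later than the start date.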

Re: [PR] feat: [datafusion-spark] Implement `next_day` function [datafusion]

2025-07-25 Thread via GitHub
petern48 commented on code in PR #16780: URL: https://github.com/apache/datafusion/pull/16780#discussion_r223182 ## datafusion/spark/src/function/datetime/mod.rs: ## @@ -15,11 +15,24 @@ // specific language governing permissions and limitations // under the License. +pub

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3119143186 Also possibly related: - https://github.com/apache/datafusion/issues/16841

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3119129331 I agree more details are good -- I think the trick will be figuring out how to do so in such a way that is reasonable to maintain and doesn't slow down the normal operatio

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3119117544 Here is a related discussion - https://github.com/apache/datafusion/issues/16903

Re: [PR] SGA-11783 Added support for SHOW CHARSET [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
iffyio commented on code in PR #1974: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1974#discussion_r2231404829 ## src/ast/mod.rs: ## @@ -3704,6 +3704,20 @@ pub enum Statement { history: bool, show_options: ShowStatementOptions, }, +// `

Re: [PR] feat: support export data for bigquery [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
iffyio commented on code in PR #1976: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1976#discussion_r2231396257 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,24 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_expor

Re: [I] [datafusion-spark] Implement Spark `string` function `luhn_check` [datafusion]

2025-07-25 Thread via GitHub
comphead closed issue #16612: [datafusion-spark] Implement Spark `string` function `luhn_check` URL: https://github.com/apache/datafusion/issues/16612

Re: [PR] [datafusion-spark] Implement Spark `luhn_check` function [datafusion]

2025-07-25 Thread via GitHub
comphead closed pull request #16580: [datafusion-spark] Implement Spark `luhn_check` function URL: https://github.com/apache/datafusion/pull/16580

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-25 Thread via GitHub
comphead merged PR #16848: URL: https://github.com/apache/datafusion/pull/16848
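
The `luhn_check` function merged here validates a string of digits with the Luhn (mod 10) checksum. A minimal std-only sketch of that checksum follows; treating an empty string or any non-digit character as invalid is an assumption of this sketch, not a statement of the exact Spark or `datafusion-spark` semantics:

```rust
/// Luhn (mod 10) checksum over a string of ASCII digits.
/// Empty input and non-digit characters are treated as invalid (assumed policy).
fn luhn_check(input: &str) -> bool {
    if input.is_empty() {
        return false;
    }
    let mut sum = 0u32;
    // Walk from the rightmost digit; every second digit is doubled.
    for (i, c) in input.chars().rev().enumerate() {
        let Some(mut d) = c.to_digit(10) else {
            return false;
        };
        if i % 2 == 1 {
            d *= 2;
            if d > 9 {
                d -= 9; // same as summing the two digits of the product
            }
        }
        sum += d;
    }
    sum % 10 == 0
}
```

For example, `"79927398713"` passes (its weighted digit sum is 70), while changing the last digit to `"79927398710"` makes the sum 67 and the check fails.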

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-25 Thread via GitHub
comphead commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3118473506 Nice teamwork! Thanks everyone

Re: [PR] chore(deps): bump aws-config from 1.8.2 to 1.8.3 [datafusion]

2025-07-25 Thread via GitHub
comphead merged PR #16912: URL: https://github.com/apache/datafusion/pull/16912

Re: [PR] feat: support multi value column unpivot & alias in unpivot [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
iffyio commented on code in PR #1969: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1969#discussion_r2231365111 ## tests/sqlparser_common.rs: ## @@ -11022,6 +11022,67 @@ fn parse_unpivot_table() { verified_stmt(sql_unpivot_include_nulls).to_string(),

Re: [PR] Snowflake: Improve support for reserved keywords for table factor [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
iffyio commented on code in PR #1942: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1942#discussion_r2231345244 ## src/dialect/snowflake.rs: ## @@ -427,6 +503,21 @@ impl Dialect for SnowflakeDialect { } } +fn is_table_factor(&self, kw: &Keyword

Re: [PR] fix: begin statement for bigquery [datafusion-sqlparser-rs]

2025-07-25 Thread via GitHub
iffyio merged PR #1975: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1975

[I] Chore: format documentation examples [datafusion]

2025-07-25 Thread via GitHub
findepi opened a new issue, #16915: URL: https://github.com/apache/datafusion/issues/16915 Format examples in doc strings. Have the CI ensure all doc strings remain formatted. rustfmt has an option for this - https://rust-lang.github.io/rustfmt/?version=v1.8.0&search=#format

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-25 Thread via GitHub
zhuqi-lucas commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3118232799 I agree we need a better solution: 1. At a high level, we currently default to mapping SQL-level string/varchar/char/text to utf8view when we create a table; we can use

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16858: URL: https://github.com/apache/datafusion/pull/16858#issuecomment-3118107079 Thanks again everyone!

Re: [I] Only 4 tpc-h queries have matching physical plans before serialization and after deserialization [datafusion]

2025-07-25 Thread via GitHub
alamb closed issue #16772: Only 4 tpc-h queries have matching physical plans before serialization and after deserialization URL: https://github.com/apache/datafusion/issues/16772

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-25 Thread via GitHub
alamb merged PR #16858: URL: https://github.com/apache/datafusion/pull/16858

Re: [I] the python udaf example cannot print the result [datafusion-python]

2025-07-25 Thread via GitHub
kosiew commented on issue #1190: URL: https://github.com/apache/datafusion-python/issues/1190#issuecomment-3117895992 hi @l1t1 Can you try ```python import pyarrow as pa from datafusion import Accumulator, SessionContext, udaf # Define a user-defined aggregati

Re: [I] [BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion [datafusion]

2025-07-25 Thread via GitHub
Adez017 commented on issue #16756: URL: https://github.com/apache/datafusion/issues/16756#issuecomment-3117735378 sure let me begin the work

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2231023621 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on PR #16905: URL: https://github.com/apache/datafusion/pull/16905#issuecomment-3117701072 `EnforceSorting` rule already handles the conversion to PartialSort, so there shouldn't be any additional work during logical to physical mapping. If you remove the following c
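
The idea behind a partial sort is that when input is already ordered on a prefix of the requested sort key (say `a` when `(a, b)` is requested), only each run of equal `a` values needs sorting on `b`, and a run can be emitted as soon as `a` changes. A std-only Rust sketch of that technique, assuming rows are hypothetical `(a, b)` integer pairs already sorted on `a`; `partial_sort` is an illustrative name and this is not DataFusion's `PartialSortExec`:

```rust
/// Sort `(a, b)` rows on `(a, b)`, assuming they are already sorted on `a`.
/// Only the current run of equal `a` values is buffered at any time.
fn partial_sort(rows: &[(i32, i32)]) -> Vec<(i32, i32)> {
    let mut out = Vec::with_capacity(rows.len());
    let mut run: Vec<(i32, i32)> = Vec::new();
    for &row in rows {
        // A new prefix value means the buffered run is complete.
        let boundary = matches!(run.last(), Some(&(prev_a, _)) if prev_a != row.0);
        if boundary {
            run.sort_by_key(|r| r.1); // order the finished run on `b`
            out.append(&mut run); // emit it; `run` is left empty
        }
        run.push(row);
    }
    // Flush the final run.
    run.sort_by_key(|r| r.1);
    out.append(&mut run);
    out
}
```

Buffering is bounded by the largest run rather than the whole input, which is what makes the partial sort cheaper than a full sort; the result only equals a full sort when the input really is ordered on `a`, so the optimizer has to establish that ordering before choosing it.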

Re: [PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-25 Thread via GitHub
alamb commented on PR #16905: URL: https://github.com/apache/datafusion/pull/16905#issuecomment-3117669084 Thank you @EeshanBembi -- this looks nice. I will try and review it over the next few days cc @berkaysynnada

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2230999738 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230996830 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230995605 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [I] [BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16756: URL: https://github.com/apache/datafusion/issues/16756#issuecomment-3117653208 > hi [@alamb](https://github.com/alamb) so as of my understanding i should pull the blog in datafusion-site repo Yes please

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3117652066 Awesome -- thank you @irenjj Maybe once you have a PR, you can ping me and I can give it a high level look and hear what you plan next.

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-25 Thread via GitHub
rluvaton commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230992741 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -17,11 +17,10 @@ //! Define the `SpillManager` struct, which is responsible for reading and writ

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2230992028 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-07-25 Thread via GitHub
irenjj commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3117592820 > 👋 [@irenjj](https://github.com/irenjj) and [@duongcongtoai](https://github.com/duongcongtoai) , I hope things are well with you > > It seems as though you have been work

Re: [I] [Epic]: Google Summer of Code 2025 Correlated Subquery Support [datafusion]

2025-07-25 Thread via GitHub
alamb commented on issue #16059: URL: https://github.com/apache/datafusion/issues/16059#issuecomment-3117573049 👋 @irenjj and @duongcongtoai , I hope things are well with you It seems as though you have been working on @duongcongtoai 's fork which is a great idea: https://github.com/d

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2230922586 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
alamb commented on code in PR #16900: URL: https://github.com/apache/datafusion/pull/16900#discussion_r2230908262 ## datafusion/sqllogictest/test_files/partial_sorts.slt: ## @@ -0,0 +1,132 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [PR] Blog post on async user defined functions [datafusion-site]

2025-07-25 Thread via GitHub
alamb commented on PR #96: URL: https://github.com/apache/datafusion-site/pull/96#issuecomment-3117471965 FYI @goldmedal perhaps you are interested in this blog post too

Re: [PR] Blog post on async user defined functions [datafusion-site]

2025-07-25 Thread via GitHub
alamb commented on PR #96: URL: https://github.com/apache/datafusion-site/pull/96#issuecomment-3117470691 Thanks @Adez017 -- I will put it on my list

Re: [PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on PR #16900: URL: https://github.com/apache/datafusion/pull/16900#issuecomment-3117352137 Do you want to see some PartialSortExec in these new tests?

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-25 Thread via GitHub
GitHub user berkaysynnada added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files > > Yes, please, I actually did some testing today, > > > > * [Entire input is resorted when the data is partially sorted (not using > > `PartialSortExec`

Re: [PR] minor: implement with_new_expressions for AggregateFunctionExpr [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on code in PR #16897: URL: https://github.com/apache/datafusion/pull/16897#discussion_r2230787048 ## datafusion/physical-expr/src/window/window_expr.rs: ## @@ -130,6 +130,12 @@ pub trait WindowExpr: Send + Sync + Debug { /// Get the reverse expressio

Re: [PR] Fix Partial Sort Get Slice Point Between Batches [datafusion]

2025-07-25 Thread via GitHub
berkaysynnada commented on PR #16881: URL: https://github.com/apache/datafusion/pull/16881#issuecomment-3117295272 > The only thing I would also like to see is a SQL level test. However, I was not able to write one as I can't seem to get `PartialSortExec` to appear in the tests. I filed a t

Re: [I] Add a "col_case_preserved" helper function for creating Columns with the case preserved [datafusion]

2025-07-25 Thread via GitHub
niebayes commented on issue #16914: URL: https://github.com/apache/datafusion/issues/16914#issuecomment-3117110420 Update: I recently found an `ident` helper function which exactly suits our usages. But its name is somewhat ambiguous for creating a Column expression.
