[PR] Create 2025-07-25-async-scaler-udf.md [datafusion-site]

2025-07-24 Thread via GitHub
Adez017 opened a new pull request, #96: URL: https://github.com/apache/datafusion-site/pull/96 Hi @alamb, just finished drafting the basic post with all the things you had mentioned in [#16525](https://github.com/apache/datafusion/issues/16525) . I need you to review for further updates tha

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230260434 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(
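
For readers unfamiliar with the check being tested in this thread, here is a minimal standalone sketch of the Luhn checksum. It is not the code from PR #16848. The function name `luhn_check` is borrowed from the thread title, and rejecting empty or non-digit input is an assumption about the intended behaviour rather than something the snippet above confirms.

```rust
/// Hypothetical standalone helper, not the DataFusion implementation.
/// Returns true when `s` is a non-empty string of ASCII digits whose
/// Luhn checksum is a multiple of 10.
fn luhn_check(s: &str) -> bool {
    // Assumption: empty input and any non-digit character fail the check.
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let sum: u32 = s
        .bytes()
        .rev()
        .enumerate()
        .map(|(i, b)| {
            let d = u32::from(b - b'0');
            if i % 2 == 1 {
                // Every second digit from the right is doubled;
                // a two-digit result is reduced by 9.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

fn main() {
    // Same input as the PySpark result quoted in the test file.
    println!("{}", luhn_check("8112189876")); // prints "true"
}
```

The input `8112189876` is the one shown in the quoted PySpark result, where the expected value is `True`.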

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230257567 ## datafusion/spark/src/function/string/luhn_check.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-24 Thread via GitHub
milenkovicm commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3116483390 I believe this could unblock Ballista support. I'm not 100% sure, but it looks like a step in the right direction. -- This is an automated message from the Apache Git Serv

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
xudong963 merged PR #16885: URL: https://github.com/apache/datafusion/pull/16885 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on PR #16885: URL: https://github.com/apache/datafusion/pull/16885#issuecomment-3116460358 thanks all -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230100602 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230104338 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
shehabgamin commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230086388 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(l

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3116309058 Closing this in favour of https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265 -- This is an automated message from the Apache Git Service. To respond to the m

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub
kosiew closed pull request #16681: Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic URL: https://github.com/apache/datafusion/pull/16681 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [I] Combine utilities in `SpillManager` [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on issue #16907: URL: https://github.com/apache/datafusion/issues/16907#issuecomment-3116294357 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230075680 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

[I] Validate the memory consumption in `SortPreservingMergeStream` [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 opened a new issue, #16909: URL: https://github.com/apache/datafusion/issues/16909 ### Is your feature request related to a problem or challenge? This is a follow-up to: https://github.com/apache/datafusion/pull/15700 Part of https://github.com/apache/datafusion/issues/1

[I] Limit the max merge degree during re-spill in external sort [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 opened a new issue, #16908: URL: https://github.com/apache/datafusion/issues/16908 ### Is your feature request related to a problem or challenge? This is a follow-up to: https://github.com/apache/datafusion/pull/15700 Part of https://github.com/apache/datafusion/issues/1

[I] Combine utilities in `SpillManager` [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 opened a new issue, #16907: URL: https://github.com/apache/datafusion/issues/16907 ### Is your feature request related to a problem or challenge? follow-up to: https://github.com/apache/datafusion/pull/15700 This is a simple clean-up idea, see the original discussion

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub
Copilot commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230055548 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -17,11 +17,10 @@ //! Define the `SpillManager` struct, which is responsible for reading and writi

Re: [PR] Add benchmark utility to profile peak memory usage [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on PR #16814: URL: https://github.com/apache/datafusion/pull/16814#issuecomment-3116251667 @2010YOUY01 This is ready for review :) I would love to hear your feedback. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Add benchmark utility to profile peak memory usage [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on code in PR #16814: URL: https://github.com/apache/datafusion/pull/16814#discussion_r2230051866 ## benchmarks/README.md: ## @@ -321,6 +322,64 @@ FLAGS: ... ``` +# Profiling Memory Stats for each benchmark query +The `mem_profile` program wraps benchmar

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub
2010YOUY01 commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3116228640 > It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
zhuqi-lucas commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2230025399 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3116117607 Hi @alamb, I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion? -- This is an automated message

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-07-24 Thread via GitHub
github-actions[bot] commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-3116116848 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-07-24 Thread via GitHub
github-actions[bot] commented on PR #16174: URL: https://github.com/apache/datafusion/pull/16174#issuecomment-3116116734 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116085852 A draft PR: https://github.com/apache/datafusion/pull/16906 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[PR] Fix create table by values with string, which doesn't respect `string_to_utf8view` config [datafusion]

2025-07-24 Thread via GitHub
xudong963 opened a new pull request, #16906: URL: https://github.com/apache/datafusion/pull/16906 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16884 ## Rationale for this change ## What changes are included in thi

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116059618 I wanna narrow the implementation to the `create with values` first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub
xudong963 commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2229970123 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
shehabgamin commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3116003842 > Thank you @Standing-Man -- this looks good to me > > > > @shehabgamin does this look good to you (at a high level)? Will review when I'm home in the next fe

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub
ding-young commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3115960659 It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (in

Re: [PR] Fix integration tests not running [datafusion]

2025-07-24 Thread via GitHub
kosiew commented on code in PR #16835: URL: https://github.com/apache/datafusion/pull/16835#discussion_r2229960446 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -68,6 +68,8 @@ pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static { ) -> Box { self.

Re: [I] [Blog] Async Scalar User Defined Functions [datafusion]

2025-07-24 Thread via GitHub
Adez017 commented on issue #16525: URL: https://github.com/apache/datafusion/issues/16525#issuecomment-3115698018 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [I] [BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion [datafusion]

2025-07-24 Thread via GitHub
Adez017 commented on issue #16756: URL: https://github.com/apache/datafusion/issues/16756#issuecomment-3115628901 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-24 Thread via GitHub
EeshanBembi opened a new pull request, #16905: URL: https://github.com/apache/datafusion/pull/16905 ## Summary This PR optimizes sort execution by automatically using `PartialSortExec` instead of `SortExec` when input data is already sorted on a prefix of the requested sort columns.
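
The idea summarised in this PR description can be sketched outside DataFusion. The example below is a plain Rust illustration, not `PartialSortExec` itself: it assumes rows are `(a, b)` pairs already sorted on `a` and produces the order for `(a, b)` by sorting each run of equal `a` on its own. The function name `sort_with_sorted_prefix` is hypothetical.

```rust
/// Hypothetical helper: `rows` must already be sorted on the first field.
/// Sorting on (first, second) then only requires sorting each run of equal
/// first values by the second field, never the whole input at once.
fn sort_with_sorted_prefix(rows: &mut [(i32, i32)]) {
    let mut start = 0;
    while start < rows.len() {
        let key = rows[start].0;
        // Find the end of the current run sharing the same prefix value.
        let mut end = start + 1;
        while end < rows.len() && rows[end].0 == key {
            end += 1;
        }
        // Only this run needs to be held and sorted at one time.
        rows[start..end].sort_by_key(|r| r.1);
        start = end;
    }
}

fn main() {
    let mut rows = vec![(1, 3), (1, 1), (2, 5), (2, 2), (2, 4), (3, 0)];
    sort_with_sorted_prefix(&mut rows);
    println!("{:?}", rows);
}
```

Because each run is sorted independently, the working set is bounded by the largest run rather than by the full input, which is the memory benefit a prefix-aware sort is after.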

Re: [I] Entire input is resorted when the data is partially sorted (not using `PartialSortExec`) [datafusion]

2025-07-24 Thread via GitHub
EeshanBembi commented on issue #16899: URL: https://github.com/apache/datafusion/issues/16899#issuecomment-3115467757 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229851563 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under

Re: [I] Discussion: DataFusion Improvement Proposal (DIPs) Process? [datafusion]

2025-07-24 Thread via GitHub
phillipleblanc commented on issue #16886: URL: https://github.com/apache/datafusion/issues/16886#issuecomment-3115391379 I agree that format voting/approval doesn't make sense yet. Also having a structured way to propose "larger" changes that incorporates all relevant context for rev

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229840064 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,134 @@ # specific language governing permissions and limitations # under

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
chenkovsky closed pull request #1971: feat: support datetime_field as expr for bigquery URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [I] arrays_overlap inconsistent behaviour on two arrays with NULL values [datafusion-comet]

2025-07-24 Thread via GitHub
SparkApplicationMaster commented on issue #2036: URL: https://github.com/apache/datafusion-comet/issues/2036#issuecomment-3115360009 @coderfender thanks for response! Actually, there is a whole bunch of such inconsistencies in different spark functions between datafusion output and w

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3115349716 >I believe @waynexia added just such an API in Datafusion is a very good example and use case for that API. I haven't fully figured out how to evolve that API, like wheth

Re: [I] arrays_overlap inconsistent behaviour on two arrays with NULL values [datafusion-comet]

2025-07-24 Thread via GitHub
coderfender commented on issue #2036: URL: https://github.com/apache/datafusion-comet/issues/2036#issuecomment-3115338305 I can take this issue up if you aren't already working on this, @SparkApplicationMaster? -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
chenkovsky commented on code in PR #1971: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971#discussion_r2229741103 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,101 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files > Yes, please, I actually did some testing today, > > * [Entire input is resorted when the data is partially sorted (not using > `PartialSortExec`) #16899

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3115074500 The problem with that is there is no breakdown on what the memory is actually spent on in each consumer -- This is an automated message from the Apache Git Service. To respo

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229628338 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229625741 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229609157 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub
zheniasigayev commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3114928847 Within the DataFusion CLI there is a flag called: ``` --top-memory-consumers The number of top memory consumers to display when query fails due to memory

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229557133 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229550787 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub
adamreeve commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114903866 > Thanks [@XiangpengHao](https://github.com/XiangpengHao) -- do you think we should disable the crypto feature by default? > > cc [@corwinjoy](https://github.com/corwinj

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229547803 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-24 Thread via GitHub
parthchandra commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3114897910 > managed to scan the map-type by setting `CometConf.COMET_NATIVE_SCAN_IMPL.key -> native_datafusion `. Added `map_sort` UDF with return type as `Map`. Right. `

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-24 Thread via GitHub
coderfender commented on PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#issuecomment-3114876952 Casting data types to DoubleType, which is translated to f64 in Rust, produces incorrect results because the results are cast back to int64 / LongType, capping the value
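
The general effect described in this comment can be reproduced in plain Rust. This is a sketch of the arithmetic only, not the Comet code path: `add_via_f64` is a hypothetical helper that routes a 64-bit integer addition through `f64` and casts the result back with `as`, which saturates at the `i64` bounds.

```rust
/// Hypothetical helper: adds two i64 values by way of f64.
/// f64 has a 53-bit significand, so large integers are rounded,
/// and the `as i64` cast saturates instead of reporting overflow.
fn add_via_f64(a: i64, b: i64) -> i64 {
    ((a as f64) + (b as f64)) as i64
}

fn main() {
    // 2^53 + 1 is not representable in f64, so the round trip changes it.
    let big = (1i64 << 53) + 1;
    println!("{}", add_via_f64(big, 0) == big); // prints "false"

    // A true overflow is silently capped at i64::MAX ...
    println!("{}", add_via_f64(i64::MAX, i64::MAX) == i64::MAX); // prints "true"

    // ... whereas integer arithmetic can detect it.
    println!("{:?}", i64::MAX.checked_add(i64::MAX)); // prints "None"
}
```

Keeping the operands in an exact type, such as a decimal or checked integer arithmetic, avoids both the rounding and the silent cap.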

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
comphead commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229534219 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
comphead commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229528720 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Both queries use `mode=Partial`. Addressing Question / Query 1) ``` +---+

Re: [PR] dissallow pushdown of volatile PhysicalExprs [datafusion]

2025-07-24 Thread via GitHub
adriangb commented on code in PR #16861: URL: https://github.com/apache/datafusion/pull/16861#discussion_r2229503058 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -485,21 +497,32 @@ fn push_down_filters( // currently. `self_filters` are the predicates w

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
Omega359 commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229497695 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
adriangb merged PR #16901: URL: https://github.com/apache/datafusion/pull/16901 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
blaginin commented on PR #16901: URL: https://github.com/apache/datafusion/pull/16901#issuecomment-3114808149 would love to help! feel free to ping me on Discord -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
adriangb commented on PR #16901: URL: https://github.com/apache/datafusion/pull/16901#issuecomment-3114805481 thanks @alamb @blaginin ! @blaginin I might have to consult with you for some further work I'm trying to get across in this space -- This is an automated message from the A

[I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub
rluvaton opened a new issue, #16904: URL: https://github.com/apache/datafusion/issues/16904 ### Is your feature request related to a problem or challenge? Yes, debugging memory problems is hard. When running DF in production and the memory pool is not able to grow the memory it wil

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
NGA-TRAN commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229483527 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?]

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub
XiangpengHao commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114784223 > > Update 1: I have to disable the `encryption` feature in Parquet to make it work: https://github.com/apache/datafusion/blob/main/Cargo.toml#L162 > > Thanks [@Xiang

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-24 Thread via GitHub
alamb commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-3114784536 0.58.0 is released: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826 -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229471200 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?])),

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229467559 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the Li

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub
GitHub user alamb added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Yes, please, I actually did some testing today, - https://github.com/apache/datafusion/issues/16899 - https://github.com/apache/datafusion/pull/16900 What I would

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3114730477 I think @findepi has brought this up when looking at a better way to generate functions -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3114729829 I agree we should have a better plan about this Another thing that I would like to consider is allowing users to pick which of the several string representations to generat

Re: [I] Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
alamb commented on issue #1886: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826 Thanks to @viirya and @comphead the release has been approved! The release is available here: https://dist.apache.org/repos/dist/release/datafusion/datafusi

Re: [I] Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
alamb closed issue #1886: Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub
comphead opened a new issue, #16903: URL: https://github.com/apache/datafusion/issues/16903 ### Is your feature request related to a problem or challenge? Datafusion Comet encountered a migration issue when upgrading to DataFusion 49, caused by https://github.com/apache/datafusion/pul

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114551179 There's precedent for adding a new CSP. Let's see what Apache infra says -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub
alamb commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114540526 Oh, but I see that requires making a new github app -- that sounds complicated -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] fix: regex bench [datafusion]

2025-07-24 Thread via GitHub
Omega359 commented on PR #16890: URL: https://github.com/apache/datafusion/pull/16890#issuecomment-3114533566 LGTM, thx -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub
alamb commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114535175 Or maybe we can host the .js directly on our site? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[PR] Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-07-24 Thread via GitHub
geetanshjuneja opened a new pull request, #16902: URL: https://github.com/apache/datafusion/pull/16902 ## Which issue does this PR close? - Closes #16896. ## Rationale for this change ## What changes are included in this PR? Changed `Asy

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16901: URL: https://github.com/apache/datafusion/pull/16901#discussion_r2229318658 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -706,28 +706,6 @@ impl FileScanConfig { } } -/// Set the file source -#[deprecated(si

Re: [PR] SGA-11419 Added snowflake ability for if not exists after create view… [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
iffyio commented on code in PR #1961: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1961#discussion_r2229297004 ## tests/sqlparser_common.rs: ## @@ -16183,3 +16190,21 @@ fn test_identifier_unicode_start() { ]); let _ = dialects.verified_stmt(sql); } + +

Re: [PR] Snowflake: DROP STREAM [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
iffyio merged PR #1973: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1973

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
iffyio commented on code in PR #1971: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971#discussion_r2229289934 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,101 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_date

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229281161 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

[PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub
adriangb opened a new pull request, #16901: URL: https://github.com/apache/datafusion/pull/16901 I spoke with @alamb about continuing to refactor these traits with the goal of eventually getting projection pushdown. We agreed tugging at loose ends of the knot is a good place to start, and

Re: [PR] Improve async_udf example and docs [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16846: URL: https://github.com/apache/datafusion/pull/16846#discussion_r2229267643 ## datafusion-examples/examples/async_udf.rs: ## @@ -15,104 +15,104 @@ // specific language governing permissions and limitations // under the License. -use arro

Re: [PR] fix: begin statement for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub
iffyio commented on code in PR #1975: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1975#discussion_r2229265625 ## src/dialect/bigquery.rs: ## @@ -47,6 +48,15 @@ pub struct BigQueryDialect; impl Dialect for BigQueryDialect { fn parse_statement(&self, parser:

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229234066 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229264131 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] Fix Partial Sort Get Slice Point Between Batches [datafusion]

2025-07-24 Thread via GitHub
alamb commented on code in PR #16881: URL: https://github.com/apache/datafusion/pull/16881#discussion_r2229200729 ## datafusion/physical-plan/src/sorts/partial_sort.rs: ## @@ -375,34 +375,52 @@ impl PartialSortStream { return Poll::Ready(None); } l

Re: [I] Entire input is resorted when the data is partially sorted (not using `PartialSortExec`) [datafusion]

2025-07-24 Thread via GitHub
alamb commented on issue #16899: URL: https://github.com/apache/datafusion/issues/16899#issuecomment-3114447318 I made a PR with some tests here: - https://github.com/apache/datafusion/pull/16900

[PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-24 Thread via GitHub
alamb opened a new pull request, #16900: URL: https://github.com/apache/datafusion/pull/16900 ## Which issue does this PR close? - A test for https://github.com/apache/datafusion/issues/16899 ## Rationale for this change I was trying to write some tests for https

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229256985 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-311756 https://issues.apache.org/jira/browse/INFRA-27070 Request CSP exception for Giscus comments on datafusion.apache.org

[I] PartialSortExec not used even when it could be [datafusion]

2025-07-24 Thread via GitHub
alamb opened a new issue, #16899: URL: https://github.com/apache/datafusion/issues/16899 ### Describe the bug When data is sorted on a prefix, but not all, of the input columns I expect DataFusion to use the faster / more memory efficient operator `PartialSortExec`: https://github.c

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114435954 I tried a few workarounds suggested by LLM, none of them worked - use iframe - download `giscus-client.js` and serve from repo ``` Refused to frame 'https://gis

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub
findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229234066 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] MINOR: add unit tests for chr function [datafusion]

2025-07-24 Thread via GitHub
waynexia commented on PR #16856: URL: https://github.com/apache/datafusion/pull/16856#issuecomment-3114405140 Thank you for reviewing @findepi @alamb :heart:
