date:20250724

[PR] Create 2025-07-25-async-scaler-udf.md [datafusion-site]

2025-07-24 Thread via GitHub

Adez017 opened a new pull request, #96: URL: https://github.com/apache/datafusion-site/pull/96 Hi @alamb, just finished drafting the basic post with all the things you had mentioned in [#16525](https://github.com/apache/datafusion/issues/16525) . I need you to review for further updates tha

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230260434 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230257567 ## datafusion/spark/src/function/string/luhn_check.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib
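For reference, here is a standalone Rust sketch of the Luhn (mod 10) checksum that `luhn_check` validates. It is not the code from PR #16848. The function name and the choice to return `false` for empty or non-digit input are assumptions of this sketch. The `8112189876` input matches the PySpark 3.5.5 result quoted in the test file (`True`).

```rust
/// Minimal Luhn (mod 10) check.
/// Assumption of this sketch: empty or non-ASCII-digit input is treated as invalid.
fn luhn_check(s: &str) -> bool {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let sum: u32 = s
        .bytes()
        .rev() // walk from the rightmost (check) digit
        .enumerate()
        .map(|(i, b)| {
            let d = u32::from(b - b'0');
            if i % 2 == 1 {
                // Every second digit from the right is doubled. Subtracting 9 from a
                // two-digit result is the same as summing its digits.
                let doubled = d * 2;
                if doubled > 9 { doubled - 9 } else { doubled }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

fn main() {
    for input in ["8112189876", "79927398713", "79927398710", "", "12a4"] {
        println!("luhn_check({input:?}) = {}", luhn_check(input));
    }
}
```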

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-24 Thread via GitHub

milenkovicm commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3116483390 I believe this could unblock ballista support. I'm not 100% sure but it looks like step in right direction. -- This is an automated message from the Apache Git Serv

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub

xudong963 merged PR #16885: URL: https://github.com/apache/datafusion/pull/16885

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub

xudong963 commented on PR #16885: URL: https://github.com/apache/datafusion/pull/16885#issuecomment-3116460358 thanks all

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub

kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230100602 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] Derive UDF equality from PartialEq, Hash [datafusion]

2025-07-24 Thread via GitHub

kosiew commented on code in PR #16842: URL: https://github.com/apache/datafusion/pull/16842#discussion_r2230104338 ## datafusion/expr/src/utils.rs: ## @@ -1260,6 +1261,42 @@ pub fn collect_subquery_cols( }) } +/// Generates implementation of `equals` and `hash_value` met

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

shehabgamin commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2230086388 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -35,3 +35,140 @@ ## PySpark 3.5.5 Result: {'luhn_check(8112189876)': True, 'typeof(l

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub

kosiew commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3116309058 Closing this in favour of https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub

kosiew closed pull request #16681: Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic URL: https://github.com/apache/datafusion/pull/16681

Re: [I] Combine utilities in `SpillManager` [datafusion]

2025-07-24 Thread via GitHub

ding-young commented on issue #16907: URL: https://github.com/apache/datafusion/issues/16907#issuecomment-3116294357 take

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub

2010YOUY01 commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230075680 ## datafusion/physical-plan/src/sorts/multi_level_merge.rs: ## @@ -0,0 +1,449 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

[I] Validate the memory consumption in `SortPreservingMergeStream` [datafusion]

2025-07-24 Thread via GitHub

2010YOUY01 opened a new issue, #16909: URL: https://github.com/apache/datafusion/issues/16909 ### Is your feature request related to a problem or challenge? This is a follow-up to: https://github.com/apache/datafusion/pull/15700 Part of https://github.com/apache/datafusion/issues/1

[I] Limit the max merge degree during re-spill in external sort [datafusion]

2025-07-24 Thread via GitHub

2010YOUY01 opened a new issue, #16908: URL: https://github.com/apache/datafusion/issues/16908 ### Is your feature request related to a problem or challenge? This is a follow-up to: https://github.com/apache/datafusion/pull/15700 Part of https://github.com/apache/datafusion/issues/1
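The idea behind a multi-level merge with a bounded merge degree can be sketched in memory. Sorted runs are merged at most `max_degree` at a time, and passes repeat until a single final merge remains. This is a simplified illustration, not the code from PR #15700. A real external sort merges spill files and accounts for memory, and the function names here are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// K-way merge of sorted runs using a min-heap keyed by (value, run index).
fn kway_merge(runs: Vec<Vec<i64>>) -> Vec<i64> {
    let mut iters: Vec<std::vec::IntoIter<i64>> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        // Refill the heap from the run that just produced the smallest value.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}

/// Merges runs in passes so that no single merge reads more than `max_degree` runs.
fn multi_level_merge(mut runs: Vec<Vec<i64>>, max_degree: usize) -> Vec<i64> {
    assert!(max_degree >= 2, "a merge degree below 2 cannot make progress");
    while runs.len() > max_degree {
        let mut next_level = Vec::new();
        let mut remaining = runs.into_iter();
        loop {
            let group: Vec<Vec<i64>> = remaining.by_ref().take(max_degree).collect();
            if group.is_empty() {
                break;
            }
            next_level.push(kway_merge(group));
        }
        runs = next_level;
    }
    kway_merge(runs)
}

fn main() {
    let runs = vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9], vec![0, 10]];
    println!("{:?}", multi_level_merge(runs, 2));
}
```

Each pass shrinks the number of runs to roughly `len / max_degree`, so a smaller degree trades extra passes for fewer simultaneously open inputs.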

[I] Combine utilities in `SpillManager` [datafusion]

2025-07-24 Thread via GitHub

2010YOUY01 opened a new issue, #16907: URL: https://github.com/apache/datafusion/issues/16907 ### Is your feature request related to a problem or challenge? follow-up to: https://github.com/apache/datafusion/pull/15700 This is a simple clean-up idea, see the original discussion

Re: [PR] feat: add multi level merge sort that will always fit in memory [datafusion]

2025-07-24 Thread via GitHub

Copilot commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2230055548 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -17,11 +17,10 @@ //! Define the `SpillManager` struct, which is responsible for reading and writi

Re: [PR] Add benchmark utility to profile peak memory usage [datafusion]

2025-07-24 Thread via GitHub

ding-young commented on PR #16814: URL: https://github.com/apache/datafusion/pull/16814#issuecomment-3116251667 @2010YOUY01 This is ready for review :) I would love to hear your feedback.

Re: [PR] Add benchmark utility to profile peak memory usage [datafusion]

2025-07-24 Thread via GitHub

ding-young commented on code in PR #16814: URL: https://github.com/apache/datafusion/pull/16814#discussion_r2230051866 ## benchmarks/README.md: ## @@ -321,6 +322,64 @@ FLAGS: ... ``` +# Profiling Memory Stats for each benchmark query +The `mem_profile` program wraps benchmar

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub

2010YOUY01 commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3116228640 > It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub

zhuqi-lucas commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2230025399 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))
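Spark's `hex` applied to a string returns the uppercase hexadecimal encoding of its UTF-8 bytes (the Spark documentation gives `hex('Spark SQL')` = `537061726B2053514C`). A `Utf8View` value exposes the same bytes as a `Utf8` value, so the per-value logic stays the same. The sketch below works on plain `&str` rather than Arrow arrays, so it only illustrates the per-value encoding, not the PR's `compute_hex`. The function name is hypothetical.

```rust
use std::fmt::Write;

/// Uppercase hex encoding of a string's UTF-8 bytes (per-value sketch).
fn spark_hex_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() * 2);
    for b in s.bytes() {
        // `{:02X}` pads each byte to two uppercase hex digits.
        write!(out, "{b:02X}").expect("writing to a String cannot fail");
    }
    out
}

fn main() {
    println!("{}", spark_hex_str("Spark SQL"));
}
```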

Re: [I] [EPIC] Complete `datafusion-spark` Spark Compatible Functions [datafusion]

2025-07-24 Thread via GitHub

Standing-Man commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-3116117607 Hi @alamb, I just wanted to clarify: if a Spark function appears in the sqllogictest tests, are we expected to implement it in DataFusion? -- This is an automated message

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-07-24 Thread via GitHub

github-actions[bot] commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-3116116848 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Transform scalar correlated subqueries in Where to DependentJoin [datafusion]

2025-07-24 Thread via GitHub

github-actions[bot] commented on PR #16174: URL: https://github.com/apache/datafusion/pull/16174#issuecomment-3116116734 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub

xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116085852 A draft PR: https://github.com/apache/datafusion/pull/16906

[PR] Fix create table by values with string, which doesn't respect `string_to_utf8view` config [datafusion]

2025-07-24 Thread via GitHub

xudong963 opened a new pull request, #16906: URL: https://github.com/apache/datafusion/pull/16906 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16884 ## Rationale for this change ## What changes are included in thi

Re: [I] Question about string to utf8view when creating table [datafusion]

2025-07-24 Thread via GitHub

xudong963 commented on issue #16884: URL: https://github.com/apache/datafusion/issues/16884#issuecomment-3116059618 I wanna narrow the implementation to the `create with values` first

Re: [PR] Support utf8view for spark hex [datafusion]

2025-07-24 Thread via GitHub

xudong963 commented on code in PR #16885: URL: https://github.com/apache/datafusion/pull/16885#discussion_r2229970123 ## datafusion/spark/src/function/math/hex.rs: ## @@ -212,6 +215,16 @@ pub fn compute_hex( Ok(ColumnarValue::Array(Arc::new(hexed)))

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

shehabgamin commented on PR #16848: URL: https://github.com/apache/datafusion/pull/16848#issuecomment-3116003842 > Thank you @Standing-Man -- this looks good to me > > > > @shehabgamin does this look good to you (at a high level)? Will review when I'm home in the next fe

Re: [PR] Benchmark: Add micro-benchmark for Nested Loop Join operator [datafusion]

2025-07-24 Thread via GitHub

ding-young commented on PR #16819: URL: https://github.com/apache/datafusion/pull/16819#issuecomment-3115960659 It might be helpful to add a brief description in `benchmarks/README.md`. Also, once this PR is merged, I'll follow up by adding nlj benchmarks to the memory profiling utility (in

Re: [PR] Fix integration tests not running [datafusion]

2025-07-24 Thread via GitHub

kosiew commented on code in PR #16835: URL: https://github.com/apache/datafusion/pull/16835#discussion_r2229960446 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -68,6 +68,8 @@ pub trait SchemaAdapterFactory: Debug + Send + Sync + 'static { ) -> Box { self.

Re: [I] [Blog] Async Scalar User Defined Functions [datafusion]

2025-07-24 Thread via GitHub

Adez017 commented on issue #16525: URL: https://github.com/apache/datafusion/issues/16525#issuecomment-3115698018 take

Re: [I] [BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion [datafusion]

2025-07-24 Thread via GitHub

Adez017 commented on issue #16756: URL: https://github.com/apache/datafusion/issues/16756#issuecomment-3115628901 take

[PR] feat: Use PartialSortExec when input data is sorted on prefix columns [datafusion]

2025-07-24 Thread via GitHub

EeshanBembi opened a new pull request, #16905: URL: https://github.com/apache/datafusion/pull/16905 ## Summary This PR optimizes sort execution by automatically using `PartialSortExec` instead of `SortExec` when input data is already sorted on a prefix of the requested sort columns.
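The optimization relies on a simple observation. If the input is already sorted on a prefix of the requested keys, only runs of rows that share the same prefix values need sorting. The sketch below shows this for two integer columns `(a, b)` whose input is already sorted by `a`, plus a helper that measures the shared prefix of two orderings. It uses hypothetical names and is an illustration only, not DataFusion's `PartialSortExec`.

```rust
/// Length of the longest common prefix of an existing and a requested ordering.
fn common_prefix_length(existing: &[&str], requested: &[&str]) -> usize {
    existing.iter().zip(requested).take_while(|(e, r)| e == r).count()
}

/// Sorts rows by (a, b), assuming they are already sorted by `a`.
/// Each run of equal `a` values is sorted by `b` independently.
fn partial_sort(rows: &mut [(i32, i32)]) {
    let mut start = 0;
    while start < rows.len() {
        let key = rows[start].0;
        let mut end = start + 1;
        while end < rows.len() && rows[end].0 == key {
            end += 1;
        }
        rows[start..end].sort_by_key(|&(_, b)| b);
        start = end;
    }
}

fn main() {
    // Input ordered by `a`, query asks for ORDER BY a, b: the shared prefix is 1.
    println!("shared prefix: {}", common_prefix_length(&["a"], &["a", "b"]));
    let mut rows = vec![(1, 3), (1, 1), (2, 2), (2, 0), (3, 5)];
    partial_sort(&mut rows);
    println!("{rows:?}");
}
```

Because each run can be emitted as soon as the prefix value changes, a streaming operator only needs to buffer one run at a time rather than the whole input.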

Re: [I] Entire input is resorted when the data is partially sorted (not using `PartialSortExec`) [datafusion]

2025-07-24 Thread via GitHub

EeshanBembi commented on issue #16899: URL: https://github.com/apache/datafusion/issues/16899#issuecomment-3115467757 take

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229851563 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under

Re: [I] Discussion: DataFusion Improvement Proposal (DIPs) Process? [datafusion]

2025-07-24 Thread via GitHub

phillipleblanc commented on issue #16886: URL: https://github.com/apache/datafusion/issues/16886#issuecomment-3115391379 I agree that format voting/approval doesn't make sense yet. Also having a structured way to propose "larger" changes that incorporates all relevant context for rev

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

Standing-Man commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229840064 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,134 @@ # specific language governing permissions and limitations # under

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

chenkovsky closed pull request #1971: feat: support datetime_field as expr for bigquery URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971

Re: [I] arrays_overlap inconsistent behaviour on two arrays with NULL values [datafusion-comet]

2025-07-24 Thread via GitHub

SparkApplicationMaster commented on issue #2036: URL: https://github.com/apache/datafusion-comet/issues/2036#issuecomment-3115360009 @coderfender thanks for response! Actually, there is a whole bunch of such inconsistencies in different spark functions between datafusion output and w

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3115349716 >I believe @waynexia added just such an API in Datafusion is a very good example and use case for that API. I haven't fully figured out how to evolve that API, like wheth

Re: [I] arrays_overlap inconsistent behaviour on two arrays with NULL values [datafusion-comet]

2025-07-24 Thread via GitHub

coderfender commented on issue #2036: URL: https://github.com/apache/datafusion-comet/issues/2036#issuecomment-3115338305 I can take this issue up if you arent already working on this @SparkApplicationMaster ?

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

chenkovsky commented on code in PR #1971: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971#discussion_r2229741103 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,101 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub

GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files > Yes, please, I actually did some testing today, > > * [Entire input is resorted when the data is partially sorted (not using > `PartialSortExec`) #16899

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub

rluvaton commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3115074500 The problem with that is there is no breakdown on what the memory is actually spent on in each consumer

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229628338 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_
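For fixed-length units on timestamps without a time zone, truncation can be done with integer arithmetic on the raw epoch value instead of converting to calendar fields. The thread excerpt does not confirm that this is exactly what PR #16859 does. The sketch below, with hypothetical names, only shows the arithmetic. `rem_euclid` makes pre-1970 (negative) timestamps round down rather than toward zero.

```rust
const NANOS_PER_SECOND: i64 = 1_000_000_000;
const NANOS_PER_MINUTE: i64 = 60 * NANOS_PER_SECOND;
const NANOS_PER_HOUR: i64 = 60 * NANOS_PER_MINUTE;
const NANOS_PER_DAY: i64 = 24 * NANOS_PER_HOUR;

/// Truncates a nanosecond epoch timestamp (no time zone) down to a fixed-length unit.
fn trunc_fixed(ts_nanos: i64, unit_nanos: i64) -> i64 {
    ts_nanos - ts_nanos.rem_euclid(unit_nanos)
}

fn main() {
    // Day 3 after the epoch, 05:07:09 plus 123 ns.
    let ts = 3 * NANOS_PER_DAY + 5 * NANOS_PER_HOUR + 7 * NANOS_PER_MINUTE + 9 * NANOS_PER_SECOND + 123;
    println!("hour: {}", trunc_fixed(ts, NANOS_PER_HOUR));
    println!("day:  {}", trunc_fixed(ts, NANOS_PER_DAY));
    // One nanosecond before the epoch truncates to the previous whole second.
    println!("neg:  {}", trunc_fixed(-1, NANOS_PER_SECOND));
}
```

Calendar units (month, year) and time-zone-aware timestamps do not have a fixed length, so they cannot use this shortcut.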

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229625741 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229609157 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub

zheniasigayev commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3114928847 Within the DataFusion CLI there is a flag called: ``` --top-memory-consumers The number of top memory consumers to display when query fails due to memory
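The idea behind reporting the top memory consumers can be sketched with a plain map from consumer name to reserved bytes. This is a hypothetical, standalone illustration, not DataFusion's memory pool API. The `ConsumerTracker` type, its methods, and the consumer names are invented here. Like the CLI report, it only gives per-consumer totals, not a breakdown of what each consumer spends memory on.

```rust
use std::collections::HashMap;

/// Hypothetical per-consumer reservation tracker.
#[derive(Default)]
struct ConsumerTracker {
    reserved: HashMap<String, usize>,
}

impl ConsumerTracker {
    fn grow(&mut self, consumer: &str, bytes: usize) {
        *self.reserved.entry(consumer.to_string()).or_insert(0) += bytes;
    }

    fn shrink(&mut self, consumer: &str, bytes: usize) {
        if let Some(v) = self.reserved.get_mut(consumer) {
            *v = v.saturating_sub(bytes);
        }
    }

    /// The `n` largest consumers, largest first (ties broken by name).
    fn top(&self, n: usize) -> Vec<(&str, usize)> {
        let mut all: Vec<(&str, usize)> =
            self.reserved.iter().map(|(k, v)| (k.as_str(), *v)).collect();
        all.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        all.truncate(n);
        all
    }
}

fn main() {
    let mut t = ConsumerTracker::default();
    t.grow("SortExec[0]", 64 << 20);
    t.grow("HashJoinInput[1]", 128 << 20);
    t.grow("GroupedHashAggregateStream[2]", 32 << 20);
    t.shrink("SortExec[0]", 16 << 20);
    for (name, bytes) in t.top(2) {
        println!("{name}: {} MiB", bytes >> 20);
    }
}
```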

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229557133 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229550787 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub

adamreeve commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114903866 > Thanks [@XiangpengHao](https://github.com/XiangpengHao) -- do you think we should disable the crypto feature by default? > > cc [@corwinjoy](https://github.com/corwinj

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229547803 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-24 Thread via GitHub

parthchandra commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3114897910 > managed to scan the map-type by setting `CometConf.COMET_NATIVE_SCAN_IMPL.key -> native_datafusion `. Added `map_sort` UDF with return type as `Map`. Right. `

Re: [PR] fix : cast_operands_to_decimal_type_to_fix_arithmetic_overflow [datafusion-comet]

2025-07-24 Thread via GitHub

coderfender commented on PR #1996: URL: https://github.com/apache/datafusion-comet/pull/1996#issuecomment-3114876952 Casting data types to DoubleType which would be translated to f64 in Rust produces incorrect results since the results are casted back to int64 / LongType capping the value
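The precision problem described in the comment is easy to reproduce in Rust. An `i64` routed through `f64` loses exactness above 2^53, and float-to-integer `as` casts saturate rather than fail. The sketch contrasts this with an exact widened computation in `i128`, the width that backs Arrow's `Decimal128`. It illustrates why an exact type helps and is not the PR's actual change. The function names are hypothetical.

```rust
/// Round-trips an i64 through f64, as happens when integer operands are cast to DoubleType.
fn via_f64(x: i64) -> i64 {
    x as f64 as i64
}

/// Exact product computed in i128; `None` if the result does not fit back into i64.
fn exact_mul(a: i64, b: i64) -> Option<i64> {
    i64::try_from(i128::from(a) * i128::from(b)).ok()
}

fn main() {
    // 2^53 + 1 is the smallest positive integer that f64 cannot represent exactly.
    let x = (1_i64 << 53) + 1;
    println!("{x} -> {}", via_f64(x));

    // Float-to-int `as` casts saturate, so an out-of-range value is silently capped.
    println!("{}", 1.0e19_f64 as i64);

    // The exact path reports overflow instead of returning a wrong value.
    println!("{:?}", exact_mul(4_000_000_000, 3_000_000_000));
}
```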

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

comphead commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229534219 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

comphead commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229528720 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub

GitHub user zheniasigayev added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Both queries use `mode=Partial`. Addressing Question / Query 1) ``` +---+

Re: [PR] dissallow pushdown of volatile PhysicalExprs [datafusion]

2025-07-24 Thread via GitHub

adriangb commented on code in PR #16861: URL: https://github.com/apache/datafusion/pull/16861#discussion_r2229503058 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -485,21 +497,32 @@ fn push_down_filters( // currently. `self_filters` are the predicates w

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

Omega359 commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229497695 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub

adriangb merged PR #16901: URL: https://github.com/apache/datafusion/pull/16901

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub

blaginin commented on PR #16901: URL: https://github.com/apache/datafusion/pull/16901#issuecomment-3114808149 would love to help! feel free ping in discord

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub

adriangb commented on PR #16901: URL: https://github.com/apache/datafusion/pull/16901#issuecomment-3114805481 thanks @alamb @blaginin ! @blaginin I might have to consult with you for some further work I'm trying to get across in this space

[I] Add a way to get what takes memory [datafusion]

2025-07-24 Thread via GitHub

rluvaton opened a new issue, #16904: URL: https://github.com/apache/datafusion/issues/16904 ### Is your feature request related to a problem or challenge? Yes, debugging memory problems are hard, when running DF in production and the memory pool does not able to grow the memory it wil

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub

NGA-TRAN commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229483527 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?]

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-24 Thread via GitHub

XiangpengHao commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3114784223 > > Update 1: I have to disable the `encryption` feature in Parquet to make it work: https://github.com/apache/datafusion/blob/main/Cargo.toml#L162 > > Thanks [@Xiang

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-07-24 Thread via GitHub

alamb commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-3114784536 0.58.0 is released: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826

Re: [PR] Fixes 3 bugs during serialization and deserialization of physical plans [datafusion]

2025-07-24 Thread via GitHub

alamb commented on code in PR #16858: URL: https://github.com/apache/datafusion/pull/16858#discussion_r2229471200 ## datafusion/core/src/physical_planner.rs: ## @@ -1358,6 +1358,9 @@ impl DefaultPhysicalPlanner { physical_name(expr), ))?])),

Re: [PR] feat(spark): Implement Spark `string` function `luhn_check` [datafusion]

2025-07-24 Thread via GitHub

alamb commented on code in PR #16848: URL: https://github.com/apache/datafusion/pull/16848#discussion_r2229467559 ## datafusion/sqllogictest/test_files/spark/string/luhn_check.slt: ## @@ -15,23 +15,114 @@ # specific language governing permissions and limitations # under the Li

Re: [D] Best practices for memory-efficient deduplication of pre-sorted Parquet files [datafusion]

2025-07-24 Thread via GitHub

GitHub user alamb added a comment to the discussion: Best practices for memory-efficient deduplication of pre-sorted Parquet files Yes, please, I actually did some testing today, - https://github.com/apache/datafusion/issues/16899 - https://github.com/apache/datafusion/pull/16900 What I would

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub

alamb commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3114730477 I think @findepi has brought this up when looking at a better way to generate functions

Re: [I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub

alamb commented on issue #16903: URL: https://github.com/apache/datafusion/issues/16903#issuecomment-3114729829 I agree we should have a better plan about this Another thing that I would like to consider is allowing users to pick which of the several string representations to generat

Re: [I] Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

alamb commented on issue #1886: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886#issuecomment-3114709826 Thanks to @viirya and @comphead the release has been approved! The release is available here: https://dist.apache.org/repos/dist/release/datafusion/datafusi

Re: [I] Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

alamb closed issue #1886: Release sqlparser-rs version `0.58.0` around 2025-07-18 (was 2024-08-15) URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1886

[I] [DISCUSSION] Conditional Utf8View support for downstream projects [datafusion]

2025-07-24 Thread via GitHub

comphead opened a new issue, #16903: URL: https://github.com/apache/datafusion/issues/16903 ### Is your feature request related to a problem or challenge? Datafusion Comet encountered a migration issue when upgrading to DataFusion 49, caused by https://github.com/apache/datafusion/pul

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub

kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114551179 There's precedence to add new CSP. Let's see what Apache infra says

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub

alamb commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114540526 Oh, but I see that requires making a new github app -- that sounds complicated

Re: [PR] fix: regex bench [datafusion]

2025-07-24 Thread via GitHub

Omega359 commented on PR #16890: URL: https://github.com/apache/datafusion/pull/16890#issuecomment-3114533566 LGTM, thx

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub

alamb commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114535175 Or maybe we can host the .js directly on our site?

[PR] Make `AsyncScalarUDFImpl::invoke_async_with_args` consistent with `ScalarUDFImpl::invoke_with_args` [datafusion]

2025-07-24 Thread via GitHub

geetanshjuneja opened a new pull request, #16902: URL: https://github.com/apache/datafusion/pull/16902 ## Which issue does this PR close? - Closes #16896. ## Rationale for this change ## What changes are included in this PR? Changed `Asy

Re: [PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub

alamb commented on code in PR #16901: URL: https://github.com/apache/datafusion/pull/16901#discussion_r2229318658 ## datafusion/datasource/src/file_scan_config.rs: ## @@ -706,28 +706,6 @@ impl FileScanConfig { } } -/// Set the file source -#[deprecated(si

Re: [PR] SGA-11419 Added snowflake ability for if not exists after create view… [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

iffyio commented on code in PR #1961: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1961#discussion_r2229297004 ## tests/sqlparser_common.rs: ## @@ -16183,3 +16190,21 @@ fn test_identifier_unicode_start() { ]); let _ = dialects.verified_stmt(sql); } + +

Re: [PR] Snowflake: DROP STREAM [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

iffyio merged PR #1973: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1973

Re: [PR] feat: support datetime_field as expr for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

iffyio commented on code in PR #1971: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1971#discussion_r2229289934 ## tests/sqlparser_bigquery.rs: ## @@ -2566,3 +2566,101 @@ fn test_struct_trailing_and_nested_bracket() { ) ); } + +#[test] +fn test_date

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229281161 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

[PR] remove deprecated methods from FileScanConfig / DataSourceExec [datafusion]

2025-07-24 Thread via GitHub

adriangb opened a new pull request, #16901: URL: https://github.com/apache/datafusion/pull/16901 I spoke with @alamb about continuing to refactor these traits with the goal of eventually getting projection pushdown. We agreed tugging at loose ends of the knot is a good place to start, and

Re: [PR] Improve async_udf example and docs [datafusion]

2025-07-24 Thread via GitHub

alamb commented on code in PR #16846: URL: https://github.com/apache/datafusion/pull/16846#discussion_r2229267643 ## datafusion-examples/examples/async_udf.rs: ## @@ -15,104 +15,104 @@ // specific language governing permissions and limitations // under the License. -use arro

Re: [PR] fix: begin statement for bigquery [datafusion-sqlparser-rs]

2025-07-24 Thread via GitHub

iffyio commented on code in PR #1975: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1975#discussion_r2229265625 ## src/dialect/bigquery.rs: ## @@ -47,6 +48,15 @@ pub struct BigQueryDialect; impl Dialect for BigQueryDialect { fn parse_statement(&self, parser:

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229234066 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

findepi commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229264131 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_t

Re: [PR] Fix Partial Sort Get Slice Point Between Batches [datafusion]

2025-07-24 Thread via GitHub

alamb commented on code in PR #16881: URL: https://github.com/apache/datafusion/pull/16881#discussion_r2229200729 ## datafusion/physical-plan/src/sorts/partial_sort.rs: ## @@ -375,34 +375,52 @@ impl PartialSortStream { return Poll::Ready(None); } l

Re: [I] Entire input is resorted when the data is partially sorted (not using `PartialSortExec`) [datafusion]

2025-07-24 Thread via GitHub

alamb commented on issue #16899: URL: https://github.com/apache/datafusion/issues/16899#issuecomment-3114447318 I made a PR with some tests here: - https://github.com/apache/datafusion/pull/16900

[PR] Add partial_sort.slt test for partially sorted data [datafusion]

2025-07-24 Thread via GitHub

alamb opened a new pull request, #16900: URL: https://github.com/apache/datafusion/pull/16900 ## Which issue does this PR close? - A test for https://github.com/apache/datafusion/issues/16899 ## Rationale for this change I was trying to write some tests for https

Re: [PR] speedup `date_trunc` (~7x faster) in some cases [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on code in PR #16859: URL: https://github.com/apache/datafusion/pull/16859#discussion_r2229256985 ## datafusion/functions/src/datetime/date_trunc.rs: ## @@ -185,6 +187,21 @@ impl ScalarUDFImpl for DateTruncFunc { ) -> Result { let parsed_

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub

kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-311756 https://issues.apache.org/jira/browse/INFRA-27070 Request CSP exception for Giscus comments on datafusion.apache.org

[I] PartialSortExec not used even when it could be [datafusion]

2025-07-24 Thread via GitHub

alamb opened a new issue, #16899: URL: https://github.com/apache/datafusion/issues/16899 ### Describe the bug When data is sorted on a prefix, but not all, of the input columns I expect DataFusion to use the faster / more memory efficient operator `PartialSortExec`: https://github.c

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-24 Thread via GitHub

kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3114435954 I tried a few workarounds suggested by LLM, none of them worked - use iframe - download `giscus-client.js` and serve from repo ``` Refused to frame 'https://gis

Re: [PR] MINOR: add unit tests for chr function [datafusion]

2025-07-24 Thread via GitHub

waynexia commented on PR #16856: URL: https://github.com/apache/datafusion/pull/16856#issuecomment-3114405140 Thank you for reviewing @findepi @alamb :heart:
