Re: [I] [Epic] Prepared Statement Support [datafusion]

2024-07-12 Thread via GitHub
lewiszlw commented on issue #4539: URL: https://github.com/apache/datafusion/issues/4539#issuecomment-2224955156 What's the best way to handle `?` placeholder in datafusion? For example, `select * from t where a = ?`, it could be converted to a logical plan in datafusion, but the plan can
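One possible reading of the question above, offered only as an illustrative sketch and not as anything proposed in the thread: a client could rewrite anonymous `?` markers into numbered `$1`, `$2`, ... placeholders before handing the SQL to the planner. The helper name `number_placeholders` is hypothetical, and the string-literal handling is deliberately simplistic (single-quoted literals only; no comments or quoted identifiers).

```python
from typing import Tuple


def number_placeholders(sql: str) -> Tuple[str, int]:
    """Rewrite each bare `?` as `$1`, `$2`, ... (hypothetical helper).

    Returns the rewritten SQL and the number of placeholders found.
    A `?` inside a single-quoted string literal is left untouched.
    """
    out = []
    count = 0
    in_string = False
    for ch in sql:
        if ch == "'":
            # Toggle on every quote; an escaped quote ('') toggles twice,
            # so the scanner ends up back inside the literal.
            in_string = not in_string
            out.append(ch)
        elif ch == "?" and not in_string:
            count += 1
            out.append(f"${count}")
        else:
            out.append(ch)
    return "".join(out), count


rewritten, n = number_placeholders("select * from t where a = ?")
print(rewritten, n)
```

The returned count can be used to check that the caller supplies the right number of parameter values before binding them.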

Re: [PR] fix: Spark-4.0 widening type support [datafusion-comet]

2024-07-12 Thread via GitHub
kazuyukitanimura commented on code in PR #604: URL: https://github.com/apache/datafusion-comet/pull/604#discussion_r1675488328 ## core/src/parquet/read/column.rs: ## @@ -124,19 +129,81 @@ impl ColumnReader { bit_width, is

Re: [I] Prototype implementing DataFusion functions / operators using `arrow-udf` library [datafusion]

2024-07-12 Thread via GitHub
xinlifoobar commented on issue #11413: URL: https://github.com/apache/datafusion/issues/11413#issuecomment-2225109745 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-12 Thread via GitHub
vaibhawvipul commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1675535745 ## native/core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -118,65 +118,36 @@ pub(super) fn spark_hex(args: &[ColumnarValue]) -> Result

[PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
jonahgao opened a new pull request, #11423: URL: https://github.com/apache/datafusion/pull/11423 ## Which issue does this PR close? Closes #11414. ## Rationale for this change The query in the issue generates the following plan after some optimizations. ```sql selec

Re: [PR] Add parser option enable_options_value_normalization [datafusion]

2024-07-12 Thread via GitHub
xinlifoobar commented on PR #11330: URL: https://github.com/apache/datafusion/pull/11330#issuecomment-2225134330 > Hi again @xinlifoobar. The idea in my mind was something like this: [synnada-ai@341b484](https://github.com/synnada-ai/datafusion-upstream/commit/341b48458c09f25d2a5873fcee2fce1

Re: [PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
jonahgao commented on code in PR #11423: URL: https://github.com/apache/datafusion/pull/11423#discussion_r1675556217 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -998,11 +997,22 @@ CREATE TABLE t2 (v0 DOUBLE) AS VALUES (-1.663563947387); statement ok CREATE TABLE t3 (

Re: [I] [DISCUSSION] Support for Streaming in DataFusion [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11404: URL: https://github.com/apache/datafusion/issues/11404#issuecomment-2225183104 > was to add general-purpose functionality upstream and keep stream processing focused features downstream > I think the consensus reached at the time of https://github.com/apa

Re: [I] ASOF join support / Specialize Range Joins [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #318: URL: https://github.com/apache/datafusion/issues/318#issuecomment-2225184278 FWIW this came up at InfluxData recently and we are considering investing more in this area. I will keep this ticket updated

Re: [I] Add StreamingWindowExec to DataFusion physical plan to support aggregations over unbounded data [datafusion]

2024-07-12 Thread via GitHub
ozankabak commented on issue #11366: URL: https://github.com/apache/datafusion/issues/11366#issuecomment-2225204324 To help as best as I can, let me first reiterate my understanding of your use case: You have a streaming source, which has some columns like `speed` and `altitude`, but your `

Re: [I] Support sort pushdown [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #7871: URL: https://github.com/apache/datafusion/issues/7871#issuecomment-2225209422 A use case from Discord https://discord.com/channels/885562378132000778/1166447479609376850/1261096613565304884 Basically if you have multiple indexes that can provide the data

[PR] Minor: remove duplicated select [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 opened a new pull request, #11424: URL: https://github.com/apache/datafusion/pull/11424 ## Which issue does this PR close? Closes #. ## Rationale for this change First, I think this is a duplicated select expression, since `.sql("select count(*) from t

Re: [PR] Minor: remove duplicated select [datafusion]

2024-07-12 Thread via GitHub
jonahgao commented on code in PR #11424: URL: https://github.com/apache/datafusion/pull/11424#discussion_r1675619263 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -212,7 +212,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> { let sql_results = ctx

[PR] Docs: Document creating new extension APIs [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11425: URL: https://github.com/apache/datafusion/pull/11425 ## Which issue does this PR close? Part of #7013 ## Rationale for this change @ozankabak's comments on https://github.com/apache/datafusion/issues/11404 https://github.com/apac

Re: [I] [DISCUSSION] Support for Streaming in DataFusion [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11404: URL: https://github.com/apache/datafusion/issues/11404#issuecomment-2225282450 I also tried to document some of how the Extension API process works here: https://github.com/apache/datafusion/pull/11425

Re: [I] Feature request: Support for lateral joins [datafusion]

2024-07-12 Thread via GitHub
aalexandrov commented on issue #10048: URL: https://github.com/apache/datafusion/issues/10048#issuecomment-2225287585 @alamb If nobody else is actively working on this and there is interest in adding this feature I will take a stab at sketching an implementation next week and report back if

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11425: URL: https://github.com/apache/datafusion/pull/11425#discussion_r1675658447 ## docs/source/contributor-guide/architecture.md: ## @@ -25,3 +25,54 @@ possible. You can find the most up to date version in the [source code]. [crates.io docum

[PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11426: URL: https://github.com/apache/datafusion/pull/11426 ## Which issue does this PR close? Part of #7013 ## Rationale for this change The contributor guide is getting long so let's try and combine some of the sections ## Wha

Re: [PR] Combine the Roadmap / Quarterly Roadmap sections [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11426: URL: https://github.com/apache/datafusion/pull/11426#discussion_r1675662334 ## docs/source/contributor-guide/roadmap.md: ## @@ -43,3 +43,84 @@ start a conversation using a github issue or the make review efficient and avoid surprises. [T

[PR] Minor: Consolidate specification doc sections [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11427: URL: https://github.com/apache/datafusion/pull/11427 ## Which issue does this PR close? N/A ## Rationale for this change It is strange to have the specifications intro text on a different page than the actual specifications

[PR] Minor: fix labeler rules [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11428: URL: https://github.com/apache/datafusion/pull/11428 ## Which issue does this PR close? ## Rationale for this change The automatic labeler job doesn't seem to work for documentation changes or development process (for example,

Re: [PR] Improve `CommonSubexprEliminate` rule with surely and conditionally evaluated stats [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11357: URL: https://github.com/apache/datafusion/pull/11357

Re: [I] Constructing and Destructing Objects (JSON) [datafusion]

2024-07-12 Thread via GitHub
alamb closed issue #6631: Constructing and Destructing Objects (JSON) URL: https://github.com/apache/datafusion/issues/6631

Re: [I] Constructing and Destructing Objects (JSON) [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #6631: URL: https://github.com/apache/datafusion/issues/6631#issuecomment-2225325177 I think this is covered by https://github.com/apache/datafusion/issues/7845 so let's keep the discussion going there

[I] [EPIC] A collection of issues for supporting the `MAP` DataType [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new issue, #11429: URL: https://github.com/apache/datafusion/issues/11429 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like - [ ] https://github.com/apache/datafusion/issues/11128 - [ ] http

Re: [PR] Implement ScalarFunction `MAKE_MAP` and `MAP` [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11361: URL: https://github.com/apache/datafusion/pull/11361#issuecomment-2225331434 > Thanks @jayzhan211 @alamb. If needed, I can help to file the follow-up issues tonight. Thank you @goldmedal -- that would be awesome. I started collecting Map related tickets

Re: [PR] remove termtree dependency [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11416: URL: https://github.com/apache/datafusion/pull/11416#discussion_r1675687544 ## datafusion/physical-plan/src/aggregates/topk/heap.rs: ## @@ -462,7 +486,7 @@ mod tests { let mut heap = TopKHeap::new(10, false); heap.append_or

Re: [PR] fix(11397): surface proper errors in ParquetSink [datafusion]

2024-07-12 Thread via GitHub
alamb merged PR #11399: URL: https://github.com/apache/datafusion/pull/11399

Re: [I] Improve error messages for parallel parquet writer "Unable to send array to writer!" [datafusion]

2024-07-12 Thread via GitHub
alamb closed issue #11397: Improve error messages for parallel parquet writer "Unable to send array to writer!" URL: https://github.com/apache/datafusion/issues/11397

Re: [PR] remove termtree dependency [datafusion]

2024-07-12 Thread via GitHub
Kev1n8 commented on code in PR #11416: URL: https://github.com/apache/datafusion/pull/11416#discussion_r1675691877 ## datafusion/physical-plan/src/aggregates/topk/heap.rs: ## @@ -361,9 +385,9 @@ impl HeapItem { impl Debug for HeapItem { fn fmt(&self, f: &mut Formatter<'_>)

Re: [PR] remove termtree dependency [datafusion]

2024-07-12 Thread via GitHub
Kev1n8 commented on code in PR #11416: URL: https://github.com/apache/datafusion/pull/11416#discussion_r1675692321 ## datafusion/physical-plan/src/aggregates/topk/heap.rs: ## @@ -462,7 +486,7 @@ mod tests { let mut heap = TopKHeap::new(10, false); heap.append_o

Re: [PR] Update pbjson-types requirement from 0.6 to 0.7 [datafusion]

2024-07-12 Thread via GitHub
alamb closed pull request #11406: Update pbjson-types requirement from 0.6 to 0.7 URL: https://github.com/apache/datafusion/pull/11406

Re: [PR] Update pbjson-types requirement from 0.6 to 0.7 [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11406: URL: https://github.com/apache/datafusion/pull/11406#issuecomment-2225344949 Part of https://github.com/apache/datafusion/pull/11372

Re: [PR] Update pbjson-types requirement from 0.6 to 0.7 [datafusion]

2024-07-12 Thread via GitHub
dependabot[bot] commented on PR #11406: URL: https://github.com/apache/datafusion/pull/11406#issuecomment-2225345007 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11423: URL: https://github.com/apache/datafusion/pull/11423#discussion_r1675699247 ## datafusion/sql/src/relation/join.rs: ## @@ -107,7 +110,20 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { JoinConstraint::On(sql_expr) => {

[PR] Minor: Add note about SQLancer fuzz testing to docs [datafusion]

2024-07-12 Thread via GitHub
alamb opened a new pull request, #11430: URL: https://github.com/apache/datafusion/pull/11430 ## Which issue does this PR close? Part of #7013 ## Rationale for this change @2010YOUY01 had done some great work for SQLancer -- see https://github.com/apache/datafusion/is

Re: [I] Implement SQLancer (an end-to-end SQL fuzz testing library) [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11030: URL: https://github.com/apache/datafusion/issues/11030#issuecomment-2225379057 Filed https://github.com/apache/datafusion/pull/11430 to note this in the docs. Also posted on Twitter: https://twitter.com/andrewlamb/status/1811725290801963475

Re: [PR] remove termtree dependency [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11416: URL: https://github.com/apache/datafusion/pull/11416#discussion_r1675723753 ## datafusion/physical-plan/src/aggregates/topk/heap.rs: ## @@ -361,9 +385,9 @@ impl HeapItem { impl Debug for HeapItem { fn fmt(&self, f: &mut Formatter<'_>)

Re: [PR] Minor: remove duplicated select [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on code in PR #11424: URL: https://github.com/apache/datafusion/pull/11424#discussion_r1675728286 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -212,7 +212,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> { let sql_results = ctx

Re: [PR] Avoid calling shutdown after failed write of AsyncWrite [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11415: URL: https://github.com/apache/datafusion/pull/11415#discussion_r1675730066 ## datafusion/core/src/datasource/file_format/write/orchestration.rs: ## @@ -50,7 +50,7 @@ pub(crate) async fn serialize_rb_stream_to_object_store( mut data_rx:

Re: [PR] chore: Refactoring of CometError/SparkError [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove merged PR #655: URL: https://github.com/apache/datafusion-comet/pull/655

Re: [PR] Minor: remove duplicated select [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11424: URL: https://github.com/apache/datafusion/pull/11424#discussion_r1675739488 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -212,7 +212,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> { let sql_results = ctx .sq

Re: [PR] Avoid calling shutdown after failed write of AsyncWrite [datafusion]

2024-07-12 Thread via GitHub
joroKr21 commented on code in PR #11415: URL: https://github.com/apache/datafusion/pull/11415#discussion_r1675755702 ## datafusion/core/src/datasource/file_format/write/orchestration.rs: ## @@ -50,7 +50,7 @@ pub(crate) async fn serialize_rb_stream_to_object_store( mut data_

Re: [PR] Docs: Document creating new extension APIs [datafusion]

2024-07-12 Thread via GitHub
ozankabak commented on PR #11425: URL: https://github.com/apache/datafusion/pull/11425#issuecomment-2225469912 This looks good to me. I want to note the typical outlook on extension APIs we have: Extension APIs that provide "safe" default behaviors are more likely to be suitable for

Re: [I] DataFusion weekly project plan (Andrew Lamb) - July 8, 2024 [datafusion]

2024-07-12 Thread via GitHub
alamb commented on issue #11334: URL: https://github.com/apache/datafusion/issues/11334#issuecomment-2225489914 Review Queue Arrow - [ ] https://github.com/apache/arrow-rs/pull/6046 - [ ] https://github.com/apache/arrow-rs/pull/6045 - [ ] https://github.com/apache/arrow-rs/pul

Re: [PR] Update sqlparser requirement from 0.47 to 0.48 [datafusion]

2024-07-12 Thread via GitHub
alamb commented on PR #11377: URL: https://github.com/apache/datafusion/pull/11377#issuecomment-2225493065 @tisonkun I think this one might be significantly easier than the last sqlparser upgrade 😄

Re: [PR] Short term way to make `AggregateStatistics` still work when min/max is converted to udaf [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11261: URL: https://github.com/apache/datafusion/pull/11261#discussion_r1675789494 ## datafusion/core/src/physical_optimizer/aggregate_statistics.rs: ## @@ -273,6 +263,44 @@ fn take_optimizable_max( None } +fn is_non_distinct_count(agg_expr

Re: [PR] Support serialization/deserialization for custom physical exprs in proto [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11387: URL: https://github.com/apache/datafusion/pull/11387#discussion_r1675793468 ## datafusion/proto/tests/cases/roundtrip_physical_plan.rs: ## @@ -658,6 +662,147 @@ async fn roundtrip_parquet_exec_with_table_partition_cols() -> Result<()> {

Re: [PR] Improve `CommonSubexprEliminate` rule with surely and conditionally evaluated stats [datafusion]

2024-07-12 Thread via GitHub
peter-toth commented on PR #11357: URL: https://github.com/apache/datafusion/pull/11357#issuecomment-2225510256 Thanks for the review @alamb!

Re: [PR] feat: add raw aggregate udf planner [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11371: URL: https://github.com/apache/datafusion/pull/11371#discussion_r1675797038 ## datafusion/expr/src/planner.rs: ## @@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync { ) -> Result>> { Ok(PlannerResult::Original(args))

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675852887 ## python/datafusion/udf.py: ## @@ -0,0 +1,62 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675856084 ## python/datafusion/substrait.py: ## @@ -15,9 +15,156 @@ # specific language governing permissions and limitations # under the License. +from __future__

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675870247 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675871415 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675873313 ## python/datafusion/substrait.py: ## @@ -15,9 +15,156 @@ # specific language governing permissions and limitations # under the License. +from __future__

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675874142 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

[PR] feat: Upgrade to DataFusion 40 [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove opened a new pull request, #657: URL: https://github.com/apache/datafusion-comet/pull/657 ## Which issue does this PR close? N/A ## Rationale for this change DataFusion 40 was just released, so we don't need to use the release candidate now.

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-12 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1675874679 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675887462 ## datafusion/core/src/execution/session_state.rs: ## @@ -195,122 +196,10 @@ impl SessionState { runtime: Arc, catalog_list: Arc, ) -> Self

Re: [PR] chore: Move `cast` to `spark-expr` crate [datafusion-comet]

2024-07-12 Thread via GitHub
codecov-commenter commented on PR #654: URL: https://github.com/apache/datafusion-comet/pull/654#issuecomment-2225593324 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/654?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
alamb commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675852702 ## datafusion/core/src/execution/session_state.rs: ## @@ -195,122 +196,10 @@ impl SessionState { runtime: Arc, catalog_list: Arc, ) -> Self {

Re: [I] regexp_replace fails when pattern or replacement is a scalar NULL [datafusion]

2024-07-12 Thread via GitHub
Weijun-H commented on issue #11410: URL: https://github.com/apache/datafusion/issues/11410#issuecomment-2225601291 take

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675940838 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675942907 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675946153 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`

Re: [PR] feat: support `COUNT()` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on PR #11229: URL: https://github.com/apache/datafusion/pull/11229#issuecomment-2225614819 I found the function name issue is a lot more complex than I thought. I think we should have different function names for display and for planning. In DuckDB, `count_star`

[I] Preceding and Following (WindowFrameBound) are incorrectly handled when an unoptimized plan created via SQL is converted to a substrait Plan [datafusion]

2024-07-12 Thread via GitHub
notfilippo opened a new issue, #11432: URL: https://github.com/apache/datafusion/issues/11432 ### Describe the bug The function which parses the Preceding and Following expressions in window functions returns a `ScalarValue::Utf8`: https://github.com/apache/datafusion/blob/1df

Re: [PR] Add SessionStateBuilder and extract out the registration of defaults [datafusion]

2024-07-12 Thread via GitHub
Omega359 commented on code in PR #11403: URL: https://github.com/apache/datafusion/pull/11403#discussion_r1675951519 ## datafusion/core/src/execution/session_state.rs: ## @@ -976,6 +837,482 @@ impl SessionState { } } +/// A builder to be used for building [`SessionState`

[PR] Fixes Setting Job Name Not Reflected in Ballista UI [datafusion-ballista]

2024-07-12 Thread via GitHub
athultr1997 opened a new pull request, #1039: URL: https://github.com/apache/datafusion-ballista/pull/1039 # Which issue does this PR close? Closes #1019 # Rationale for this change The job name is not displayed in the Ballista UI even after it's explicitly set. # Wha

[I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jcsherin opened a new issue, #11433: URL: https://github.com/apache/datafusion/issues/11433 I think, given the existing nth function, we should make nullable configurable. Also, the nullability is actually for the list element. We should add nullable in `StateFieldArgs`.

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225655947 Hold on, I think it is outdated, since I found that most SQL databases return null if no row qualifies. #11299

Re: [PR] Short term way to make `AggregateStatistics` still work when min/max is converted to udaf [datafusion]

2024-07-12 Thread via GitHub
Rachelint commented on PR #11261: URL: https://github.com/apache/datafusion/pull/11261#issuecomment-2225678848 Thanks @alamb for the review. I will continue trying to move them into `AggregateUDFImpl`, and comments have been added.

Re: [I] Create a scalar from array of type Map [datafusion]

2024-07-12 Thread via GitHub
Rachelint commented on issue #6485: URL: https://github.com/apache/datafusion/issues/6485#issuecomment-2225682451 take

Re: [PR] Implement TPCH substrait integration test, support tpch_13, tpch_14,16 [datafusion]

2024-07-12 Thread via GitHub
Lordworms commented on PR #11405: URL: https://github.com/apache/datafusion/pull/11405#issuecomment-2225686585 > Thanks @Lordworms 🙏 > > I think @Blizzara added a cleaner way to set up the contexts in #11396 > > Would it be possible to use that new pattern in this PR too?

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jcsherin commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225700932 For comparison I ran a few queries from #11299 with `nth_value`. This looks right to me. ```sql DataFusion CLI v40.0.0 > create table t(a int, b float, c bigint) as

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jcsherin commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225706313 This comment is also outdated because of #11299. https://github.com/apache/datafusion/blob/1dfac86a89750193491cf3e04917e37b92c64ffa/datafusion/functions-aggregate/src/nth

[I] Implement the rewrite from the Map literal to Map function [datafusion]

2024-07-12 Thread via GitHub
goldmedal opened a new issue, #11434: URL: https://github.com/apache/datafusion/issues/11434 ### Is your feature request related to a problem or challenge? Based on the discussion in https://github.com/apache/datafusion/issues/11268#issuecomment-2211125762, we will support the MAP li

[I] Document the Map function in the documentation [datafusion]

2024-07-12 Thread via GitHub
goldmedal opened a new issue, #11435: URL: https://github.com/apache/datafusion/issues/11435 ### Is your feature request related to a problem or challenge? We implemented `map` and `make_map` functions in #1136. It would be better to have the corresponding documentation for them. ###

Re: [I] Implement the rewrite from the Map literal to Map function [datafusion]

2024-07-12 Thread via GitHub
goldmedal commented on issue #11434: URL: https://github.com/apache/datafusion/issues/11434#issuecomment-2225717277 take

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225720653 This one looks good to me too. But it would be nice to have a case that benefits from the nullability of the element (like a query that is optimized based on this) ```rust

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225728116 > This comment is also outdated because of #11299. > > https://github.com/apache/datafusion/blob/1dfac86a89750193491cf3e04917e37b92c64ffa/datafusion/functions-aggregate

Re: [PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
jonahgao commented on code in PR #11423: URL: https://github.com/apache/datafusion/pull/11423#discussion_r1676031292 ## datafusion/sql/src/relation/join.rs: ## @@ -107,7 +110,20 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { JoinConstraint::On(sql_expr) => {

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-12 Thread via GitHub
efredine commented on code in PR #11289: URL: https://github.com/apache/datafusion/pull/11289#discussion_r1676020569 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -946,13 +946,29 @@ pub(crate) fn parquet_column<'a>( ) -> Option<(usize, &'a FieldRe

Re: [PR] feat: add raw aggregate udf planner [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on code in PR #11371: URL: https://github.com/apache/datafusion/pull/11371#discussion_r1676040274 ## datafusion/sql/src/expr/function.rs: ## @@ -349,13 +350,31 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { .map(|e| self.sql_expr_to_l

Re: [PR] fix: make sure JOIN ON expression is boolean type [datafusion]

2024-07-12 Thread via GitHub
jonahgao commented on code in PR #11423: URL: https://github.com/apache/datafusion/pull/11423#discussion_r1676040776 ## datafusion/core/src/dataframe/mod.rs: ## @@ -896,9 +896,8 @@ impl DataFrame { join_type: JoinType, on_exprs: impl IntoIterator, ) -> Res

Re: [PR] feat: Upgrade to DataFusion 40 [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove merged PR #657: URL: https://github.com/apache/datafusion-comet/pull/657 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jcsherin commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225749277 Sorry, I do not follow. Could you please elaborate on the incorrect part. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] feat: add raw aggregate udf planner [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on code in PR #11371: URL: https://github.com/apache/datafusion/pull/11371#discussion_r1676046289 ## datafusion/expr/src/planner.rs: ## @@ -161,6 +162,28 @@ pub trait ExprPlanner: Send + Sync { ) -> Result>> { Ok(PlannerResult::Original(args))

[I] Support Arrays for the Map scalar functions [datafusion]

2024-07-12 Thread via GitHub
goldmedal opened a new issue, #11436: URL: https://github.com/apache/datafusion/issues/11436 ### Is your feature request related to a problem or challenge? As @alamb mentioned in https://github.com/apache/datafusion/pull/11361#discussion_r1672743943, we should support not only scalar

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1676051349 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -500,19 +500,27 @@ impl Cast { let to_type = &self.data_type; let array =

Re: [I] Add nullable in `StateFieldArgs` [datafusion]

2024-07-12 Thread via GitHub
jayzhan211 commented on issue #11433: URL: https://github.com/apache/datafusion/issues/11433#issuecomment-2225757052 Like what we have in #11299. To return null if row qualified, the `nullable` for the list should be `true`, but the current code has `false`, which is not what I expected. --

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-12 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1676052360 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -1709,6 +1722,37 @@ mod tests { assert_eq!(result.len(), 2); } +#[test]

Re: [PR] Minor: Add note about SQLLancer fuzz testing to docs [datafusion]

2024-07-12 Thread via GitHub
jonahgao merged PR #11430: URL: https://github.com/apache/datafusion/pull/11430 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Extract parquet statistics for `StructArray` [datafusion]

2024-07-12 Thread via GitHub
Lordworms commented on code in PR #11289: URL: https://github.com/apache/datafusion/pull/11289#discussion_r1676059795 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -1984,7 +1981,96 @@ async fn test_struct() { } .run(); } +// test nested struct +#[tokio::

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-12 Thread via GitHub
vaibhawvipul commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1676060588 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -1709,6 +1722,37 @@ mod tests { assert_eq!(result.len(), 2); } +#[tes
