Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2051415729 ## src/ast/mod.rs: ## @@ -4054,6 +4054,12 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Go (MSSQL) +/// +

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
iffyio commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2051414204 ## src/parser/mod.rs: ## @@ -5135,6 +5146,63 @@ impl<'a> Parser<'a> { })) } +/// Parse `CREATE FUNCTION` for [SQL Server] +/// +

Re: [I] Filter multiple columns from TopK using Lexicographical ordering [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on issue #15698: URL: https://github.com/apache/datafusion/issues/15698#issuecomment-2816571388 This waits on https://github.com/apache/datafusion/pull/15697 to be finalized -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816553633 There is also a now closed PR for IEJoin https://github.com/apache/datafusion/pull/12754#pullrequestreview-2350501097 That allows joining on some other conditions.

Re: [I] OOM when nested join + limit [datafusion]

2025-04-18 Thread via GitHub
kosiew commented on issue #15628: URL: https://github.com/apache/datafusion/issues/15628#issuecomment-2816494226 @2010YOUY01 What do you think of using sorted inputs, just like SortMergeJoinExec Instead of ==, allow conditions like: t1.id < t2.id t1.timestamp <= t2.ev
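
The sorted-input idea in the comment above can be illustrated with a minimal sketch. This is not DataFusion's `SortMergeJoinExec` and not code from the issue; the function name `less_than_join` and the use of plain `i64` slices are assumptions made purely for illustration. It shows why sorting helps a condition such as `t1.id < t2.id`: once both sides are sorted ascending, the first matching right-hand position only ever moves forward.

```rust
/// Hypothetical helper (not a DataFusion API): returns index pairs (i, j)
/// such that left[i] < right[j]. Both slices are assumed sorted ascending.
fn less_than_join(left: &[i64], right: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    // Index of the first right value strictly greater than the current left value.
    let mut start = 0;
    for (i, l) in left.iter().enumerate() {
        // Because `left` is sorted, `start` never needs to move backwards.
        while start < right.len() && right[start] <= *l {
            start += 1;
        }
        // Every right row from `start` onwards satisfies left[i] < right[j].
        for j in start..right.len() {
            out.push((i, j));
        }
    }
    out
}
```

The output can still be quadratic in size, so a sketch like this only avoids scanning non-matching pairs; it does not by itself bound memory, which is the concern raised in the issue.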

Re: [PR] Speed up `optimize_projection` by improving `is_projection_unnecessary` [datafusion]

2025-04-18 Thread via GitHub
xudong963 merged PR #15761: URL: https://github.com/apache/datafusion/pull/15761

Re: [PR] Speed up `optimize_projection` by improving `is_projection_unnecessary` [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on PR #15761: URL: https://github.com/apache/datafusion/pull/15761#issuecomment-2816475287 thank you @Dandandan

Re: [PR] feat: support inner iejoin [datafusion]

2025-04-18 Thread via GitHub
github-actions[bot] closed pull request #12754: feat: support inner iejoin URL: https://github.com/apache/datafusion/pull/12754

Re: [I] Filter multiple columns from TopK using Lexicographical ordering [datafusion]

2025-04-18 Thread via GitHub
Standing-Man commented on issue #15698: URL: https://github.com/apache/datafusion/issues/15698#issuecomment-2816406328 take

[PR] add support for XMLTABLE(...) [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
lovasoa opened a new pull request, #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817 adds support for xmltable(...) see https://www.postgresql.org/docs/15/functions-xml.html#FUNCTIONS-XML-PROCESSING fixes https://github.com/apache/datafusion-sqlparser-rs/i

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on PR #15697: URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2816249151 > FYI @Dandandan although very rough I put up a draft of filter pushdown in #15770. > > The interaction with this PR is something to think about. In particular it’d be nice

[I] xmltable(...) function support [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
lovasoa opened a new issue, #1816: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1816 Postgres, db2, and oracle all support the `xmltable` table-valued function. ```sql SELECT xmltable.* FROM xmldata, XMLTABLE('//ROWS/ROW' PASSING data

Re: [PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-18 Thread via GitHub
Copilot commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051134690 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -260,33 +260,43 @@ impl BatchPartitioner { } => { let idx = *next_i

Re: [PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051129739 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -298,25 +299,15 @@ impl BatchPartitioner { .into_iter()

Re: [I] Update BATCH_SIZE config key to match DataFusion changes [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #73: URL: https://github.com/apache/datafusion-ballista/issues/73#issuecomment-2816196125 I believe this issue is stale

Re: [I] Update BATCH_SIZE config key to match DataFusion changes [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #73: Update BATCH_SIZE config key to match DataFusion changes URL: https://github.com/apache/datafusion-ballista/issues/73

Re: [PR] feat: Improve fetch partition performance, support skip validation arrow ipc files [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on PR #1216: URL: https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2816193059 hey @westhide are you still interested to get this PR merged?

Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#issuecomment-2816193857 hey @westhide are you still interested to get this PR merged?

Re: [PR] Optimize TopK with threshold filter ~1.4x speedup [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on PR #15697: URL: https://github.com/apache/datafusion/pull/15697#issuecomment-2816181258 FYI @Dandandan although very rough I put up a draft of filter pushdown in https://github.com/apache/datafusion/pull/15770. The interaction with this PR is something to think a

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816176772 would be happy to help. will have a look at comet

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
andygrove commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816170438 There is work in progress to add a `datafusion-spark` crate in the core DataFusion repo. See https://github.com/apache/datafusion/issues/5600 and https://github.com/ap

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816167845 I would take shuffle related task with highest priority @andygrove was thinking of #320 and few others related to compression, schema serialization and so on, but

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2051099886 ## datafusion/spark/src/function/math/expm1.rs: ## @@ -0,0 +1,169 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-18 Thread via GitHub
Dandandan commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2051062202 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -298,25 +299,15 @@ impl BatchPartitioner { .into_iter()

Re: [I] [EPIC] Ballista 2025/H2 Roadmap Proposal [datafusion-ballista]

2025-04-18 Thread via GitHub
andygrove commented on issue #1245: URL: https://github.com/apache/datafusion-ballista/issues/1245#issuecomment-2816157686 There has been a lot of progress with shuffle performance in Comet that Ballista could benefit from.

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051088550 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2920,7 +2936,8 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [I] Suspicious slow test in Ballista [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #235: URL: https://github.com/apache/datafusion-ballista/issues/235#issuecomment-2816148972 I believe there is race between shutdown and spawn, leading to last barrier to wait forever (as shutdown is called before task is scheduled), adding new barrier to sig

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051086969 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2889,6 +2878,35 @@ object QueryPlanSerde extends Logging with CometExprShim {

[PR] fix: executor_shutdown_while_running test has race [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm opened a new pull request, #1248: URL: https://github.com/apache/datafusion-ballista/pull/1248 # Which issue does this PR close? Closes #235 # Rationale for this change # What changes are included in this PR? executor_shutdown_while_running

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051083077 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2889,6 +2878,35 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] fix: better int96 support for experimental native scans [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove merged PR #1652: URL: https://github.com/apache/datafusion-comet/pull/1652

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051072959 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2889,6 +2855,31 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051070890 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -161,6 +162,18 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpark

Re: [I] [Ballista] Fix regression in `roundtrip_logical_plan_custom_ctx` test [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #481: [Ballista] Fix regression in `roundtrip_logical_plan_custom_ctx` test URL: https://github.com/apache/datafusion-ballista/issues/481

Re: [I] [Ballista] Fix regression in `roundtrip_logical_plan_custom_ctx` test [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #481: URL: https://github.com/apache/datafusion-ballista/issues/481#issuecomment-2816118243 looks like stale issue, closing

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2816116031 Pausing this until https://github.com/apache/datafusion/pull/15769 is done

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2051065573 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -382,7 +383,7 @@ impl PhysicalOptimizerRule for PushdownFilter { context .

Re: [I] [Ballista] Support to access remote object store, like HDFS, S3, etc [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #6: URL: https://github.com/apache/datafusion-ballista/issues/6#issuecomment-2816115439 ballista supports s3 store, others can be added by users if needed. closing this issue

Re: [I] [Ballista] Support to access remote object store, like HDFS, S3, etc [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #6: [Ballista] Support to access remote object store, like HDFS, S3, etc URL: https://github.com/apache/datafusion-ballista/issues/6

[PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-04-18 Thread via GitHub
adriangb opened a new pull request, #15770: URL: https://github.com/apache/datafusion/pull/15770 (no comment)

Re: [PR] re-implement filter pushdown for parquet [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on code in PR #15769: URL: https://github.com/apache/datafusion/pull/15769#discussion_r2051060245 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -453,29 +451,8 @@ impl FileFormat for ParquetFormat { Ok(Arc::new(DataSinkExec::new(input, sin

Re: [I] Enable/configure shuffle compression [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #575: URL: https://github.com/apache/datafusion-ballista/issues/575#issuecomment-2816109421 Looks like shuffle files are compressed now, https://github.com/apache/datafusion-ballista/blob/559bcf29ed0719d4fb133bba4d39e48c18f45891/ballista/core/src/execution_pla

Re: [I] Enable/configure shuffle compression [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #575: Enable/configure shuffle compression URL: https://github.com/apache/datafusion-ballista/issues/575

[PR] re-implement filter pushdown for parquet [datafusion]

2025-04-18 Thread via GitHub
adriangb opened a new pull request, #15769: URL: https://github.com/apache/datafusion/pull/15769 Very much WIP, please do not review

Re: [I] Save session data [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #751: Save session data URL: https://github.com/apache/datafusion-ballista/issues/751

[PR] Use `interleave` in hash repartitioning [datafusion]

2025-04-18 Thread via GitHub
Dandandan opened a new pull request, #15768: URL: https://github.com/apache/datafusion/pull/15768 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/7957 ## Rationale for this change ## What changes are included in this P

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2051054233 ## src/parser/mod.rs: ## @@ -5135,6 +5146,63 @@ impl<'a> Parser<'a> { })) } +/// Parse `CREATE FUNCTION` for [SQL Server] +//

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2051045028 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -161,6 +162,18 @@ class CometFuzzTestSuite extends CometTestBase with Adapti

Re: [I] Stop using arrow_flight::utils::flight_data_from_arrow_batch [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #569: Stop using arrow_flight::utils::flight_data_from_arrow_batch URL: https://github.com/apache/datafusion-ballista/issues/569

Re: [I] Stop using arrow_flight::utils::flight_data_from_arrow_batch [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #569: URL: https://github.com/apache/datafusion-ballista/issues/569#issuecomment-2816096383 looks like this has been resolved, I can't find use of `flight_data_from_arrow_batch` in current code base

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2051040245 ## tests/sqlparser_common.rs: ## @@ -15015,3 +15015,8 @@ fn parse_set_time_zone_alias() { _ => unreachable!(), } } + +#[test] +fn pars

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2051037497 ## src/ast/spans.rs: ## @@ -2280,6 +2277,12 @@ impl Spanned for TableObject { } } +impl Spanned for BeginEndStatements { +fn span(&self)

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
kazuyukitanimura commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2051032371 ## docs/source/user-guide/compatibility.md: ## Review Comment: I think compatibility-template.md needs to be changed

[PR] Improve `ListingTable` / `ListingTableOptions` docs [datafusion]

2025-04-18 Thread via GitHub
alamb opened a new pull request, #15767: URL: https://github.com/apache/datafusion/pull/15767 ## Which issue does this PR close? - Closes #. ## Rationale for this change I found myself with some time on ✈️ without internet access so I worked on some doc improveme

[PR] Improve documentation for `FileSource`, `DataSource` and `DataSourceExec` [datafusion]

2025-04-18 Thread via GitHub
alamb opened a new pull request, #15766: URL: https://github.com/apache/datafusion/pull/15766 ## Which issue does this PR close? - Closes #. ## Rationale for this change I found myself with some time on ✈️ without internet access and was reviewing some documentat

Re: [I] org.apache.spark.sql.catalyst.expressions.BoundReference cannot be cast to class org.apache.spark.sql.ColumnarExpression [datafusion-comet]

2025-04-18 Thread via GitHub
kazuyukitanimura commented on issue #1197: URL: https://github.com/apache/datafusion-comet/issues/1197#issuecomment-2816024962 @andygrove Yes, I think so https://github.com/apache/datafusion-comet/blob/e7a3214510091cf9d177fabdf9e3221317d1e785/dev/diffs/3.4.3.diff#L981

Re: [PR] docs: Update compatibility docs for new native scans [datafusion-comet]

2025-04-18 Thread via GitHub
mbutrovich commented on code in PR #1657: URL: https://github.com/apache/datafusion-comet/pull/1657#discussion_r2050993171 ## docs/source/user-guide/compatibility.md: ## @@ -34,25 +34,36 @@ This guide offers information about areas of functionality where there are known Comet

Re: [PR] predicate pruning: support cast and try_cast for more types [datafusion]

2025-04-18 Thread via GitHub
adriangb commented on PR #15764: URL: https://github.com/apache/datafusion/pull/15764#issuecomment-2815993041 @appletreeisyellow would you mind reviewing 🙏🏻 ?

Re: [I] Use ListingSchema in Ballista [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #500: URL: https://github.com/apache/datafusion-ballista/issues/500#issuecomment-2815991033 if datafusion supports this, I believe it should be supported in ballista now, please re-open if still required

Re: [I] Use ListingSchema in Ballista [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #500: Use ListingSchema in Ballista URL: https://github.com/apache/datafusion-ballista/issues/500

Re: [I] Data quality framework [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #802: Data quality framework URL: https://github.com/apache/datafusion-ballista/issues/802

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050966395 ## src/dialect/mssql.rs: ## @@ -116,7 +116,29 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050966038 ## tests/sqlparser_mssql.rs: ## @@ -2053,3 +2054,171 @@ fn parse_drop_trigger() { } ); } + +#[test] +fn parse_mssql_go_keyword() { +

Re: [I] Improve GitHub Workflows [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #703: Improve GitHub Workflows URL: https://github.com/apache/datafusion-ballista/issues/703

Re: [PR] fix: better int96 support for experimental native scans [datafusion-comet]

2025-04-18 Thread via GitHub
mbutrovich commented on PR #1652: URL: https://github.com/apache/datafusion-comet/pull/1652#issuecomment-2815975208 https://github.com/apache/datafusion/issues/15763 I opened an issue for nested INT96. For now we'll keep the CometFuzzTestSuite, I will open a PR early next week to re-

Re: [I] Streamline github actions [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on issue #1128: URL: https://github.com/apache/datafusion-ballista/issues/1128#issuecomment-2815974509 there is #703 which is asking the same

[PR] minor: change log level for object store creation [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm opened a new pull request, #1247: URL: https://github.com/apache/datafusion-ballista/pull/1247 # Which issue does this PR close? Closes #. # Rationale for this change object store creation will print configuration with secret key at INFO log level, w

Re: [I] [BUG] Error when adding Date32 and Int64 [datafusion]

2025-04-18 Thread via GitHub
qstommyshu commented on issue #12342: URL: https://github.com/apache/datafusion/issues/12342#issuecomment-2812968577 Oh I see, thanks @Omega359 for sending the doc. Hope you don't mind me asking another question. I'm still wondering why we want DF to support operations like `select t

Re: [PR] Updated extending operators documentation [datafusion]

2025-04-18 Thread via GitHub
Max-Meldrum commented on PR #15612: URL: https://github.com/apache/datafusion/pull/15612#issuecomment-2815924192 @the0ninjas It should pass the test if **```rust,ignore** is used instead. And we could link a reference to the external code https://github.com/uwheel/datafusion-uwhe

Re: [I] Add a "Gentle Introduction to Arrow / Record Batches" [datafusion]

2025-04-18 Thread via GitHub
Adez017 commented on issue #11336: URL: https://github.com/apache/datafusion/issues/11336#issuecomment-2815907267 take

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2047762830 ## tests/sqlparser_mssql.rs: ## @@ -187,6 +188,386 @@ fn parse_mssql_create_procedure() { let _ = ms().verified_stmt("CREATE PROCEDURE [foo] AS

Re: [I] Make it easier to run TPCH queries with datafusion-cli [datafusion]

2025-04-18 Thread via GitHub
clflushopt commented on issue #14608: URL: https://github.com/apache/datafusion/issues/14608#issuecomment-2811379764 In order to try and make progress on this, I decided to go with having a single function that builds all tables for a single scale factor similar to how DuckDB does it. My re

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-18 Thread via GitHub
berkaysynnada commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2049226852 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -0,0 +1,535 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more con

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on code in PR #15735: URL: https://github.com/apache/datafusion/pull/15735#discussion_r2048178156 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -188,7 +188,7 @@ impl ExprSimplifier { /// assert_eq!(expr, b_lt_2); /// ```

Re: [D] DISCUSSION: DataFusion Meetup in San Francisco Bay Area USA [datafusion]

2025-04-18 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: DataFusion Meetup in San Francisco Bay Area USA We are plotting another meetup during DataBricks Data and AI event: - https://github.com/apache/datafusion/discussions/15657 GitHub link: https://github.com/apache/datafusion/discu

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 commented on PR #15744: URL: https://github.com/apache/datafusion/pull/15744#issuecomment-2814413906 > > Topk > > IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for test. Yes, now

Re: [PR] improve eliminate_outer_join rule [datafusion]

2025-04-18 Thread via GitHub
github-actions[bot] commented on PR #13249: URL: https://github.com/apache/datafusion/pull/13249#issuecomment-2814346885 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] [Epic] Add snapshot tests (migrate to `insta` for tests) [datafusion]

2025-04-18 Thread via GitHub
xudong963 commented on issue #15178: URL: https://github.com/apache/datafusion/issues/15178#issuecomment-2813142044 Places are still using old styles: - [ ] core/tests/sql/explain_analyze.rs

[I] Update datafusion <> homebrew instructions [datafusion]

2025-04-18 Thread via GitHub
kevinjqliu opened a new issue, #15751: URL: https://github.com/apache/datafusion/issues/15751 ### Describe the bug While looking at how datafusion publishes to homebrew, i came across the release instructions [here](https://github.com/apache/datafusion/blob/ab5edc975d9cc6aa36c9e9eb77

Re: [PR] Support `Accumulator` for avg duration [datafusion]

2025-04-18 Thread via GitHub
alamb commented on code in PR #15468: URL: https://github.com/apache/datafusion/pull/15468#discussion_r2048964601 ## datafusion/functions-aggregate/src/average.rs: ## @@ -399,6 +410,105 @@ impl Accumulator for DecimalAvgAccumu } } +/// An accumulator to compute the aver

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-18 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049446143 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-18 Thread via GitHub
berkaysynnada commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2049205041 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -0,0 +1,535 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more con

Re: [PR] feat: Add ConfigOptions to ScalarFunctionArgs [datafusion]

2025-04-18 Thread via GitHub
Omega359 commented on PR #13527: URL: https://github.com/apache/datafusion/pull/13527#issuecomment-2812860089 > BTW how many fields from ConfigOptions really need to be copied? Can we just add the ones you need for spark? Or do we need a huge pile of them? For starters I was thinking

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-18 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2049125883 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

Re: [I] Ballista: Partition columns are duplicated in protobuf decoding. [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm closed issue #484: Ballista: Partition columns are duplicated in protobuf decoding. URL: https://github.com/apache/datafusion-ballista/issues/484

Re: [I] Parquet: coerce_int96 does not work for int96 in nested types with repeated names [datafusion]

2025-04-18 Thread via GitHub
Adez017 commented on issue #15763: URL: https://github.com/apache/datafusion/issues/15763#issuecomment-2815842125 take

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050865353 ## tests/sqlparser_mssql.rs: ## @@ -2053,3 +2054,171 @@ fn parse_drop_trigger() { } ); } + +#[test] +fn parse_mssql_go_keyword() { +

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-18 Thread via GitHub
mbutrovich commented on PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#issuecomment-2815525699 Shout out to @Kontinuation! #1511 removed a lot of the custom logic in the shuffle writer that would have needed to be extended to support complex types. Instead we now rel

Re: [PR] fix: update row groups count in internal metrics accumulator [datafusion-comet]

2025-04-18 Thread via GitHub
codecov-commenter commented on PR #1658: URL: https://github.com/apache/datafusion-comet/pull/1658#issuecomment-2815836188 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1658?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#discussion_r2047778337 ## tests/sqlparser_mssql.rs: ## @@ -187,6 +188,386 @@ fn parse_mssql_create_procedure() { let _ = ms().verified_stmt("CREATE PROCEDURE [foo] AS

Re: [PR] Set DataFusion runtime configurations through SQL interface [datafusion]

2025-04-18 Thread via GitHub
2010YOUY01 merged PR #15594: URL: https://github.com/apache/datafusion/pull/15594

Re: [PR] [wip] Add scripts for running benchmarks on EC2 [datafusion-comet]

2025-04-18 Thread via GitHub
parthchandra commented on PR #1654: URL: https://github.com/apache/datafusion-comet/pull/1654#issuecomment-2810825234 That is very odd. We don't see the same in an EKS cluster with S3 storage. Is this consistently bad? Not a noisy neighbor issue, I hope?

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-04-18 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2812144748 Will start working on this weekend.

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-04-18 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2812144759 Will start working on this weekend.

Re: [PR] chore: update python deps to 45 [datafusion-ballista]

2025-04-18 Thread via GitHub
milenkovicm commented on PR #1240: URL: https://github.com/apache/datafusion-ballista/pull/1240#issuecomment-2815619257 thanks @andygrove

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050847853 ## src/parser/mod.rs: ## @@ -15017,6 +15026,48 @@ impl<'a> Parser<'a> { } } +fn parse_go(&mut self) -> Result { +// previ

Re: [PR] User defined window functions blog post [datafusion-site]

2025-04-18 Thread via GitHub
Adez017 commented on PR #66: URL: https://github.com/apache/datafusion-site/pull/66#issuecomment-2815810736 > Thanks @Adez017 ! Let's give it another day or two for any remaining comments and then I'll plan to publish it I think we should move forward now

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-18 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2050843298 ## src/ast/mod.rs: ## @@ -4054,6 +4054,12 @@ pub enum Statement { arguments: Vec, options: Vec, }, +/// Go (MSSQL) +//
