Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on code in PR #47516: URL: https://github.com/apache/spark/pull/47516#discussion_r1697997183 ## sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala: ## @@ -96,9 +96,12 @@ class SQLMetric( def +=(v: Long): Unit = add(v) -

Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
viirya commented on PR #47546: URL: https://github.com/apache/spark/pull/47546#issuecomment-2259798746 Thank you @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on PR #47516: URL: https://github.com/apache/spark/pull/47516#issuecomment-2259793071 Sure, @abmodi and @virrrat .

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun commented on PR #47540: URL: https://github.com/apache/spark/pull/47540#issuecomment-2259782011 > +1, LGTM for Apache Spark 4.0.0-preview2. Thank you for keeping track of this area, @panbingkun . Thank you for your review! ❤️

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun commented on PR #47540: URL: https://github.com/apache/spark/pull/47540#issuecomment-2259780415 > Since the title is changed, could you make it `Ready for review`? Done.

Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
JeonDaehong commented on PR #47546: URL: https://github.com/apache/spark/pull/47546#issuecomment-2259780351 > > Hello, I am Mr. Jeon, a developer in Korea. I would like to become a committer for Apache Spark. Could you give me some advice on how to achieve this? > > Hello. Here is the community guide. However, this kind of Q&A is not related to this PR. So, for general questions, use the official mailing list

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun commented on code in PR #47540: URL: https://github.com/apache/spark/pull/47540#discussion_r1697976383 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala: ## @@ -447,8 +449,9 @@ class KafkaTestUtils( sendMessages(msgs.to

Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on code in PR #47546: URL: https://github.com/apache/spark/pull/47546#discussion_r1697967206 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala: ## @@ -1231,6 +1232,46 @@ class TransformWithStateSuite extends StateSto

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun commented on code in PR #47540: URL: https://github.com/apache/spark/pull/47540#discussion_r1697972253 ## connector/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala: ## @@ -117,7 +118,7 @@ class KafkaRDDSuite extends SparkFunSuite {

Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on PR #47546: URL: https://github.com/apache/spark/pull/47546#issuecomment-2259772917 > Hello, I am Mr. Jeon, a developer in Korea. I would like to become a Committer for Apache Spark. Could you please give me some advice on how to achieve this? Hi, @JeonDaeho

[PR] [SPARK-49071][SQL] Cleanup SortArray [spark]

2024-07-30 Thread via GitHub
ulysses-you opened a new pull request, #47547: URL: https://github.com/apache/spark/pull/47547 ### What changes were proposed in this pull request? This PR cleans up the legacy code of `SortArray` to remove `ArraySortLike` and inline `nullOrder`. The `ArraySort` has been rewritt

Re: [PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
JeonDaehong commented on PR #47546: URL: https://github.com/apache/spark/pull/47546#issuecomment-2259766374 Hello, I am Mr. Jeon, a developer in Korea. I would like to become a Committer for Apache Spark. Could you please give me some advice on how to achieve this?

Re: [PR] [SPARK-48964][SQL][DOCS] Fix the discrepancy between implementation, comment and documentation of option `recursive.fields.max.depth` in ProtoBuf connector [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on PR #47458: URL: https://github.com/apache/spark/pull/47458#issuecomment-2259765134 cc @rangadi FYI

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun commented on code in PR #47540: URL: https://github.com/apache/spark/pull/47540#discussion_r1697963622 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala: ## @@ -447,8 +449,9 @@ class KafkaTestUtils( sendMessages(msgs.to

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on code in PR #47540: URL: https://github.com/apache/spark/pull/47540#discussion_r1697962560 ## connector/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala: ## @@ -117,7 +118,7 @@ class KafkaRDDSuite extends SparkFunSuite {

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on code in PR #47540: URL: https://github.com/apache/spark/pull/47540#discussion_r1697957597 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala: ## @@ -447,8 +449,9 @@ class KafkaTestUtils( sendMessages(msgs

[PR] [SPARK-49070][SS][SQL] TransformWithStateExec.initialState is rewritten incorrectly to produce invalid query plan [spark]

2024-07-30 Thread via GitHub
viirya opened a new pull request, #47546: URL: https://github.com/apache/spark/pull/47546 ### What changes were proposed in this pull request? This patch fixes `TransformWithStateExec` so when its `hasInitialState` is false, the `initialState` won't be rewritten by planner

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749298 Thanks @dongjoon-hyun ~

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
JeonDaehong commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259749276 Hello, I am Mr. Jeon, a developer in Korea. I would like to become a Committer for Apache Spark. Could you please give me some advice on how to achieve this?

Re: [PR] [SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on PR #47540: URL: https://github.com/apache/spark/pull/47540#issuecomment-2259747887 Since the title is changed, could you make it `Ready for review`?

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
dongjoon-hyun commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259746801 Thank you. Will take a look before the end of this week.

Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

2024-07-30 Thread via GitHub
abmodi commented on PR #47516: URL: https://github.com/apache/spark/pull/47516#issuecomment-2259740314 @dongjoon-hyun could you please review - thanks!

Re: [PR] [SPARK-49062][SQL] Migrate XML to File Data Source V2 [spark]

2024-07-30 Thread via GitHub
wayneguow commented on PR #47539: URL: https://github.com/apache/spark/pull/47539#issuecomment-2259733836 cc @HyukjinKwon @cloud-fan thanks.

Re: [PR] [SPARK-49067][SQL] Move utf-8 literal into internal methods of UrlCodec class [spark]

2024-07-30 Thread via GitHub
yaooqinn commented on PR #47544: URL: https://github.com/apache/spark/pull/47544#issuecomment-2259709039 Merged to master

Re: [PR] [SPARK-49067][SQL] Move utf-8 literal into internal methods of UrlCodec class [spark]

2024-07-30 Thread via GitHub
yaooqinn closed pull request #47544: [SPARK-49067][SQL] Move utf-8 literal into internal methods of UrlCodec class URL: https://github.com/apache/spark/pull/47544

Re: [PR] [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly [spark]

2024-07-30 Thread via GitHub
itholic commented on PR #47531: URL: https://github.com/apache/spark/pull/47531#issuecomment-2259705426 Thanks! Merged into master

Re: [PR] [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly [spark]

2024-07-30 Thread via GitHub
itholic closed pull request #47531: [SPARK-49056][SQL] ErrorClassesJsonReader cannot handle null properly URL: https://github.com/apache/spark/pull/47531

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid locations in WAREHOUSE/SCHEMA/TABLE/PARTITION/DIRECTORY [spark]

2024-07-30 Thread via GitHub
yaooqinn commented on code in PR #47485: URL: https://github.com/apache/spark/pull/47485#discussion_r1697915085 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -134,9 +132,6 @@ class ResolveSessionCatalog(val catalogManager:

Re: [PR] [SPARK-48725][SQL] Integrate CollationAwareUTF8String.lowerCaseCodePoints into string expressions [spark]

2024-07-30 Thread via GitHub
cloud-fan closed pull request #47132: [SPARK-48725][SQL] Integrate CollationAwareUTF8String.lowerCaseCodePoints into string expressions URL: https://github.com/apache/spark/pull/47132

Re: [PR] [SPARK-48725][SQL] Integrate CollationAwareUTF8String.lowerCaseCodePoints into string expressions [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on PR #47132: URL: https://github.com/apache/spark/pull/47132#issuecomment-2259684831 thanks, merging to master!

Re: [PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table by expanding RewriteDistinctAggregates [spark]

2024-07-30 Thread via GitHub
uros-db commented on PR #47525: URL: https://github.com/apache/spark/pull/47525#issuecomment-2259660311 also, I think it's worth noting (at least in this comment) that the optimizer rule `RewriteDistinctAggregates` is in `nonExcludableRules` - this is important because the changes are relat
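For readers unfamiliar with the query in this PR title, the following is a minimal sketch of the result standard SQL semantics lead one to expect from `select count(distinct 1) from t` on an empty table. It uses Python's built-in `sqlite3` module as a stand-in engine, not Spark, so it says nothing about how `RewriteDistinctAggregates` works internally; the helper name `count_distinct_literal` and the column `x` are hypothetical.

```python
import sqlite3


def count_distinct_literal(rows):
    """Return count(DISTINCT 1) over a table `t` holding the given rows.

    SQLite is used here only as a reference engine for SQL aggregate
    semantics; it is an assumption that Spark should agree with it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", [(r,) for r in rows])
        # An aggregate without GROUP BY always yields exactly one row.
        (result,) = conn.execute("SELECT count(DISTINCT 1) FROM t").fetchone()
        return result
    finally:
        conn.close()


empty_result = count_distinct_literal([])
nonempty_result = count_distinct_literal([10, 20, 30])
print(empty_result, nonempty_result)
```

In this sketch the empty table yields 0 and the three-row table yields 1, since the literal `1` contributes a single distinct value only when at least one row exists.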

Re: [PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table by expanding RewriteDistinctAggregates [spark]

2024-07-30 Thread via GitHub
uros-db commented on PR #47525: URL: https://github.com/apache/spark/pull/47525#issuecomment-2259656739 I think @nikolamand-db will have to do that because it's his PR, I don't have access to edit PR description ``` ### What changes were proposed in this pull request? Fix `R

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid locations in WAREHOUSE/SCHEMA/TABLE/PARTITION/DIRECTORY [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on code in PR #47485: URL: https://github.com/apache/spark/pull/47485#discussion_r1697893124 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -134,9 +132,6 @@ class ResolveSessionCatalog(val catalogManager:

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259627620 > To achieve this, the pr also refactors `OrcEncryptionSuite`: > 1. Overrides `beforeAll` to back up the contents of `CryptoUtils#keyProviderCache`. > 2. Overrides `afterAll` to
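The backup step quoted above can be illustrated with a small, language-neutral sketch. This is not the PR's Scala code: it uses Python's `unittest` class fixtures in place of `beforeAll`/`afterAll`, a hypothetical module-level dict `KEY_PROVIDER_CACHE` in place of `CryptoUtils#keyProviderCache`, and it assumes that the truncated second step restores the backup, which the quoted text does not confirm.

```python
import unittest

# Hypothetical stand-in for a process-wide cache shared by all test suites.
KEY_PROVIDER_CACHE = {}


class EncryptionSuiteSketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Back up whatever earlier suites left in the shared cache,
        # then install the suite-specific entry (value is made up).
        cls._backup = dict(KEY_PROVIDER_CACHE)
        KEY_PROVIDER_CACHE.clear()
        KEY_PROVIDER_CACHE["provider.path"] = "demo-provider"

    @classmethod
    def tearDownClass(cls):
        # Assumed restore step: drop suite-specific entries and
        # put the backed-up contents back for later suites.
        KEY_PROVIDER_CACHE.clear()
        KEY_PROVIDER_CACHE.update(cls._backup)

    def test_sees_suite_specific_entry(self):
        self.assertEqual(KEY_PROVIDER_CACHE["provider.path"], "demo-provider")


def run_sketch():
    """Run the suite and return (success flag, cache contents afterwards)."""
    KEY_PROVIDER_CACHE.clear()
    KEY_PROVIDER_CACHE["other"] = "kept"  # entry owned by some other suite
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncryptionSuiteSketch)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful(), dict(KEY_PROVIDER_CACHE)
```

Calling `run_sketch()` shows the intended isolation: the suite sees only its own entry while it runs, and the cache returns to its previous contents afterwards.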

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259623836 cc @dongjoon-hyun I'm not sure if the original intention of adding `spark.hadoop.hadoop.security.key.provider.path` in `SparkBuild.scala` was to have it take effect globally, so pleas

Re: [PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table by expanding RewriteDistinctAggregates [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on PR #47525: URL: https://github.com/apache/spark/pull/47525#issuecomment-2259619624 Please fill in the PR description.

Re: [PR] [SPARK-49003][SQL] Fix interpreted code path hashing to be collation aware [spark]

2024-07-30 Thread via GitHub
cloud-fan closed pull request #47502: [SPARK-49003][SQL] Fix interpreted code path hashing to be collation aware URL: https://github.com/apache/spark/pull/47502

Re: [PR] [SPARK-49003][SQL] Fix interpreted code path hashing to be collation aware [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on PR #47502: URL: https://github.com/apache/spark/pull/47502#issuecomment-2259618241 thanks, merging to master!

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on PR #47543: URL: https://github.com/apache/spark/pull/47543#issuecomment-2259618173 For other test cases, such as `OrcV1QuerySuite`, the code has been modified to print the contents of `CryptoUtils#keyProviderCache` in `afterAll`. Before this PR, the contents o

Re: [PR] [SC-170296] GROUP BY with MapType nested inside complex type [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on code in PR #47331: URL: https://github.com/apache/spark/pull/47331#discussion_r1697879637 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InsertMapSortInGroupingExpressions.scala: ## @@ -34,12 +35,46 @@ object InsertMapSortInGrouping

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697852334 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697851854 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala: ## @@ -17,20 +17,53 @@ package org.apache.spark.sql.execution

Re: [PR] [SPARK-49057][SQL] Do not block the AQE loop when submitting query stages [spark]

2024-07-30 Thread via GitHub
ulysses-you commented on code in PR #47533: URL: https://github.com/apache/spark/pull/47533#discussion_r1697849217 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala: ## @@ -61,23 +61,33 @@ trait BroadcastExchangeLike extends Exchange

Re: [PR] [SPARK-47430][SQL] Rework group by map type [spark]

2024-07-30 Thread via GitHub
ulysses-you commented on PR #47545: URL: https://github.com/apache/spark/pull/47545#issuecomment-2259573365 cc @cloud-fan @stevomitric thank you

[PR] [SPARK-47430][SQL] Rework group by map type [spark]

2024-07-30 Thread via GitHub
ulysses-you opened a new pull request, #47545: URL: https://github.com/apache/spark/pull/47545 ### What changes were proposed in this pull request? This PR reworks the group by map type to fix issues: - Cannot bind reference exception at runtime since the attribute was wr

[PR] [SPARK-49067][SQL] Move utf-8 literal into internal methods of UrlCodec class [spark]

2024-07-30 Thread via GitHub
wForget opened a new pull request, #47544: URL: https://github.com/apache/spark/pull/47544 ### What changes were proposed in this pull request? Move utf-8 literals in url encode/decode functions to internal methods of UrlCodec class ### Why are the changes needed?

Re: [PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang commented on code in PR #47543: URL: https://github.com/apache/spark/pull/47543#discussion_r1697826152 ## .github/workflows/build_and_test.yml: ## @@ -135,6 +135,12 @@ jobs: IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME" echo "image_url=$IMG_URL" >> $GITHU

[PR] [SPARK-49066][SQL][TESTS] Refactor `OrcEncryptionSuite` and make `spark.hadoop.hadoop.security.key.provider.path` effective only within `OrcEncryptionSuite` [spark]

2024-07-30 Thread via GitHub
LuciferYang opened a new pull request, #47543: URL: https://github.com/apache/spark/pull/47543 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48936][CONNECT] Makes spark-shell works with Spark connect [spark]

2024-07-30 Thread via GitHub
pan3793 commented on PR #47402: URL: https://github.com/apache/spark/pull/47402#issuecomment-2259517619 Thanks for making this change. If I understand correctly, we don't need to manually install [Coursier CLI](https://get-coursier.io/docs/cli-installation) and `spark-connect-repl` after th

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid locations in WAREHOUSE/SCHEMA/TABLE/PARTITION/DIRECTORY [spark]

2024-07-30 Thread via GitHub
yaooqinn commented on code in PR #47485: URL: https://github.com/apache/spark/pull/47485#discussion_r1697808731 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -134,9 +132,6 @@ class ResolveSessionCatalog(val catalogManager:

Re: [PR] [SPARK-49058][SQL] Display more primitive name for the `::` operator [spark]

2024-07-30 Thread via GitHub
panbingkun commented on PR #47535: URL: https://github.com/apache/spark/pull/47535#issuecomment-2259509458 cc @cloud-fan @HyukjinKwon

Re: [PR] [SPARK-49002][SQL] Consistently handle invalid locations in WAREHOUSE/SCHEMA/TABLE/PARTITION/DIRECTORY [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on code in PR #47485: URL: https://github.com/apache/spark/pull/47485#discussion_r1697804012 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -134,9 +132,6 @@ class ResolveSessionCatalog(val catalogManager:

[PR] [MINOR][DOCS] Fix typos in `docs/sql-data-sources-xml.md` [spark]

2024-07-30 Thread via GitHub
wayneguow opened a new pull request, #47542: URL: https://github.com/apache/spark/pull/47542 ### What changes were proposed in this pull request? This PR aims to fix typos in `docs/sql-data-sources-xml.md` ### Why are the changes needed? Fix typos. ### Does

Re: [PR] [SPARK-49057][SQL] Do not block the AQE loop when submitting query stages [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on code in PR #47533: URL: https://github.com/apache/spark/pull/47533#discussion_r1697759562 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala: ## @@ -61,23 +61,33 @@ trait BroadcastExchangeLike extends Exchange {

Re: [PR] [SPARK-42199][SQL] Fix issues around Dataset.groupByKey [spark]

2024-07-30 Thread via GitHub
cloud-fan commented on PR #39754: URL: https://github.com/apache/spark/pull/39754#issuecomment-2259429630 @EnricoMi sorry, we lost track of this PR. Have you addressed all the review comments?

[PR] [SPARK-42199][SQL] Fix issues around Dataset.groupByKey [spark]

2024-07-30 Thread via GitHub
EnricoMi opened a new pull request, #39754: URL: https://github.com/apache/spark/pull/39754 ### What changes were proposed in this pull request? Introduces `ScopedExpression`, which allows resolving an expression against a set of attributes. This is used by `Dataset.groupByKey.agg`, `Da

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-30 Thread via GitHub
bogao007 commented on PR #47133: URL: https://github.com/apache/spark/pull/47133#issuecomment-2259422798 @HyukjinKwon I fixed the dependency issue, could you help take another look? Thanks!

Re: [PR] [SPARK-42199][SQL] Fix issues around Dataset.groupByKey [spark]

2024-07-30 Thread via GitHub
github-actions[bot] closed pull request #39754: [SPARK-42199][SQL] Fix issues around Dataset.groupByKey URL: https://github.com/apache/spark/pull/39754

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-07-30 Thread via GitHub
github-actions[bot] closed pull request #45464: [SPARK-47323][K8S] Support custom executor log urls URL: https://github.com/apache/spark/pull/45464

Re: [PR] [MINOR][TEST][CONNECT] Discard stdout / stderr of test Spark connect server if not isDebug [spark]

2024-07-30 Thread via GitHub
github-actions[bot] closed pull request #44836: [MINOR][TEST][CONNECT] Discard stdout / stderr of test Spark connect server if not isDebug URL: https://github.com/apache/spark/pull/44836

Re: [PR] [SPARK-47573][K8S] Support custom driver log url [spark]

2024-07-30 Thread via GitHub
github-actions[bot] closed pull request #45728: [SPARK-47573][K8S] Support custom driver log url URL: https://github.com/apache/spark/pull/45728

[PR] [WIP][SPARK-49064][BUILD] Upgrade Kafka to 3.8.0 [spark]

2024-07-30 Thread via GitHub
panbingkun opened a new pull request, #47540: URL: https://github.com/apache/spark/pull/47540 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-30 Thread via GitHub
HeartSaVioR closed pull request #47508: [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 URL: https://github.com/apache/spark/pull/47508

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-30 Thread via GitHub
HeartSaVioR commented on PR #47508: URL: https://github.com/apache/spark/pull/47508#issuecomment-2259344206 Thanks! Merging to master.

Re: [PR] [WIP][SPARK-49000][SQL] Fix aggregation for distinct literal [spark]

2024-07-30 Thread via GitHub
uros-db closed pull request #47505: [WIP][SPARK-49000][SQL] Fix aggregation for distinct literal URL: https://github.com/apache/spark/pull/47505

Re: [PR] [SPARK-49000][SQL] Fix aggregate behaviour with DISTINCT literals [spark]

2024-07-30 Thread via GitHub
uros-db commented on code in PR #47482: URL: https://github.com/apache/spark/pull/47482#discussion_r1697676630 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -498,7 +498,9 @@ object EliminateDistinct extends Rule[LogicalPlan] { o

Re: [PR] [SPARK-49000][SQL] Fix aggregate behaviour with DISTINCT literals [spark]

2024-07-30 Thread via GitHub
uros-db closed pull request #47482: [SPARK-49000][SQL] Fix aggregate behaviour with DISTINCT literals URL: https://github.com/apache/spark/pull/47482

Re: [PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table by expanding RewriteDistinctAggregates [spark]

2024-07-30 Thread via GitHub
uros-db commented on PR #47525: URL: https://github.com/apache/spark/pull/47525#issuecomment-2259225874 @cloud-fan we've updated the tests quite a bit to try and limit the impact of e2e sql testing, but we believe it's best to keep it like this instead of using golden files - we're using lo

Re: [PR] [SPARK-49000][SQL][WIP] Fix "select count(distinct 1) from t" where t is empty table by expanding RewriteDistinctAggregates [spark]

2024-07-30 Thread via GitHub
uros-db commented on code in PR #47525: URL: https://github.com/apache/spark/pull/47525#discussion_r1697604316 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala: ## @@ -205,7 +205,8 @@ object RewriteDistinctAggregates extends

Re: [PR] [SPARK-49054][SQL][3.5] Column default value should support current_* functions [spark]

2024-07-30 Thread via GitHub
gengliangwang closed pull request #47538: [SPARK-49054][SQL][3.5] Column default value should support current_* functions URL: https://github.com/apache/spark/pull/47538

Re: [PR] [SPARK-49054][SQL][3.5] Column default value should support current_* functions [spark]

2024-07-30 Thread via GitHub
gengliangwang commented on PR #47538: URL: https://github.com/apache/spark/pull/47538#issuecomment-2259147428 Merged to branch-3.5

[PR] [SPARK-49062][SQL] Migrate XML to File Data Source V2 [spark]

2024-07-30 Thread via GitHub
wayneguow opened a new pull request, #47539: URL: https://github.com/apache/spark/pull/47539 ### What changes were proposed in this pull request? This PR aims to migrate XML to File Data Source V2. ### Why are the changes needed? Add v2 support for XML. ###

[PR] [SPARK-49054][SQL] Column default value should support current_* functions [spark]

2024-07-30 Thread via GitHub
gengliangwang opened a new pull request, #47538: URL: https://github.com/apache/spark/pull/47538 ### What changes were proposed in this pull request? This is a regression between Spark 3.5.0 and Spark 4. The following queries work on Spark 3.5.0 but fail on the latest master

Re: [PR] [SPARK-49059][CONNECT] Move `SessionHolder.forTesting(...)` to the test package [spark]

2024-07-30 Thread via GitHub
HyukjinKwon closed pull request #47536: [SPARK-49059][CONNECT] Move `SessionHolder.forTesting(...)` to the test package URL: https://github.com/apache/spark/pull/47536

Re: [PR] [SPARK-49059][CONNECT] Move `SessionHolder.forTesting(...)` to the test package [spark]

2024-07-30 Thread via GitHub
HyukjinKwon commented on PR #47536: URL: https://github.com/apache/spark/pull/47536#issuecomment-2258744767 Merged to master.

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697249446 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697245949 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697234609 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697237880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697231118 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697229129 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/SqlScriptingException.scala: ## @@ -17,40 +17,77 @@ package org.apache.spark.sql.errors -import org.ap

Re: [PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
davidm-db commented on code in PR #47537: URL: https://github.com/apache/spark/pull/47537#discussion_r1697227432 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -181,10 +183,15 @@ class AstBuilder extends DataTypeAstBuilder with SQLCo

Re: [PR] [SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python [spark]

2024-07-30 Thread via GitHub
HyukjinKwon closed pull request #47452: [SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python URL: https://github.com/apache/spark/pull/47452

Re: [PR] [SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python [spark]

2024-07-30 Thread via GitHub
HyukjinKwon commented on PR #47452: URL: https://github.com/apache/spark/pull/47452#issuecomment-2258673478 Merged to master.

Re: [PR] [SPARK-49054][SQL] Column default value should support current_* functions [spark]

2024-07-30 Thread via GitHub
gengliangwang closed pull request #47529: [SPARK-49054][SQL] Column default value should support current_* functions URL: https://github.com/apache/spark/pull/47529

Re: [PR] [SPARK-49054][SQL] Column default value should support current_* functions [spark]

2024-07-30 Thread via GitHub
gengliangwang commented on PR #47529: URL: https://github.com/apache/spark/pull/47529#issuecomment-2258643927 Merging to master/3.5

Re: [PR] [SPARK-49031] Implement validation for the TransformWithStateExec operator using OperatorStateMetadataV2 [spark]

2024-07-30 Thread via GitHub
ericm-db commented on code in PR #47508: URL: https://github.com/apache/spark/pull/47508#discussion_r1697160596 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateVariableUtils.scala: ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-49059][CONNECT] Move `SessionHolder.forTesting(...)` to the test package [spark]

2024-07-30 Thread via GitHub
vicennial commented on PR #47536: URL: https://github.com/apache/spark/pull/47536#issuecomment-2258608990 Thanks for the review @HyukjinKwon!

Re: [PR] [SPARK-49053][PYTHON][ML] Make model save/load helper functions accept spark session [spark]

2024-07-30 Thread via GitHub
HyukjinKwon closed pull request #47527: [SPARK-49053][PYTHON][ML] Make model save/load helper functions accept spark session URL: https://github.com/apache/spark/pull/47527

Re: [PR] [SPARK-49053][PYTHON][ML] Make model save/load helper functions accept spark session [spark]

2024-07-30 Thread via GitHub
HyukjinKwon commented on PR #47527: URL: https://github.com/apache/spark/pull/47527#issuecomment-2258307830 Merged to master.

Re: [PR] [MINOR][DOCS] Fix typos in `docs/sql-migration-guide.md` [spark]

2024-07-30 Thread via GitHub
HyukjinKwon closed pull request #47530: [MINOR][DOCS] Fix typos in `docs/sql-migration-guide.md` URL: https://github.com/apache/spark/pull/47530

Re: [PR] [MINOR][DOCS] Fix typos in `docs/sql-migration-guide.md` [spark]

2024-07-30 Thread via GitHub
HyukjinKwon commented on PR #47530: URL: https://github.com/apache/spark/pull/47530#issuecomment-2258305211 Merged to master.

Re: [PR] [WIP] Goldenfile tests [spark]

2024-07-30 Thread via GitHub
asl3 closed pull request #47150: [WIP] Goldenfile tests URL: https://github.com/apache/spark/pull/47150

Re: [PR] [SPARK-49057][SQL] Do not block the AQE loop when submitting query stages [spark]

2024-07-30 Thread via GitHub
ulysses-you commented on code in PR #47533: URL: https://github.com/apache/spark/pull/47533#discussion_r1696900342 ## sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala: ## @@ -61,23 +61,33 @@ trait BroadcastExchangeLike extends Exchange

[PR] [SPARK-48338][SQL] Improve exceptions thrown from parser/interpreter [spark]

2024-07-30 Thread via GitHub
dusantism-db opened a new pull request, #47537: URL: https://github.com/apache/spark/pull/47537 ### What changes were proposed in this pull request? Introduced a new class SqlScriptingException, which is thrown during SQL script parsing/interpreting, and contains information about the
