Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44229: URL: https://github.com/apache/spark/pull/44229#issuecomment-1844839244 The PR is updated with the log and the revised comment. ``` 23/12/06 23:52:50 INFO Worker: spark.worker.initialRegistrationRetries (20) is capped by
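As context for the log fragment above, the relationship it describes is simply that the initial registration retry count is capped at the maximum. A minimal Python sketch of that capping rule, with illustrative values (the actual logic lives in Worker.scala; the pairing of the two option names is inferred from the PR title and log line):

```python
def effective_initial_retries(initial: int, maximum: int) -> int:
    """Cap the initial registration retry count at the maximum (illustrative only)."""
    if initial > maximum:
        # Mirrors the INFO log shown above: the configured initial value is capped.
        print(f"spark.worker.initialRegistrationRetries ({initial}) is capped by "
              f"spark.worker.maxRegistrationRetries ({maximum})")
        return maximum
    return initial
```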

[PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL (single Python wrapper) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon opened a new pull request, #44233: URL: https://github.com/apache/spark/pull/44233 ### What changes were proposed in this pull request? This PR is another approach to https://github.com/apache/spark/pull/43784, which proposes to support creating tables using a Python data source in SQL

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL (single Python wrapper) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44233: URL: https://github.com/apache/spark/pull/44233#issuecomment-1844839197 cc @allisonwang-db and @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-46303][PS][TESTS] Remove unused code in `pyspark.pandas.tests.series.* ` [spark]

2023-12-06 Thread via GitHub
zhengruifeng commented on PR #44232: URL: https://github.com/apache/spark/pull/44232#issuecomment-1844837464 ci https://github.com/zhengruifeng/spark/actions/runs/7124436504/job/19398677455

[PR] [SPARK-46303][PS][TESTS] Remove unused code in `pyspark.pandas.tests.series.* ` [spark]

2023-12-06 Thread via GitHub
zhengruifeng opened a new pull request, #44232: URL: https://github.com/apache/spark/pull/44232 ### What changes were proposed in this pull request? Remove unused code in `pyspark.pandas.tests.series.* ` ### Why are the changes needed? clean up the code ### Does

[PR] [SPARK-46260][CONNECT] `DataFrame.withColumnsRenamed` should keep the dict/map ordering [spark]

2023-12-06 Thread via GitHub
zhengruifeng opened a new pull request, #44231: URL: https://github.com/apache/spark/pull/44231 ### What changes were proposed in this pull request? this is a follow up of https://github.com/apache/spark/pull/44177 ### Why are the changes needed? according to
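For background on why ordering matters here: Python dicts preserve insertion order (guaranteed since 3.7), and when renames chain, applying them in a different order yields different columns. A standalone illustration of that effect, not the Spark Connect implementation:

```python
def with_columns_renamed(columns, renames):
    """Apply renames one by one, in dict insertion order, so later
    entries see the result of earlier ones (illustrative only)."""
    cols = list(columns)
    for old, new in renames.items():
        cols = [new if c == old else c for c in cols]
    return cols
```

With `{"a": "b", "b": "c"}` the column `a` ends up as `c`; reordering the same entries produces `b`, which is why the ordering must be preserved on the wire.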

Re: [PR] [Don't merge & review] [SPARK-46302][BUILD] Fix maven daily testing [spark]

2023-12-06 Thread via GitHub
panbingkun commented on PR #44208: URL: https://github.com/apache/spark/pull/44208#issuecomment-1844806071 https://github.com/apache/spark/assets/15246973/fb28e61c-510c-4e4e-b924-4f99cc7fe29e

Re: [PR] [WIP] Make `streaming-kafka-0-10` and `sql-kafka-0-10` test with the same zookeeper version as the project [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on code in PR #44230: URL: https://github.com/apache/spark/pull/44230#discussion_r1418481237 ## connector/kafka-0-10-sql/pom.xml: ## @@ -122,11 +122,9 @@ org.apache.hadoop hadoop-minikdc - Review Comment: When this configuration

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418481067 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
yaooqinn commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418479088 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [WIP] Make `streaming-kafka-0-10` and `sql-kafka-0-10` test with the same zookeeper version as the project [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on PR #44230: URL: https://github.com/apache/spark/pull/44230#issuecomment-1844787101 Test first

[PR] Make `streaming-kafka-0-10` and `sql-kafka-0-10` test with the same zookeeper version as the project [spark]

2023-12-06 Thread via GitHub
LuciferYang opened a new pull request, #44230: URL: https://github.com/apache/spark/pull/44230 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46300][PYTHON][CONNECT] Match minor behaviour matching in Column with full test coverage [spark]

2023-12-06 Thread via GitHub
HyukjinKwon closed pull request #44228: [SPARK-46300][PYTHON][CONNECT] Match minor behaviour matching in Column with full test coverage URL: https://github.com/apache/spark/pull/44228

Re: [PR] [SPARK-46300][PYTHON][CONNECT] Match minor behaviour matching in Column with full test coverage [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44228: URL: https://github.com/apache/spark/pull/44228#issuecomment-1844772944 Merged to master.

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418462913 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418453681 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418452248 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418437164 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
yaooqinn commented on code in PR #44229: URL: https://github.com/apache/spark/pull/44229#discussion_r1418421509 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -96,12 +96,13 @@ private[deploy] class Worker( private val HEARTBEAT_MILLIS =

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
beliefer commented on PR #44183: URL: https://github.com/apache/spark/pull/44183#issuecomment-1844597695 The GA failure is unrelated. Merged to Master. @dongjoon-hyun @LuciferYang Thank you!

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
beliefer closed pull request #44183: [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement URL: https://github.com/apache/spark/pull/44183

Re: [PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44229: URL: https://github.com/apache/spark/pull/44229#issuecomment-1844517261 Could you review this PR, @yaooqinn and @LuciferYang ?

Re: [PR] [SPARK-46279][SQL] Support write partition values to data files in FileFormatWritter [spark]

2023-12-06 Thread via GitHub
lzlfred commented on code in PR #44195: URL: https://github.com/apache/spark/pull/44195#discussion_r1418363798 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala: ## @@ -52,6 +52,11 @@ object DataSourceUtils extends PredicateHelper {

Re: [PR] [SPARK-46298][PYTHON][CONNECT] Match deprecation warning, test case, and error of Catalog.createExternalTable [spark]

2023-12-06 Thread via GitHub
HyukjinKwon closed pull request #44226: [SPARK-46298][PYTHON][CONNECT] Match deprecation warning, test case, and error of Catalog.createExternalTable URL: https://github.com/apache/spark/pull/44226

Re: [PR] [SPARK-46298][PYTHON][CONNECT] Match deprecation warning, test case, and error of Catalog.createExternalTable [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44226: URL: https://github.com/apache/spark/pull/44226#issuecomment-1844410191 Merged to master.

Re: [PR] [SPARK-46296][PYTHON][TESTS] Test missing test coverage for captured errors (pyspark.errors.exceptions) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon closed pull request #44224: [SPARK-46296][PYTHON][TESTS] Test missing test coverage for captured errors (pyspark.errors.exceptions) URL: https://github.com/apache/spark/pull/44224

[PR] [SPARK-46301][CORE] Support `spark.worker.(initial|max)RegistrationRetries` [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun opened a new pull request, #44229: URL: https://github.com/apache/spark/pull/44229 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-46296][PYTHON][TESTS] Test missing test coverage for captured errors (pyspark.errors.exceptions) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44224: URL: https://github.com/apache/spark/pull/44224#issuecomment-1844335046 Merged to master.

Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

2023-12-06 Thread via GitHub
junyuc25 commented on code in PR #44211: URL: https://github.com/apache/spark/pull/44211#discussion_r1418314328 ## connector/kinesis-asl-assembly/pom.xml: ## @@ -62,12 +62,18 @@ com.google.protobuf protobuf-java - 2.6.1 - + compile + +

Re: [PR] [SPARK-46272][SQL] Support CTAS using DSv2 sources [spark]

2023-12-06 Thread via GitHub
cloud-fan commented on code in PR #44190: URL: https://github.com/apache/spark/pull/44190#discussion_r1418295715 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -723,6 +724,158 @@ class DataSourceV2Suite extends QueryTest with

Re: [PR] [SPARK-46272][SQL] Support CTAS using DSv2 sources [spark]

2023-12-06 Thread via GitHub
cloud-fan commented on code in PR #44190: URL: https://github.com/apache/spark/pull/44190#discussion_r1418289369 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -723,6 +724,158 @@ class DataSourceV2Suite extends QueryTest with

[PR] [SPARK-46300][PYTHON][CONNECT] Match minor behaviour matching in Column with full test coverage [spark]

2023-12-06 Thread via GitHub
HyukjinKwon opened a new pull request, #44228: URL: https://github.com/apache/spark/pull/44228 ### What changes were proposed in this pull request? This PR matches the corner case behaviours in `Column` between Spark Connect and non-Spark Connect with adding unittests with the full

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418279638 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,57 @@ object InferWindowGroupLimit extends
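As background for the optimization under review in this thread: a window group limit prunes each window partition down to the top-k rows before the full (and expensive) window computation, when the query only keeps rows satisfying something like `rank() <= k`. A rough standalone illustration of that pruning idea, not the Catalyst rule itself (function and parameter names are hypothetical):

```python
import heapq

def window_group_limit(rows, partition_key, order_key, k):
    """Keep only the k smallest rows (by order_key) in each partition."""
    groups = {}
    for row in rows:
        groups.setdefault(partition_key(row), []).append(row)
    kept = []
    for group in groups.values():
        # Truncate each partition to k rows; the dropped rows could never
        # survive a rank() <= k filter, so the window computation sees less data.
        kept.extend(heapq.nsmallest(k, group, key=order_key))
    return kept
```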

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418275721 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418273111 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46275] Protobuf: Return null in permissive mode when deserialization fails. [spark]

2023-12-06 Thread via GitHub
rangadi commented on PR #44214: URL: https://github.com/apache/spark/pull/44214#issuecomment-1844204708 @LuciferYang, I was thinking about such an option too. Mostly it is not required. The current behaviour, which this PR changes, is more surprising to the users. We have seen only a few customers try
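The behavior under discussion can be sketched loosely as follows (names are hypothetical, not the connector's API): in permissive mode, a record that fails to deserialize yields null instead of failing the query.

```python
def deserialize_permissive(decode, raw_record):
    """Return the decoded record, or None when decoding fails (permissive mode)."""
    try:
        return decode(raw_record)
    except Exception:
        # A malformed record becomes a null result rather than
        # aborting the whole query.
        return None
```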

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418264040 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,57 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45580][SQL][3.3] Handle case where a nested subquery becomes an existence join [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44223: URL: https://github.com/apache/spark/pull/44223#issuecomment-1844194165 Merged to branch-3.3 for Apache Spark 3.3.4.

Re: [PR] [SPARK-45580][SQL][3.3] Handle case where a nested subquery becomes an existence join [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun closed pull request #44223: [SPARK-45580][SQL][3.3] Handle case where a nested subquery becomes an existence join URL: https://github.com/apache/spark/pull/44223

Re: [PR] [SPARK-45580][SQL][3.4] Handle case where a nested subquery becomes an existence join [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun closed pull request #44219: [SPARK-45580][SQL][3.4] Handle case where a nested subquery becomes an existence join URL: https://github.com/apache/spark/pull/44219

Re: [PR] [SPARK-45580][SQL][3.4] Handle case where a nested subquery becomes an existence join [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44219: URL: https://github.com/apache/spark/pull/44219#issuecomment-1844193661 Merged to branch-3.4 for Apache Spark 3.4.3.

Re: [PR] [SPARK-46299][DOCS] Make `spark.deploy.recovery*` docs up-to-date [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun closed pull request #44227: [SPARK-46299][DOCS] Make `spark.deploy.recovery*` docs up-to-date URL: https://github.com/apache/spark/pull/44227

Re: [PR] [SPARK-46299][DOCS] Make `spark.deploy.recovery*` docs up-to-date [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44227: URL: https://github.com/apache/spark/pull/44227#issuecomment-1844189961 I verified this manually when I attached the screenshot. Let me merge this~

Re: [PR] [SPARK-46299][DOCS] Make `spark.deploy.recovery*` docs up-to-date [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44227: URL: https://github.com/apache/spark/pull/44227#issuecomment-1844189086 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-46298][PYTHON][CONNECT] Match deprecation warning, test case, and error of Catalog.createExternalTable [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44226: URL: https://github.com/apache/spark/pull/44226#issuecomment-1844179401 Build: https://github.com/HyukjinKwon/spark/actions/runs/7123232849

[PR] [SPARK-46299][DOCS] Make `spark.deploy.recovery*` up-to-date [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun opened a new pull request, #44227: URL: https://github.com/apache/spark/pull/44227 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [SPARK-46298][PYTHON] Test catalog error cases (pyspark.sql.catalog) with minor cleanups [spark]

2023-12-06 Thread via GitHub
HyukjinKwon opened a new pull request, #44226: URL: https://github.com/apache/spark/pull/44226 ### What changes were proposed in this pull request? This PR adds tests for catalog error cases for `createExternalTable`. Also, this PR includes several minor cleanups: - Show a

Re: [PR] [SPARK-46283][INFRA] Remove `streaming-kinesis-asl` module from `MODULES_TO_TEST` for branch-3.x daily tests [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on PR #44204: URL: https://github.com/apache/spark/pull/44204#issuecomment-1844160700 Thanks @dongjoon-hyun @HyukjinKwon @LuciferYang @junyuc25 ~

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418257351 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44183: URL: https://github.com/apache/spark/pull/44183#discussion_r1418256199 ## sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java: ## @@ -441,79 +441,79 @@ public void setMStringStringIsSet(boolean value) { public void

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418256276 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,57 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46275] Protobuf: Return null in permissive mode when deserialization fails. [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on PR #44214: URL: https://github.com/apache/spark/pull/44214#issuecomment-1844151075 This is a change in the default behavior of the built-in function. Should we consider adding a config to restore the legacy behavior? Additionally, since this is a user-facing

Re: [PR] [SPARK-46297][PYTHON][INFRA] Exclude generated files from the code coverage report [spark]

2023-12-06 Thread via GitHub
HyukjinKwon closed pull request #44225: [SPARK-46297][PYTHON][INFRA] Exclude generated files from the code coverage report URL: https://github.com/apache/spark/pull/44225

Re: [PR] [SPARK-46297][PYTHON][INFRA] Exclude generated files from the code coverage report [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44225: URL: https://github.com/apache/spark/pull/44225#issuecomment-1844150466 Merged to master.

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418250683 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,57 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on code in PR #44183: URL: https://github.com/apache/spark/pull/44183#discussion_r1418249978 ## sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java: ## @@ -441,79 +441,79 @@ public void setMStringStringIsSet(boolean value) { public

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on code in PR #44183: URL: https://github.com/apache/spark/pull/44183#discussion_r1418249312 ## sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java: ## @@ -441,79 +441,79 @@ public void setMStringStringIsSet(boolean value) { public

[PR] [SPARK-46297][PYTHON][INFRA] Exclude generated files from the code coverage report [spark]

2023-12-06 Thread via GitHub
HyukjinKwon opened a new pull request, #44225: URL: https://github.com/apache/spark/pull/44225 ### What changes were proposed in this pull request? This PR proposes to exclude generated files from the code coverage report, `pyspark/sql/connect/proto/*`. ### Why are the changes

Re: [PR] [SPARK-46296][PYTHON][TESTS] Test captured errors of TestResult (pyspark.errors.exceptions) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on code in PR #44224: URL: https://github.com/apache/spark/pull/44224#discussion_r1418241835 ## python/pyspark/errors/exceptions/captured.py: ## @@ -104,7 +104,7 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]: if self._origin is not

Re: [PR] [SPARK-46296][PYTHON][TESTS] Test captured errors of TestResult (pyspark.errors.exceptions) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44224: URL: https://github.com/apache/spark/pull/44224#issuecomment-1844123954 Build: https://github.com/HyukjinKwon/spark/actions/runs/7123082964

[PR] [SPARK-46296][PYTHON][TESTS] Test captured errors of TestResult (pyspark.errors.exceptions) [spark]

2023-12-06 Thread via GitHub
HyukjinKwon opened a new pull request, #44224: URL: https://github.com/apache/spark/pull/44224 ### What changes were proposed in this pull request? This PR adds tests for negative cases of `getErrorClass` and `getSqlState`. And test case for `getMessageParameters` for errors.
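A hedged sketch of what such negative-case tests might check: PySpark's captured errors expose `getErrorClass` and `getSqlState`, which return `None` when no error class is attached. The class below is a stand-in used only to illustrate the accessor contract, not PySpark's implementation:

```python
class FakeCapturedError(Exception):
    """Stand-in for a PySpark captured error (illustrative only)."""

    def __init__(self, message, error_class=None, sql_state=None):
        super().__init__(message)
        self._error_class = error_class
        self._sql_state = sql_state

    def getErrorClass(self):
        # Negative case: returns None when no error class was attached.
        return self._error_class

    def getSqlState(self):
        # Negative case: returns None when no SQLSTATE was attached.
        return self._sql_state
```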

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418239659 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
LuciferYang commented on code in PR #44183: URL: https://github.com/apache/spark/pull/44183#discussion_r1418237100 ## sql/hive/src/test/java/org/apache/spark/sql/hive/test/Complex.java: ## @@ -441,79 +441,79 @@ public void setMStringStringIsSet(boolean value) { public void

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418231837 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
beliefer commented on PR #44183: URL: https://github.com/apache/spark/pull/44183#issuecomment-1844090656 cc @dongjoon-hyun

Re: [PR] [SPARK-45515][CORE][SQL][FOLLOWUP] Use enhanced switch expressions to replace the regular switch statement [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44183: URL: https://github.com/apache/spark/pull/44183#issuecomment-1844099331 Feel free to merge, @beliefer ~

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418226850 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,57 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-44976] Preserve full principal user name on executor side [spark]

2023-12-06 Thread via GitHub
eubnara commented on PR #42690: URL: https://github.com/apache/spark/pull/42690#issuecomment-1844087739 It should be considered when using a kerberized cluster.

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418213459 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418209839 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [WIP][SPARK-46282][PYTHON][DOCS] Create a Standalone Page for DataFrame API in PySpark Documentation [spark]

2023-12-06 Thread via GitHub
itholic commented on PR #44201: URL: https://github.com/apache/spark/pull/44201#issuecomment-1844055734 Hmm... I think we need to have a further discussion about how to manage the Spark SQL and DataFrame documentation. Let me just close this PR for now.

Re: [PR] [SPARK-46058][CORE] Add separate flag for privateKeyPassword [spark]

2023-12-06 Thread via GitHub
mridulm commented on PR #43998: URL: https://github.com/apache/spark/pull/43998#issuecomment-1844054833 Merged to master. Thanks for adding this @hasnain-db ! Thanks for the review @JoshRosen :-)

Re: [PR] [SPARK-46058][CORE] Add separate flag for privateKeyPassword [spark]

2023-12-06 Thread via GitHub
mridulm closed pull request #43998: [SPARK-46058][CORE] Add separate flag for privateKeyPassword URL: https://github.com/apache/spark/pull/43998

Re: [PR] [WIP][SPARK-46282][PYTHON][DOCS] Create a Standalone Page for DataFrame API in PySpark Documentation [spark]

2023-12-06 Thread via GitHub
itholic closed pull request #44201: [WIP][SPARK-46282][PYTHON][DOCS] Create a Standalone Page for DataFrame API in PySpark Documentation URL: https://github.com/apache/spark/pull/44201

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1418204345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +71,56 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46292][CORE][UI] Show a summary of workers in MasterPage [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun closed pull request #44218: [SPARK-46292][CORE][UI] Show a summary of workers in MasterPage URL: https://github.com/apache/spark/pull/44218

Re: [PR] [SPARK-46292][CORE][UI] Show a summary of workers in MasterPage [spark]

2023-12-06 Thread via GitHub
dongjoon-hyun commented on PR #44218: URL: https://github.com/apache/spark/pull/44218#issuecomment-1844027166 Thank you, @yaooqinn and @itholic . Merged to master.

Re: [PR] [SPARK-46286][DOCS] Document `spark.io.compression.zstd.bufferPool.enabled` [spark]

2023-12-06 Thread via GitHub
yaooqinn commented on PR #44207: URL: https://github.com/apache/spark/pull/44207#issuecomment-1844024371 Thank you @dongjoon-hyun

Re: [PR] [SPARK-46292][CORE][UI] Show a summary of workers in MasterPage [spark]

2023-12-06 Thread via GitHub
yaooqinn commented on PR #44218: URL: https://github.com/apache/spark/pull/44218#issuecomment-1844023158 LGTM

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-06 Thread via GitHub
beliefer commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1418184528 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,35 @@ trait SparkDateTimeUtils { (segment == 0 &&
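As a hedged sketch of the idea in the PR title (skipping an unnecessary trim during date parsing), not Spark's actual `SparkDateTimeUtils` code: most date strings arrive without surrounding whitespace, so checking the first and last characters avoids a full trim pass and an extra string allocation on the common path. The helper name `maybe_trim` is hypothetical:

```python
def maybe_trim(s: str) -> str:
    """Return s with surrounding whitespace removed, but skip the work
    when there is nothing to trim.

    Only the first and last characters are inspected; the full strip()
    (and its new-string allocation) runs only when actually needed.
    """
    if s and (s[0].isspace() or s[-1].isspace()):
        return s.strip()
    return s
```

A parser would call this before splitting the string into year/month/day segments, paying for trimming only on dirty inputs.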

Re: [PR] [SPARK-46270][SQL][CORE][SS] Use java16 instanceof expressions to replace the java8 instanceof statement. [spark]

2023-12-06 Thread via GitHub
beliefer commented on PR #44187: URL: https://github.com/apache/spark/pull/44187#issuecomment-1843976510 @dongjoon-hyun @LuciferYang Thank you.

Re: [PR] [WIP][SPARK-46293][CONNECT][DOCS][PYTHON] Add `protobuf` to required dependency for Spark Connect [spark]

2023-12-06 Thread via GitHub
itholic commented on PR #44221: URL: https://github.com/apache/spark/pull/44221#issuecomment-1843971025 Let me mark this PR as draft until CI is done.

Re: [PR] [SPARK-46293][CONNECT][DOCS][PYTHON] Add `protobuf` to required dependency for Spark Connect [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44221: URL: https://github.com/apache/spark/pull/44221#discussion_r1418178298 ## python/docs/source/getting_started/install.rst: ## @@ -161,6 +161,7 @@ Package  Supported version  Note  `numpy`  >=1.21

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418176822 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors +from

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418176488 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys Review Comment: Added license.

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418176326 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418176189 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors +from

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418176014 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors +from

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418175172 ## dev/lint-python: ## @@ -46,6 +46,9 @@ while (( "$#" )); do --black) BLACK_TEST=true ;; +--custom-pyspark-error) Review Comment: Fixed

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
itholic commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418175863 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors +from

Re: [PR] [SPARK-46058][CORE] Add separate flag for privateKeyPassword [spark]

2023-12-06 Thread via GitHub
hasnain-db commented on PR #43998: URL: https://github.com/apache/spark/pull/43998#issuecomment-1843956053 @mridulm CI is now green

[PR] [SPARK-45580][SQL][3.3] Handle case where a nested subquery becomes an existence join [spark]

2023-12-06 Thread via GitHub
bersprockets opened a new pull request, #44223: URL: https://github.com/apache/spark/pull/44223 ### What changes were proposed in this pull request? This is a back-port of https://github.com/apache/spark/pull/44193. In `RewritePredicateSubquery`, prune existence flags from the

Re: [PR] [SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on PR #44216: URL: https://github.com/apache/spark/pull/44216#issuecomment-1843946657 Merged to master.

Re: [PR] [SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter [spark]

2023-12-06 Thread via GitHub
HyukjinKwon closed pull request #44216: [SPARK-46290][PYTHON] Change saveMode to a boolean flag for DataSourceWriter URL: https://github.com/apache/spark/pull/44216

Re: [PR] [SPARK-46293][CONNECT][DOCS][PYTHON] Add `protobuf` to required dependency for Spark Connect [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on code in PR #44221: URL: https://github.com/apache/spark/pull/44221#discussion_r1418163093 ## python/docs/source/getting_started/install.rst: ## @@ -161,6 +161,7 @@ Package  Supported version  Note  `numpy`  >=1.21

Re: [PR] [SPARK-46112][BUILD][PYTHON] Implement lint check for PySpark custom errors [spark]

2023-12-06 Thread via GitHub
HyukjinKwon commented on code in PR #44203: URL: https://github.com/apache/spark/pull/44203#discussion_r1418161851 ## dev/check_pyspark_custom_errors.py: ## @@ -0,0 +1,100 @@ +import sys + +sys.path.insert(0, "python") +import os +from pyspark import errors as pyspark_errors
