Re: [PR] [SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-06 Thread via GitHub
gengliangwang commented on code in PR #46390: URL: https://github.com/apache/spark/pull/46390#discussion_r1591851400 ## core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java: ## @@ -17,21 +17,22 @@ package

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-06 Thread via GitHub
LuciferYang commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2097501153 I think we need to wait for a new Ammonite release, as 3.0-M1 does not support Scala 2.13.14.

[PR] [SPARK-48090][SS][PYTHON][TESTS] Shorten the traceback in the test checking error message in UDF [spark]

2024-05-06 Thread via GitHub
HyukjinKwon opened a new pull request, #46426: URL: https://github.com/apache/spark/pull/46426 ### What changes were proposed in this pull request? This PR reduces traceback so the actual error `ZeroDivisionError` can be tested in

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46273: URL: https://github.com/apache/spark/pull/46273#issuecomment-2097474800 > Looks like this missed the 3.4 release by a month @dongjoon-hyun ... might have been a nice addition to it ! Ya, I agree and this is still a good addition to 4.0.0-preview.

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-06 Thread via GitHub
gengliangwang commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2097446465 @mridulm As the task name MDC frequently shows up in the logs, I would say this is necessary for the new logging framework. After the renaming, the MDC names are consistent and

Re: [PR] [SPARK-48037][CORE] Fix SortShuffleWriter lacks shuffle write related metrics resulting in potentially inaccurate data [spark]

2024-05-06 Thread via GitHub
mridulm commented on PR #46273: URL: https://github.com/apache/spark/pull/46273#issuecomment-2097438661 Looks like this missed the 3.4 release @dongjoon-hyun ... might have been a nice addition to it !

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-05-06 Thread via GitHub
mridulm commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2097435499 `reader.completed` checks for the `IN_PROGRESS` suffix, which will be the case here @bluzy, and so with this PR it will never clean up those files. (Some users/deployments have
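As an aside for readers skimming the archive, here is a minimal, self-contained Scala sketch of the trade-off under discussion; the class and field names are hypothetical and not Spark's actual history-server code. If the cleaner unconditionally skips logs that still carry an in-progress suffix, `.inprogress` files left behind by applications that died without renaming their log are never reclaimed.

```scala
// Illustrative only: hypothetical names, not Spark's FsHistoryProvider implementation.
object EventLogCleanupSketch {
  // Assumed suffix; Spark marks unfinished event logs with a similar in-progress marker.
  val InProgressSuffix = ".inprogress"

  final case class LogEntry(path: String, lastModifiedMs: Long)

  private def expired(entry: LogEntry, nowMs: Long, maxAgeMs: Long): Boolean =
    nowMs - entry.lastModifiedMs > maxAgeMs

  /** Entries the cleaner would delete if it always skips in-progress logs. */
  def selectForDeletion(entries: Seq[LogEntry], nowMs: Long, maxAgeMs: Long): Seq[LogEntry] =
    entries.filter { e =>
      val inProgress = e.path.endsWith(InProgressSuffix)
      // Skipping every in-progress log means abandoned .inprogress files leak forever.
      !inProgress && expired(e, nowMs, maxAgeMs)
    }

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    val day = 24L * 60 * 60 * 1000
    val logs = Seq(
      LogEntry("app-1", now - 10 * day),              // old, completed: deleted
      LogEntry("app-2.inprogress", now - 10 * day)    // old, abandoned: never selected
    )
    println(selectForDeletion(logs, now, maxAgeMs = 7 * day).map(_.path)) // List(app-1)
  }
}
```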

Re: [PR] [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46425: URL: https://github.com/apache/spark/pull/46425#issuecomment-2097432716 Merged to master for Apache Spark 4.0.0-preview.

Re: [PR] [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46425: [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` URL: https://github.com/apache/spark/pull/46425

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-06 Thread via GitHub
mridulm commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2097430754 The question would be whether this change is strictly necessary ... if it is, we can evaluate it in that context. If it is not and is just a nice-to-have, it is better not to make breaking changes which

Re: [PR] [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46425: URL: https://github.com/apache/spark/pull/46425#issuecomment-2097420985 Thank you, @yaooqinn .

Re: [PR] [MINOR][CONNECT][TESTS] Improve test failure error message in StreamingParityTests [spark]

2024-05-06 Thread via GitHub
HyukjinKwon closed pull request #46420: [MINOR][CONNECT][TESTS] Improve test failure error message in StreamingParityTests URL: https://github.com/apache/spark/pull/46420

Re: [PR] [MINOR][CONNECT][TESTS] Improve test failure error message in StreamingParityTests [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46420: URL: https://github.com/apache/spark/pull/46420#issuecomment-2097420760 Merged to master.

Re: [PR] [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46425: URL: https://github.com/apache/spark/pull/46425#issuecomment-2097417035 Thank you, @HyukjinKwon. I believe this will give proper and better visibility to this test case issue before `Connect GA`.

[PR] [SPARK-48163][CONNECT][TESTS] Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun opened a new pull request, #46425: URL: https://github.com/apache/spark/pull/46425 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` [spark]

2024-05-06 Thread via GitHub
zhengruifeng commented on PR #46418: URL: https://github.com/apache/spark/pull/46418#issuecomment-2097401736 thanks @dongjoon-hyun and @HyukjinKwon

Re: [PR] [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` [spark]

2024-05-06 Thread via GitHub
zhengruifeng commented on PR #46418: URL: https://github.com/apache/spark/pull/46418#issuecomment-2097401503 merged to master

Re: [PR] [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` [spark]

2024-05-06 Thread via GitHub
zhengruifeng closed pull request #46418: [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` URL: https://github.com/apache/spark/pull/46418

[PR] [WIP][SQL] Add collation support for variant expressions [spark]

2024-05-06 Thread via GitHub
uros-db opened a new pull request, #46424: URL: https://github.com/apache/spark/pull/46424 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SQL] Format expressions [spark]

2024-05-06 Thread via GitHub
uros-db opened a new pull request, #46423: URL: https://github.com/apache/spark/pull/46423 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SQL] Add collation support for hash expressions [spark]

2024-05-06 Thread via GitHub
uros-db opened a new pull request, #46422: URL: https://github.com/apache/spark/pull/46422 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SQL] Eliminate unnecessary COLLATE expressions in query analysis [spark]

2024-05-06 Thread via GitHub
uros-db opened a new pull request, #46421: URL: https://github.com/apache/spark/pull/46421 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48083][SPARK-48084][ML][TESTS] Remove JIRA comments for reenabling ML compatibility tests [spark]

2024-05-06 Thread via GitHub
WeichenXu123 commented on code in PR #46419: URL: https://github.com/apache/spark/pull/46419#discussion_r1591778164 ## python/pyspark/ml/tests/connect/test_connect_evaluation.py: ## @@ -20,7 +20,6 @@ from pyspark.sql import SparkSession from

Re: [PR] [SPARK-48083][SPARK-48084][ML][TESTS] Remove JIRA comments for reenabling ML compatibility tests [spark]

2024-05-06 Thread via GitHub
WeichenXu123 commented on code in PR #46419: URL: https://github.com/apache/spark/pull/46419#discussion_r1591777525 ## python/pyspark/ml/tests/connect/test_connect_classification.py: ## @@ -21,7 +21,6 @@ from pyspark.sql import SparkSession from

[PR] [MINOR][CONNECT][TESTS] Improve test failure error message in StreamingParityTests [spark]

2024-05-06 Thread via GitHub
HyukjinKwon opened a new pull request, #46420: URL: https://github.com/apache/spark/pull/46420 ### What changes were proposed in this pull request? This PR improves the test failure error message in `StreamingParityTests` ### Why are the changes needed? To see the

Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]

2024-05-06 Thread via GitHub
gene-db commented on code in PR #46400: URL: https://github.com/apache/spark/pull/46400#discussion_r1591776687 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala: ## @@ -67,16 +67,23 @@ case class PartialResultArrayException( extends

Re: [PR] [SPARK-48141][TEST] Update the Oracle docker image version used for test and integration to use Oracle Database 23ai Free [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46399: [SPARK-48141][TEST] Update the Oracle docker image version used for test and integration to use Oracle Database 23ai Free URL: https://github.com/apache/spark/pull/46399

Re: [PR] [SPARK-48083][SPARK-48084][ML][TESTS] Remove JIRAs for reenabling ML compatibility tests [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46419: URL: https://github.com/apache/spark/pull/46419#issuecomment-2097387986 Would you mind describing why those skips are legitimate?

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-05-06 Thread via GitHub
zhengruifeng commented on code in PR #46368: URL: https://github.com/apache/spark/pull/46368#discussion_r1591773238 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -283,6 +283,16 @@ class Dataset[T] private[sql] ( def

Re: [PR] [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on code in PR #46418: URL: https://github.com/apache/spark/pull/46418#discussion_r1591773149 ## python/pyspark/sql/tests/connect/test_parity_pandas_udf_grouped_agg.py: ## @@ -20,10 +20,11 @@ from pyspark.testing.connectutils import ReusedConnectTestCase

[PR] [SPARK-48083] [SPARK-48084] Enable spark ml test [spark]

2024-05-06 Thread via GitHub
WeichenXu123 opened a new pull request, #46419: URL: https://github.com/apache/spark/pull/46419 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [SPARK-48154][PYTHON][CONNECT][TESTS] Enable `PandasUDFGroupedAggParityTests.test_manual` [spark]

2024-05-06 Thread via GitHub
zhengruifeng opened a new pull request, #46418: URL: https://github.com/apache/spark/pull/46418 ### What changes were proposed in this pull request? Enable `PandasUDFGroupedAggParityTests.test_manual` ### Why are the changes needed? for test coverage ### Does this

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591767311 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -347,6 +347,28 @@ class IncrementalExecution(

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591752313 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -347,6 +347,28 @@ class IncrementalExecution(

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
panbingkun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591763188 ## connector/profiler/pom.xml: ## @@ -44,7 +44,7 @@ me.bechberger ap-loader-all - 3.0-8 + 3.0-9 Review Comment: Okay

Re: [PR] [SPARK-48150][SQL] try_parse_json output should be declared as nullable [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46409: URL: https://github.com/apache/spark/pull/46409#issuecomment-2097363087 Merged to master for Apache Spark 4.0.0-preview. Thank you, @JoshRosen and all.

Re: [PR] [SPARK-48150][SQL] try_parse_json output should be declared as nullable [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46409: [SPARK-48150][SQL] try_parse_json output should be declared as nullable URL: https://github.com/apache/spark/pull/46409

[PR] Make some corrections in the docstring of pyspark DataStreamReader methods [spark]

2024-05-06 Thread via GitHub
chloeh13q opened a new pull request, #46416: URL: https://github.com/apache/spark/pull/46416 ### What changes were proposed in this pull request? The docstrings of the pyspark DataStreamReader methods `csv()` and `text()` say that the `path` parameter can be a list,

[PR] [SPARK-48105][SS][3.5] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-06 Thread via GitHub
huanliwang-db opened a new pull request, #46415: URL: https://github.com/apache/spark/pull/46415 * When we close the hdfs state store, we should only remove the entry from `loadedMaps` rather than doing the active data cleanup. JVM GC should be able to help us GC those objects.
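For context, a minimal Scala sketch of the distinction drawn in the description above, using hypothetical names rather than the actual `HDFSBackedStateStoreProvider` internals: dropping the cache entry leaves any map that a concurrent snapshot thread already holds untouched, while actively clearing the map races with that snapshot.

```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative only: hypothetical names, not Spark's state store classes.
object StateStoreUnloadSketch {
  type VersionedState = ConcurrentHashMap[String, String]

  private val loadedMaps = new ConcurrentHashMap[Long, VersionedState]()

  def load(version: Long): VersionedState =
    loadedMaps.computeIfAbsent(version, _ => new ConcurrentHashMap[String, String]())

  /** Safe unload: forget the entry; GC reclaims the map once nobody references it. */
  def unload(version: Long): Unit =
    loadedMaps.remove(version)

  /** Risky unload: clearing the map races with a maintenance thread snapshotting it. */
  def unloadAndClear(version: Long): Unit =
    Option(loadedMaps.remove(version)).foreach(_.clear())

  def main(args: Array[String]): Unit = {
    val v1 = load(1L)
    v1.put("key", "value")
    val snapshotView = v1   // e.g. a maintenance thread holding version 1 for a snapshot
    unload(1L)              // entry forgotten; snapshotView still reads consistent data
    println(snapshotView.get("key")) // prints "value"
  }
}
```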

Re: [PR] [SPARK-48143][SQL] Use lightweight exceptions for control-flow between UnivocityParser and FailureSafeParser [spark]

2024-05-06 Thread via GitHub
cloud-fan commented on code in PR #46400: URL: https://github.com/apache/spark/pull/46400#discussion_r1591749162 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/BadRecordException.scala: ## @@ -67,16 +67,23 @@ case class PartialResultArrayException( extends

Re: [PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46412: URL: https://github.com/apache/spark/pull/46412#issuecomment-2097332326 For the record, the very next commit (SPARK-48147) shows that the `build` step (with 10 sub test pipelines) is skipped completely and successfully.

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on code in PR #46368: URL: https://github.com/apache/spark/pull/46368#discussion_r1591746209 ## python/pyspark/sql/connect/client/core.py: ## @@ -1157,6 +1163,20 @@ def _analyze_plan_request_with_metadata(self) -> pb2.AnalyzePlanRequest:

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on code in PR #46368: URL: https://github.com/apache/spark/pull/46368#discussion_r1591746323 ## python/pyspark/sql/dataframe.py: ## @@ -657,6 +657,19 @@ def printSchema(self, level: Optional[int] = None) -> None: """ ... +

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on code in PR #46368: URL: https://github.com/apache/spark/pull/46368#discussion_r1591745159 ## .gitignore: ## @@ -26,6 +26,7 @@ .scala_dependencies .settings .vscode +.dir-locals.el Review Comment: let's remove this

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46404: URL: https://github.com/apache/spark/pull/46404#issuecomment-2097319430 Can you fill in the PR description, please?

Re: [PR] [SPARK-48147][SS][CONNECT] Remove client side listeners when local Spark session is deleted [spark]

2024-05-06 Thread via GitHub
HyukjinKwon closed pull request #46406: [SPARK-48147][SS][CONNECT] Remove client side listeners when local Spark session is deleted URL: https://github.com/apache/spark/pull/46406

Re: [PR] [SPARK-48147][SS][CONNECT] Remove client side listeners when local Spark session is deleted [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46406: URL: https://github.com/apache/spark/pull/46406#issuecomment-2097317802 Merged to master.

Re: [PR] [SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
db-scnakandala commented on PR #46414: URL: https://github.com/apache/spark/pull/46414#issuecomment-2097313695 cc: @cloud-fan

Re: [PR] SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
db-scnakandala closed pull request #46413: SPARK-48035][SQL][FOLLOWUP] Fix try_add/try_multiply being semantic equal to add/multiply URL: https://github.com/apache/spark/pull/46413

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
panbingkun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591736775 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with #

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591736077 ## connector/profiler/pom.xml: ## @@ -44,7 +44,7 @@ me.bechberger ap-loader-all - 3.0-8 + 3.0-9 Review Comment: Could you

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-06 Thread via GitHub
gengliangwang commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2097287397 > That change was not visible to end users, as there was no release made - right? Yes, you are right. Are you OK with the changes in this PR? If not, please let me know

Re: [PR] [SPARK-48035][SQL] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
db-scnakandala commented on code in PR #46307: URL: https://github.com/apache/spark/pull/46307#discussion_r1591732236 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -452,13 +452,14 @@ case class Add( copy(left = newLeft,

Re: [PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46412: URL: https://github.com/apache/spark/pull/46412#issuecomment-2097284326 Merged to master.

Re: [PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46412: [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed URL: https://github.com/apache/spark/pull/46412

Re: [PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46412: URL: https://github.com/apache/spark/pull/46412#issuecomment-2097282839 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
panbingkun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591730444 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with #

Re: [PR] [SPARK-48141][TEST] Update the Oracle docker image version used for test and integration to use Oracle Database 23ai Free [spark]

2024-05-06 Thread via GitHub
yaooqinn commented on code in PR #46399: URL: https://github.com/apache/spark/pull/46399#discussion_r1591728412 ## .github/workflows/build_and_test.yml: ## @@ -929,7 +929,7 @@ jobs: HIVE_PROFILE: hive2.3 GITHUB_PREV_SHA: ${{ github.event.before }}

Re: [PR] [SPARK-48112][CONNECT] Expose session in SparkConnectPlanner to plugins [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46363: URL: https://github.com/apache/spark/pull/46363#issuecomment-2097265334 Thank you. +1, LGTM.

Re: [PR] [SPARK-48088][PYTHON][CONNECT][TESTS][FOLLOW-UP][3.5] Skips another test that requires JVM access [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46411: URL: https://github.com/apache/spark/pull/46411#issuecomment-2097263320 Merged to branch-3.5.

Re: [PR] [SPARK-48088][PYTHON][CONNECT][TESTS][FOLLOW-UP][3.5] Skips another test that requires JVM access [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46411: [SPARK-48088][PYTHON][CONNECT][TESTS][FOLLOW-UP][3.5] Skips another test that requires JVM access URL: https://github.com/apache/spark/pull/46411

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591726179 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with

Re: [PR] [SPARK-48152][BUILD] Publish the module `spark-profiler` to `maven central repository` [spark]

2024-05-06 Thread via GitHub
panbingkun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591725780 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with #

Re: [PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46412: URL: https://github.com/apache/spark/pull/46412#issuecomment-2097252456 Could you review this PR, @HyukjinKwon ?

[PR] [SPARK-48153][INFRA] Run `build` job of `build_and_test.yml` only if needed [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun opened a new pull request, #46412: URL: https://github.com/apache/spark/pull/46412 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-48152][BUILD] Make the module `spark-profiler` as a part of Spark release [spark]

2024-05-06 Thread via GitHub
panbingkun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591720244 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with #

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591717075 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUpdateEventTimeWatermarkColumn.scala: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the

Re: [PR] [SPARK-48035][SQL] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
cloud-fan commented on code in PR #46307: URL: https://github.com/apache/spark/pull/46307#discussion_r1591716254 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -452,13 +452,14 @@ case class Add( copy(left = newLeft, right =

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591716219 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUpdateEventTimeWatermarkColumn.scala: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591715505 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUpdateEventTimeWatermarkColumn.scala: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWIthState operator. [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1591714252 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -331,6 +331,7 @@ class Analyzer(override val catalogManager:

Re: [PR] [SPARK-48152][BUILD] Make the module `spark-profiler` as a part of Spark release [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591713237 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with

Re: [PR] [SPARK-48152][BUILD] Make the module `spark-profiler` as a part of Spark release [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on code in PR #46402: URL: https://github.com/apache/spark/pull/46402#discussion_r1591711449 ## dev/test-dependencies.sh: ## @@ -31,7 +31,7 @@ export LC_ALL=C # NOTE: These should match those in the release publishing script, and be kept in sync with

Re: [PR] [SPARK-48151][INFRA] `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46410: URL: https://github.com/apache/spark/pull/46410#issuecomment-2097195862 Merged to master.

Re: [PR] [SPARK-48151][INFRA] `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun closed pull request #46410: [SPARK-48151][INFRA] `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` URL: https://github.com/apache/spark/pull/46410

[PR] [SPARK-48088][PYTHON][CONNECT][TESTS][FOLLOW-UP][3.5] Skips another test that requires JVM access [spark]

2024-05-06 Thread via GitHub
HyukjinKwon opened a new pull request, #46411: URL: https://github.com/apache/spark/pull/46411 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46334 that missed one more test case. ### Why are the changes

Re: [PR] [SPARK-48150][SQL] try_parse_json output should be declared as nullable [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46409: URL: https://github.com/apache/spark/pull/46409#issuecomment-2097193827 Wow! Thank you for the fix.

Re: [PR] [SPARK-48150][SQL] try_parse_json output should be declared as nullable [spark]

2024-05-06 Thread via GitHub
JoshRosen commented on PR #46409: URL: https://github.com/apache/spark/pull/46409#issuecomment-2097192426 It's not flaky, it's a legitimate test failure: ``` [info] - function_try_parse_json *** FAILED *** (5 milliseconds) [info] Expected and actual plans do not match:
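To make the nullability point concrete, here is a small standalone sketch, assuming a Spark build that already ships `try_parse_json` (the SPARK-48150 context): the `try_` variant returns NULL for malformed input instead of raising an error, so the declared output schema must be nullable for golden-plan comparisons like the one above to line up.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch; assumes a Spark version that provides try_parse_json.
object TryParseJsonNullability {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("try_parse_json").getOrCreate()

    // Malformed input does not fail the query; the result is simply NULL.
    val df = spark.sql("SELECT try_parse_json('not valid json') AS v")
    df.show()

    // Because NULL is a possible output, the field has to be declared nullable.
    println(df.schema("v").nullable) // expected: true

    spark.stop()
  }
}
```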

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097190957 follow https://github.com/apache/spark/blob/master/.github/workflows/build_python_connect.yml#L80-L113 to reproduce the failure

Re: [PR] [SPARK-48142][PYTHON][CONNECT][TESTS] Enable `CogroupedApplyInPandasTests.test_wrong_args` [spark]

2024-05-06 Thread via GitHub
zhengruifeng commented on PR #46397: URL: https://github.com/apache/spark/pull/46397#issuecomment-2097185722 thanks @xinrong-meng, merged to master

Re: [PR] [SPARK-48142][PYTHON][CONNECT][TESTS] Enable `CogroupedApplyInPandasTests.test_wrong_args` [spark]

2024-05-06 Thread via GitHub
zhengruifeng closed pull request #46397: [SPARK-48142][PYTHON][CONNECT][TESTS] Enable `CogroupedApplyInPandasTests.test_wrong_args` URL: https://github.com/apache/spark/pull/46397

Re: [PR] [SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-06 Thread via GitHub
panbingkun commented on PR #46390: URL: https://github.com/apache/spark/pull/46390#issuecomment-2097183311 @gengliangwang All done. Thanks.

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-06 Thread via GitHub
mridulm commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2097182888 That change was not visible to end users, as there was no release made - right?

Re: [PR] [SPARK-48035][SQL] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
HyukjinKwon closed pull request #46307: [SPARK-48035][SQL] Fix try_add/try_multiply being semantic equal to add/multiply URL: https://github.com/apache/spark/pull/46307

Re: [PR] [SPARK-48035][SQL] Fix try_add/try_multiply being semantic equal to add/multiply [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46307: URL: https://github.com/apache/spark/pull/46307#issuecomment-2097170436 Merged to master.

Re: [PR] [SPARK-48151][INFRA] `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46410: URL: https://github.com/apache/spark/pull/46410#issuecomment-2097169044 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-48150][SQL] try_parse_json output should be declared as nullable [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun commented on PR #46409: URL: https://github.com/apache/spark/pull/46409#issuecomment-2097168838 According to the CI result, `ProtoToParsedPlanTestSuite` seems to have become flaky too. ``` [info] *** 1 TEST FAILED *** [error] Failed tests: [error]

Re: [PR] [MINOR][SQL][DOCS] Correct comments for UnresolvedRelation [spark]

2024-05-06 Thread via GitHub
HyukjinKwon closed pull request #46319: [MINOR][SQL][DOCS] Correct comments for UnresolvedRelation URL: https://github.com/apache/spark/pull/46319

Re: [PR] Add collation support to `mode` [spark]

2024-05-06 Thread via GitHub
GideonPotok closed pull request #46403: Add collation support to `mode` URL: https://github.com/apache/spark/pull/46403

Re: [PR] [MINOR][SQL][DOCS] Correct comments for UnresolvedRelation [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46319: URL: https://github.com/apache/spark/pull/46319#issuecomment-2097167163 Merged to master.

[PR] [SPARK-48151][INFRA] `build_and_test.yml` should use `Volcano` 1.7.0 for `branch-3.4/3.5` [spark]

2024-05-06 Thread via GitHub
dongjoon-hyun opened a new pull request, #46410: URL: https://github.com/apache/spark/pull/46410 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on PR #46351: URL: https://github.com/apache/spark/pull/46351#issuecomment-2097153000 3.5 has a conflict - probably 3.4 would also have a conflict. @huanliwang-db Could you please help submit backport PRs for these branches? Thanks in advance!

Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-06 Thread via GitHub
HeartSaVioR closed pull request #46351: [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting URL: https://github.com/apache/spark/pull/46351

Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-06 Thread via GitHub
HeartSaVioR commented on PR #46351: URL: https://github.com/apache/spark/pull/46351#issuecomment-2097151349 Thanks! Merging to master/3.5/3.4.

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2097139796 py4j shouldn't be referenced in Connect tests. Can we move them, and import only when it's actually used?

Re: [PR] [SPARK-48112][CONNECT] Expose session in SparkConnectPlanner to plugins [spark]

2024-05-06 Thread via GitHub
HyukjinKwon closed pull request #46363: [SPARK-48112][CONNECT] Expose session in SparkConnectPlanner to plugins URL: https://github.com/apache/spark/pull/46363

Re: [PR] [SPARK-46848] XML: Enhance XML bad record handling with partial results support [spark]

2024-05-06 Thread via GitHub
github-actions[bot] closed pull request #44875: [SPARK-46848] XML: Enhance XML bad record handling with partial results support URL: https://github.com/apache/spark/pull/44875

Re: [PR] [WIP] [SPARK-46884] Spark Connect - ExecutePlanRequest new property - job description [spark]

2024-05-06 Thread via GitHub
github-actions[bot] commented on PR #44909: URL: https://github.com/apache/spark/pull/44909#issuecomment-2097136401 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48112][CONNECT] Expose session in SparkConnectPlanner to plugins [spark]

2024-05-06 Thread via GitHub
HyukjinKwon commented on PR #46363: URL: https://github.com/apache/spark/pull/46363#issuecomment-2097136188 Merged to master.
