[GitHub] [spark] vinodkc commented on pull request #39449: [SPARK-40688][SQL] Support data masking built-in function 'mask_first_n'

2023-01-21 Thread via GitHub
vinodkc commented on PR #39449: URL: https://github.com/apache/spark/pull/39449#issuecomment-1399417725 @dtenedor @srielau, could you please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] gengliangwang closed pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper URL: https://github.com/apache/spark/pull/39696

[GitHub] [spark] gengliangwang commented on pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39696: URL: https://github.com/apache/spark/pull/39696#issuecomment-1399399512 @dongjoon-hyun @LuciferYang Thanks for the review. Merging this one to master

[GitHub] [spark] mridulm commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399398921 Merged to master. Thanks for working on this @LuciferYang ! Thanks for the review @tgravescs, and discussion @dongjoon-hyun, @HyukjinKwon :-)

[GitHub] [spark] mridulm closed pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm closed pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM URL: https://github.com/apache/spark/pull/39674

[GitHub] [spark] mridulm commented on a diff in pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
mridulm commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1083387253 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new Path(En

[GitHub] [spark] mridulm commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
mridulm commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399397779 Late LGTM. Thanks for fixing this @kuwii ! Thanks for merging it @srowen :-)

[GitHub] [spark] LuciferYang commented on pull request #39642: [SPARK-41677][CORE][SQL][SS][UI] Add Protobuf serializer for `StreamingQueryProgressWrapper`

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39642: URL: https://github.com/apache/spark/pull/39642#issuecomment-1399397643 rebased

[GitHub] [spark] LuciferYang commented on pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39683: URL: https://github.com/apache/spark/pull/39683#issuecomment-1399397574 rebased

[GitHub] [spark] LuciferYang commented on pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39682: URL: https://github.com/apache/spark/pull/39682#issuecomment-1399397378 rebased

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396999 Thanks @gengliangwang @srowen

[GitHub] [spark] gengliangwang closed pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method URL: https://github.com/apache/spark/pull/39688

[GitHub] [spark] gengliangwang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396937 Thanks, merging to master

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399396145 Should we merge this one? I need to rebase the others.

[GitHub] [spark] zhengruifeng opened a new pull request, #39699: [SPARK-41772][CONNECT][PYTHON] Fix incorrect column name in `withField`'s doctest

2023-01-21 Thread via GitHub
zhengruifeng opened a new pull request, #39699: URL: https://github.com/apache/spark/pull/39699 ### What changes were proposed in this pull request? Fix incorrect column name in `withField`'s doctest ``` pyspark.sql.connect.column.Column.withField Failed example: df.wit

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
HyukjinKwon commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083382917 ## python/pyspark/sql/connect/udf.py: ## @@ -0,0 +1,165 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreement

[GitHub] [spark] HyukjinKwon commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399391123 These aren't APIs. Configuration is supposed to be internal, SparkConnectPlanner also isn't supposed to be exposed to end users, and we don't keep the binary compatibility ther

[GitHub] [spark] zhengruifeng opened a new pull request, #39698: [SPARK-41283][CONNECT][PYTHON] Add `array_append` to Connect

2023-01-21 Thread via GitHub
zhengruifeng opened a new pull request, #39698: URL: https://github.com/apache/spark/pull/39698 ### What changes were proposed in this pull request? `array_append` was recently added to SQL and PySpark; this PR adds it to Connect. ### Why are the changes needed? For parity

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
zhengruifeng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083380299 ## python/pyspark/sql/connect/udf.py: ## @@ -0,0 +1,165 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreemen

[GitHub] [spark] zhengruifeng closed pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
zhengruifeng closed pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin URL: https://github.com/apache/spark/pull/39692

[GitHub] [spark] zhengruifeng commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
zhengruifeng commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399382836 merged into master

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39697: [SPARK-42154][K8S][TESTS] Enable Volcano unit tests and integration tests in GitHub Action

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39697: URL: https://github.com/apache/spark/pull/39697 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] xinrong-meng commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
xinrong-meng commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083373417 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +218,28 @@ message Expression { bool is_user_defined_function = 4;

[GitHub] [spark] dongjoon-hyun closed pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0 URL: https://github.com/apache/spark/pull/39690

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399366517 Thank you so much, @gengliangwang . Merged to master.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on code in PR #39690: URL: https://github.com/apache/spark/pull/39690#discussion_r1083372057 ## resource-managers/kubernetes/integration-tests/README.md: ## @@ -364,13 +360,5 @@ You can also specify `volcano` tag to only run Volcano test: ## Cleanup V

[GitHub] [spark] gengliangwang commented on a diff in pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
gengliangwang commented on code in PR #39690: URL: https://github.com/apache/spark/pull/39690#discussion_r1083371912 ## resource-managers/kubernetes/integration-tests/README.md: ## @@ -364,13 +360,5 @@ You can also specify `volcano` tag to only run Volcano test: ## Cleanup V

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399364776 Could you review this, @gengliangwang ?

[GitHub] [spark] gengliangwang commented on pull request #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39696: URL: https://github.com/apache/spark/pull/39696#issuecomment-1399364595 cc @LuciferYang @dongjoon-hyun

[GitHub] [spark] gengliangwang opened a new pull request, #39696: [SPARK-42153][UI] Handle null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper

2023-01-21 Thread via GitHub
gengliangwang opened a new pull request, #39696: URL: https://github.com/apache/spark/pull/39696 ### What changes were proposed in this pull request? Similar to #39666, this PR handles null string values in PairStrings/RDDOperationNode/RDDOperationClusterWrapper ### Why are the cha

[GitHub] [spark] vinodkc commented on a diff in pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2023-01-21 Thread via GitHub
vinodkc commented on code in PR #38419: URL: https://github.com/apache/spark/pull/38419#discussion_r1083371035 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala: ## @@ -937,4 +937,135 @@ class MathExpressionsSuite extends SparkFu

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399359006 Could you review this when you have some time, @viirya ?

[GitHub] [spark] gengliangwang commented on pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39686: URL: https://github.com/apache/spark/pull/39686#issuecomment-1399358487 @dongjoon-hyun @LuciferYang Thanks for the review!

[GitHub] [spark] gengliangwang commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39685: URL: https://github.com/apache/spark/pull/39685#issuecomment-1399358483 @dongjoon-hyun @LuciferYang Thanks for the review!

[GitHub] [spark] gengliangwang closed pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
gengliangwang closed pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper URL: https://github.com/apache/spark/pull/39684

[GitHub] [spark] gengliangwang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
gengliangwang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399354886 Thanks, merging to master

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [SPARK-42124][PYTHON][CONNECT] Scalar Inline Python UDF in Spark Connect

2023-01-21 Thread via GitHub
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1083341487 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +218,28 @@ message Expression { bool is_user_defined_function = 4;

[GitHub] [spark] grundprinzip opened a new pull request, #39695: [SPARK-XXXX] SparkConnectClient supports RetryPolicies now

2023-01-21 Thread via GitHub
grundprinzip opened a new pull request, #39695: URL: https://github.com/apache/spark/pull/39695 ### What changes were proposed in this pull request? To support retryable errors either produced by Spark directly or an intermediate proxy, the Spark Connect client can now properly handle tho

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399341276 @dongjoon-hyun @srowen @mridulm Thanks for reviewing this PR.

[GitHub] [spark] srowen closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen closed pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge URL: https://github.com/apache/spark/pull/39654

[GitHub] [spark] srowen commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399340205 Merged to master

[GitHub] [spark] dongjoon-hyun closed pull request #39668: [WIP] Test 3.4.0 tagging

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39668: [WIP] Test 3.4.0 tagging URL: https://github.com/apache/spark/pull/39668

[GitHub] [spark] dongjoon-hyun commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399336797 I'll leave this to the other committers, @tedyu .

[GitHub] [spark] dongjoon-hyun commented on pull request #39690: [SPARK-42150][K8S][DOCS] Upgrade `Volcano` to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39690: URL: https://github.com/apache/spark/pull/39690#issuecomment-1399336416 Since this is a doc-only PR, the GitHub Actions result is irrelevant. cc @Yikun

[GitHub] [spark] dongjoon-hyun commented on pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39686: URL: https://github.com/apache/spark/pull/39686#issuecomment-1399336316 Merged to master, @gengliangwang .

[GitHub] [spark] dongjoon-hyun closed pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39686: [SPARK-42143][UI] Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo URL: https://github.com/apache/spark/pull/39686

[GitHub] [spark] dongjoon-hyun commented on pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39685: URL: https://github.com/apache/spark/pull/39685#issuecomment-1399336202 Merged to master, @gengliangwang .

[GitHub] [spark] dongjoon-hyun closed pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39685: [SPARK-42142][UI] Handle null string values in CachedQuantile/ExecutorSummary/PoolData URL: https://github.com/apache/spark/pull/39685

[GitHub] [spark] RyanBerti commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-01-21 Thread via GitHub
RyanBerti commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1399332706 Hi @dtenedor and @huaxingao Thanks for the input! I agree with you both that migrating Spark's existing HLL++ implementation to use the Apache Datasketches library would be ideal

[GitHub] [spark] dongjoon-hyun closed pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun closed pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0 URL: https://github.com/apache/spark/pull/39689

[GitHub] [spark] dongjoon-hyun commented on pull request #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun commented on PR #39689: URL: https://github.com/apache/spark/pull/39689#issuecomment-1399323248 Thank you. Merged to master.

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399319165 @dongjoon-hyun Do you think this PR is in a mergeable state?

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399319090 @srowen @mridulm Tests passed.

[GitHub] [spark] LuciferYang opened a new pull request, #39694: [SPARK-42152][BUILD] Use `_` instead of `-` in `shadedPattern` for relocation package name

2023-01-21 Thread via GitHub
LuciferYang opened a new pull request, #39694: URL: https://github.com/apache/spark/pull/39694 ### What changes were proposed in this pull request? This PR aims to use `_` instead of `-` in `shadedPattern` for the relocation package name. ### Why are the changes needed? J

[GitHub] [spark] itholic commented on a diff in pull request #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-21 Thread via GitHub
itholic commented on code in PR #39693: URL: https://github.com/apache/spark/pull/39693#discussion_r1083316715 ## python/pyspark/errors/exceptions.py: ## @@ -288,7 +291,57 @@ class UnknownException(CapturedException): class SparkUpgradeException(CapturedException): """ -

[GitHub] [spark] itholic opened a new pull request, #39693: [SPARK-41712][PYTHON][CONNECT] Migrate the Spark Connect errors into PySpark error framework.

2023-01-21 Thread via GitHub
itholic opened a new pull request, #39693: URL: https://github.com/apache/spark/pull/39693 ### What changes were proposed in this pull request? This PR proposes to migrate the Spark Connect errors into PySpark error framework. Also introducing 5 exceptions to handle

[GitHub] [spark] grundprinzip commented on pull request #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
grundprinzip commented on PR #39692: URL: https://github.com/apache/spark/pull/39692#issuecomment-1399287673 R: @HyukjinKwon @zhengruifeng

[GitHub] [spark] wangyum commented on pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum commented on PR #39691: URL: https://github.com/apache/spark/pull/39691#issuecomment-1399282308 cc @xinrong-meng @MaxGekk @gengliangwang @cloud-fan

[GitHub] [spark] grundprinzip opened a new pull request, #39692: [SPARK-41629][CONNECT][FOLLOW] Enable access to SparkSession from Plugin

2023-01-21 Thread via GitHub
grundprinzip opened a new pull request, #39692: URL: https://github.com/apache/spark/pull/39692 ### What changes were proposed in this pull request? This patch allows the planner and command plugins for Spark Connect to access the Spark Session and let other consumers access the configura

[GitHub] [spark] wangyum commented on pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum commented on PR #39691: URL: https://github.com/apache/spark/pull/39691#issuecomment-1399281359 In fact, Databricks also supports this clause.

[GitHub] [spark] wangyum opened a new pull request, #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-01-21 Thread via GitHub
wangyum opened a new pull request, #39691: URL: https://github.com/apache/spark/pull/39691 ### What changes were proposed in this pull request? The `QUALIFY` clause is used to filter the results of [window functions](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-windo

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39690: [SPARK-42150][K8S][DOCS] Upgrade Volcano to 1.7.0

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39690: URL: https://github.com/apache/spark/pull/39690 ### What changes were proposed in this pull request? This PR aims to upgrade `Volcano` from 1.5.1 to 1.7.0. ### Why are the changes needed? Volcano 1.7.0 finally provides `mul

[GitHub] [spark] LuciferYang commented on pull request #39679: [SPARK-42137][CORE] Enable `spark.kryo.unsafe` by default

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39679: URL: https://github.com/apache/spark/pull/39679#issuecomment-1399278365 late LGTM

[GitHub] [spark] LuciferYang commented on pull request #39684: [SPARK-42140][CORE] Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39684: URL: https://github.com/apache/spark/pull/39684#issuecomment-1399275488 Yeah, GA passed for this one.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39688: URL: https://github.com/apache/spark/pull/39688#discussion_r1083302432 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,16 +17,18 @@ package org.apache.spark.status.protobuf -import com.google.protobu

[GitHub] [spark] srowen commented on a diff in pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
srowen commented on code in PR #39688: URL: https://github.com/apache/spark/pull/39688#discussion_r1083302080 ## core/src/main/scala/org/apache/spark/status/protobuf/Utils.scala: ## @@ -17,16 +17,18 @@ package org.apache.spark.status.protobuf -import com.google.protobuf.Mes

[GitHub] [spark] srowen commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
srowen commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399271648 Yeah looks fine, just rerun tests

[GitHub] [spark] srowen commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
srowen commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399271485 Merged to master

[GitHub] [spark] srowen closed pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
srowen closed pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API URL: https://github.com/apache/spark/pull/39190

[GitHub] [spark] HyukjinKwon closed pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
HyukjinKwon closed pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError` URL: https://github.com/apache/spark/pull/39638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [spark] HyukjinKwon commented on pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39638: URL: https://github.com/apache/spark/pull/39638#issuecomment-1399243499 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] tedyu commented on pull request #39654: [MINOR][SHUFFLE] Include IOException in warning log of finalizeShuffleMerge

2023-01-21 Thread via GitHub
tedyu commented on PR #39654: URL: https://github.com/apache/spark/pull/39654#issuecomment-1399242161 Test failures were not related to the PR. https://github.com/tedyu/spark/actions/runs/3973317986/jobs/6811901738#step:9:23488 ``` Error: Exception in thread "streaming-job-executor-

[GitHub] [spark] HyukjinKwon commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
HyukjinKwon commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399234450 I would defer to either @tgravescs or @mridulm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] LuciferYang commented on pull request #39674: [SPARK-42149][YARN] Remove the env `SPARK_USE_CONC_INCR_GC` used to enable CMS GC for Yarn AM

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39674: URL: https://github.com/apache/spark/pull/39674#issuecomment-1399233117 Updated pr description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [spark] yabola commented on pull request #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola commented on PR #39687: URL: https://github.com/apache/spark/pull/39687#issuecomment-1399232731 @sunchao @aokolnychyi Please take a look, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] LuciferYang commented on pull request #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang commented on PR #39688: URL: https://github.com/apache/spark/pull/39688#issuecomment-1399230015 GA failed case: https://github.com/LuciferYang/spark/actions/runs/3973073352/jobs/6811519184 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39689: [SPARK-42148][K8S][BUILD] Upgrade `kubernetes-client` to 6.4.0

2023-01-21 Thread via GitHub
dongjoon-hyun opened a new pull request, #39689: URL: https://github.com/apache/spark/pull/39689 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083274700 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ imp

[GitHub] [spark] LuciferYang opened a new pull request, #39688: [SPARK-42146][CORE] Refactor `Utils#setStringField` to make maven build pass when sql module use this method

2023-01-21 Thread via GitHub
LuciferYang opened a new pull request, #39688: URL: https://github.com/apache/spark/pull/39688 ### What changes were proposed in this pull request? This PR refactors the input parameter type of the `Utils#setStringField` function so that the Maven build passes when the sql module uses this function.

[GitHub] [spark] yabola commented on pull request #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola commented on PR #39687: URL: https://github.com/apache/spark/pull/39687#issuecomment-1399219285 retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] yabola opened a new pull request, #39687: [SPARK-41470][Core] Relax constraints on Storage-Partitioned Join should assume InternalRow implements equals and hashCode

2023-01-21 Thread via GitHub
yabola opened a new pull request, #39687: URL: https://github.com/apache/spark/pull/39687 …uld assume InternalRow implements equals and hashCode ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR i

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083268932 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ imp

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083268583 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ imp

[GitHub] [spark] kuwii commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2023-01-21 Thread via GitHub
kuwii commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1399216960 I tried the example code in the [JIRA](https://issues.apache.org/jira/browse/SPARK-24415), and it is not affected by this change. The tasks shown in the stage are the same before and after this

[GitHub] [spark] dcoliversun commented on pull request #39306: [SPARK-41781][K8S] Add the ability to create pvc before creating driver/executor pod

2023-01-21 Thread via GitHub
dcoliversun commented on PR #39306: URL: https://github.com/apache/spark/pull/39306#issuecomment-1399216750 Thank you for the reviews, @dongjoon-hyun. I believe I've addressed your comments! Tomorrow is also Chinese New Year; I wish you a happy Chinese New Year. -- This is an automat

[GitHub] [spark] itholic commented on a diff in pull request #39543: [SPARK-42044][SQL] Fix incorrect error message for `MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY`

2023-01-21 Thread via GitHub
itholic commented on code in PR #39543: URL: https://github.com/apache/spark/pull/39543#discussion_r1083266330 ## core/src/main/resources/error/error-classes.json: ## @@ -1592,7 +1592,7 @@ }, "MUST_AGGREGATE_CORRELATED_SCALAR_SUBQUERY" : { "message" : [ -

[GitHub] [spark] itholic commented on a diff in pull request #39638: [SPARK-42082][SPARK-41598][PYTHON][CONNECT] Introduce `PySparkValueError` and `PySparkTypeError`

2023-01-21 Thread via GitHub
itholic commented on code in PR #39638: URL: https://github.com/apache/spark/pull/39638#discussion_r1083264192 ## python/pyspark/sql/tests/test_functions.py: ## @@ -763,25 +798,55 @@ def test_higher_order_function_failures(self): from pyspark.sql.functions import col, t

[GitHub] [spark] peter-toth commented on pull request #39676: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-21 Thread via GitHub
peter-toth commented on PR #39676: URL: https://github.com/apache/spark/pull/39676#issuecomment-1399207914 Thanks for the quick review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] LuciferYang commented on a diff in pull request #39674: [DON'T MERGE] Test remove SPARK_USE_CONC_INCR_GC

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39674: URL: https://github.com/apache/spark/pull/39674#discussion_r1083262519 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -1005,26 +1005,6 @@ private[spark] class Client( val tmpDir = new Pat

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083262236 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ imp

[GitHub] [spark] gengliangwang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
gengliangwang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083262127 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ i

[GitHub] [spark] LuciferYang commented on a diff in pull request #39683: [SPARK-42144][CORE][SQL] Handle null string values in StageDataWrapper/StreamBlockData/StreamingQueryData

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39683: URL: https://github.com/apache/spark/pull/39683#discussion_r1083261992 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -495,9 +495,10 @@ message RDDOperationGraphWrapper { } message StreamingQuer

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083261688 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -23,17 +23,17 @@ import collection.JavaConverters._ imp

[GitHub] [spark] LuciferYang commented on a diff in pull request #39682: [SPARK-42139][CORE][SQL] Handle null string values in SQLExecutionUIData/SparkPlanGraphWrapper/SQLPlanMetric

2023-01-21 Thread via GitHub
LuciferYang commented on code in PR #39682: URL: https://github.com/apache/spark/pull/39682#discussion_r1083259668 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLPlanMetricSerializer.scala: ## @@ -19,18 +19,24 @@ package org.apache.spark.status.protobuf.sql