[GitHub] [spark] MaxGekk commented on a diff in pull request #38940: [SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_NUM_ARGS.WITHOUT_SUGGESTION`

2022-12-12 Thread GitBox
MaxGekk commented on code in PR #38940: URL: https://github.com/apache/spark/pull/38940#discussion_r1046766089 ## core/src/main/resources/error/error-classes.json: ## @@ -1526,8 +1526,20 @@ }, "WRONG_NUM_ARGS" : { "message" : [ - "The requires parameters but

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046766381 ## sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantWindowGroupLimits.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046765714 ## sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantWindowGroupLimits.scala: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046764675 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimit.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046763580 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimitSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046762409 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimitSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046761416 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimitSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046759151 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimitSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046756910 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimitSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache

[GitHub] [spark] mridulm commented on pull request #38959: SPARK-41415: SASL Request Retries

2022-12-12 Thread GitBox
mridulm commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1347874874 It is not clear to me why we need the protocol change, and why not simply recreate a new socket connection? -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046754926 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1723,6 +1723,14 @@ object SQLConf { .booleanConf

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046753470 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -146,7 +146,9 @@ abstract class Optimizer(catalogManager:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046751527 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimit.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
amaliujia commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046751150 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1046750710 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InsertWindowGroupLimit.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046749466 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,40 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046748387 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,40 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046747994 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,40 @@ case class ArrayExcept(left:

[GitHub] [spark] LuciferYang commented on a diff in pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

2022-12-12 Thread GitBox
LuciferYang commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1046742799 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -1840,6 +1840,47 @@ class

[GitHub] [spark] beliefer commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
beliefer commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046740571 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] cloud-fan commented on a diff in pull request #39017: [SPARK-41440][CONNECT][PYTHON] Implement `DataFrame.randomSplit`

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #39017: URL: https://github.com/apache/spark/pull/39017#discussion_r1046738676 ## python/pyspark/sql/connect/dataframe.py: ## @@ -875,6 +877,60 @@ def to_jcols( melt = unpivot +def randomSplit( +self, +weights:

[GitHub] [spark] beliefer commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
beliefer commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046737921 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] beliefer commented on a diff in pull request #39017: [SPARK-41440][CONNECT][PYTHON] Implement `DataFrame.randomSplit`

2022-12-12 Thread GitBox
beliefer commented on code in PR #39017: URL: https://github.com/apache/spark/pull/39017#discussion_r1046733400 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -132,12 +132,31 @@ class SparkConnectPlanner(session:

[GitHub] [spark] LuciferYang opened a new pull request, #39045: [MINOR][SQL] Change the group of `ArraySize` from `collection_funcs` to `array_funcs`

2022-12-12 Thread GitBox
LuciferYang opened a new pull request, #39045: URL: https://github.com/apache/spark/pull/39045 ### What changes were proposed in this pull request? This PR changes the group of `ArraySize` from `collection_funcs` to `array_funcs`. ### Why are the changes needed? `ArraySize`

[GitHub] [spark] akpatnam25 commented on pull request #38959: SPARK-41415: SASL Request Retries

2022-12-12 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1347832052 @otterc @mridulm removing the WIP tag from this PR. this should be good to review now.

[GitHub] [spark] amaliujia commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
amaliujia commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046715636 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] beliefer commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
beliefer commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046712929 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046710242 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,33 @@ class

[GitHub] [spark] shuyouZZ commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db

2022-12-12 Thread GitBox
shuyouZZ commented on code in PR #38983: URL: https://github.com/apache/spark/pull/38983#discussion_r1046706662 ## core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala: ## @@ -1705,6 +1705,45 @@ abstract class FsHistoryProviderSuite extends

[GitHub] [spark] LuciferYang commented on pull request #39025: [SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_0020`

2022-12-12 Thread GitBox
LuciferYang commented on PR #39025: URL: https://github.com/apache/spark/pull/39025#issuecomment-1347802440 Thanks @MaxGekk

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046697980 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5237,6 +5237,53 @@ class DataFrameFunctionsSuite extends QueryTest with

[GitHub] [spark] WolverineJiang commented on pull request #39036: [SPARK-41461][BUILD][CORE][CONNECT][PROTOBUF] Unify the environment variable of *_PROTOC_EXEC_PATH.

2022-12-12 Thread GitBox
WolverineJiang commented on PR #39036: URL: https://github.com/apache/spark/pull/39036#issuecomment-1347789334 Thanks @HyukjinKwon @LuciferYang ~

[GitHub] [spark] cloud-fan commented on a diff in pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38968: URL: https://github.com/apache/spark/pull/38968#discussion_r1046690391 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -774,15 +775,28 @@ object OptimizeOneRowRelationSubquery extends

[GitHub] [spark] MaxGekk closed pull request #39025: [SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_0020`

2022-12-12 Thread GitBox
MaxGekk closed pull request #39025: [SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_0020` URL: https://github.com/apache/spark/pull/39025

[GitHub] [spark] amaliujia commented on a diff in pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
amaliujia commented on code in PR #39035: URL: https://github.com/apache/spark/pull/39035#discussion_r1046689985 ## python/pyspark/sql/connect/functions.py: ## @@ -89,6 +90,10 @@ def col(col: str) -> Column: column = col +def colRegex(col: str) -> Column: Review Comment:

[GitHub] [spark] MaxGekk commented on pull request #39025: [SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_0020`

2022-12-12 Thread GitBox
MaxGekk commented on PR #39025: URL: https://github.com/apache/spark/pull/39025#issuecomment-1347787823 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38968: [SPARK-41441][SQL] Support Generate with no required child output to host outer references

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #38968: URL: https://github.com/apache/spark/pull/38968#discussion_r1046688763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala: ## @@ -667,6 +667,18 @@ object DecorrelateInnerQuery extends

[GitHub] [spark] rangadi commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-12 Thread GitBox
rangadi commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1046683120 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2022-12-12 Thread GitBox
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-134890 @holdenk I made the suggested change. Can you please review?

[GitHub] [spark] holdenk commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-12-12 Thread GitBox
holdenk commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1347772151 Thanks for making the PR :D

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][CORE] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-12 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1046674086 ## project/MimaExcludes.scala: ## @@ -122,6 +122,13 @@ object MimaExcludes { // [SPARK-41072][SS] Add the error class STREAM_FAILED to

[GitHub] [spark] pralabhkumar commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster m

2022-12-12 Thread GitBox
pralabhkumar commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1347768333 Thx @holdenk for your help .

[GitHub] [spark] holdenk commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-12-12 Thread GitBox
holdenk commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1347764987 Merged, thanks for the reminder.

[GitHub] [spark] asfgit closed pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode

2022-12-12 Thread GitBox
asfgit closed pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mode URL: https://github.com/apache/spark/pull/37417

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-12-12 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1046656616 ## core/src/main/resources/error/error-classes.json: ## @@ -1443,6 +1443,11 @@ "A correlated outer name reference within a subquery expression body was not

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-12-12 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1046656452 ## core/src/main/resources/error/error-classes.json: ## @@ -1443,6 +1443,11 @@ "A correlated outer name reference within a subquery expression body was not

[GitHub] [spark] beliefer commented on pull request #39035: [SPARK-41438][CONNECT][PYTHON] Implement `DataFrame.colRegex`

2022-12-12 Thread GitBox
beliefer commented on PR #39035: URL: https://github.com/apache/spark/pull/39035#issuecomment-1347737140 ping @amaliujia @zhengruifeng @cloud-fan @HyukjinKwon

[GitHub] [spark] pralabhkumar commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster m

2022-12-12 Thread GitBox
pralabhkumar commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1347736718 @HyukjinKwon Thx for giving LGTM. Please merge the PR

[GitHub] [spark] LuciferYang commented on pull request #39038: [SPARK-41378][SQL][FOLLOWUP] Use toAttributeMap before comparison

2022-12-12 Thread GitBox
LuciferYang commented on PR #39038: URL: https://github.com/apache/spark/pull/39038#issuecomment-1347731497 Thanks @dongjoon-hyun

[GitHub] [spark] gengliangwang closed pull request #39042: [SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11

2022-12-12 Thread GitBox
gengliangwang closed pull request #39042: [SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11 URL: https://github.com/apache/spark/pull/39042

[GitHub] [spark] gengliangwang commented on pull request #39042: [SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11

2022-12-12 Thread GitBox
gengliangwang commented on PR #39042: URL: https://github.com/apache/spark/pull/39042#issuecomment-1347723065 @HyukjinKwon @dongjoon-hyun Thanks for the review. Merging to master

[GitHub] [spark] HyukjinKwon closed pull request #39036: [SPARK-41461][BUILD][CORE][CONNECT][PROTOBUF] Unify the environment variable of *_PROTOC_EXEC_PATH.

2022-12-12 Thread GitBox
HyukjinKwon closed pull request #39036: [SPARK-41461][BUILD][CORE][CONNECT][PROTOBUF] Unify the environment variable of *_PROTOC_EXEC_PATH. URL: https://github.com/apache/spark/pull/39036

[GitHub] [spark] HyukjinKwon commented on pull request #39036: [SPARK-41461][BUILD][CORE][CONNECT][PROTOBUF] Unify the environment variable of *_PROTOC_EXEC_PATH.

2022-12-12 Thread GitBox
HyukjinKwon commented on PR #39036: URL: https://github.com/apache/spark/pull/39036#issuecomment-1347701826 Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #39044: [SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile comment

2022-12-12 Thread GitBox
dongjoon-hyun closed pull request #39044: [SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile comment URL: https://github.com/apache/spark/pull/39044

[GitHub] [spark] LuciferYang commented on pull request #39025: [SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_LEGACY_ERROR_TEMP_0020`

2022-12-12 Thread GitBox
LuciferYang commented on PR #39025: URL: https://github.com/apache/spark/pull/39025#issuecomment-1347700483 ready now

[GitHub] [spark] dongjoon-hyun commented on pull request #39044: [SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile comment

2022-12-12 Thread GitBox
dongjoon-hyun commented on PR #39044: URL: https://github.com/apache/spark/pull/39044#issuecomment-1347700473 Merged to master for Apache Spark 3.4.0.

[GitHub] [spark] pan3793 commented on pull request #38622: [SPARK-39601][YARN] AllocationFailure should not be treated as exitCausedByApp when driver is shutting down

2022-12-12 Thread GitBox
pan3793 commented on PR #38622: URL: https://github.com/apache/spark/pull/38622#issuecomment-1347698275 @tgravescs could you please help merging this one?

[GitHub] [spark] dongjoon-hyun commented on pull request #39044: [SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile comment

2022-12-12 Thread GitBox
dongjoon-hyun commented on PR #39044: URL: https://github.com/apache/spark/pull/39044#issuecomment-1347696429 Thank you, @HyukjinKwon .

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39044: [SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile comment

2022-12-12 Thread GitBox
dongjoon-hyun opened a new pull request, #39044: URL: https://github.com/apache/spark/pull/39044 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun closed pull request #39043: [SPARK-41502][K8S][TESTS] Upgrade the minimum Minikube version to 1.28.0

2022-12-12 Thread GitBox
dongjoon-hyun closed pull request #39043: [SPARK-41502][K8S][TESTS] Upgrade the minimum Minikube version to 1.28.0 URL: https://github.com/apache/spark/pull/39043

[GitHub] [spark] dongjoon-hyun commented on pull request #39043: [SPARK-41502][K8S][TESTS] Upgrade the minimum Minikube version to 1.28.0

2022-12-12 Thread GitBox
dongjoon-hyun commented on PR #39043: URL: https://github.com/apache/spark/pull/39043#issuecomment-1347688181 Thank you, @HyukjinKwon . K8s IT passed. Merged to master.

[GitHub] [spark] cloud-fan commented on a diff in pull request #39010: [SPARK-41468][SQL] Fix PlanExpression handling in EquivalentExpressions

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #39010: URL: https://github.com/apache/spark/pull/39010#discussion_r1046606606 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -142,28 +142,37 @@ class EquivalentExpressions {

[GitHub] [spark] panbingkun commented on a diff in pull request #39018: [SPARK-41478][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1234

2022-12-12 Thread GitBox
panbingkun commented on code in PR #39018: URL: https://github.com/apache/spark/pull/39018#discussion_r1046606345 ## core/src/main/resources/error/error-classes.json: ## @@ -1257,6 +1257,11 @@ "AES- with the padding by the function." ] }, +

[GitHub] [spark] cloud-fan commented on a diff in pull request #39017: [SPARK-41440][CONNECT][PYTHON] Implement `DataFrame.randomSplit`

2022-12-12 Thread GitBox
cloud-fan commented on code in PR #39017: URL: https://github.com/apache/spark/pull/39017#discussion_r1046606153 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -132,12 +132,31 @@ class

[GitHub] [spark] anchovYu commented on a diff in pull request #38776: [SPARK-27561][SQL] Support implicit lateral column alias resolution on Project

2022-12-12 Thread GitBox
anchovYu commented on code in PR #38776: URL: https://github.com/apache/spark/pull/38776#discussion_r1046592935 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -424,8 +424,51 @@ case class OuterReference(e: NamedExpression)

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-12 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1046602772 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1197,15 +1230,15 @@ public void onData(String streamId,

[GitHub] [spark] HeartSaVioR commented on pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-12 Thread GitBox
HeartSaVioR commented on PR #38911: URL: https://github.com/apache/spark/pull/38911#issuecomment-1347663347 cc. @zsxwing @xuanyuanking @viirya Friendly reminder. cc. @rangadi @LuciferYang @MaxGekk @srielau @jerrypeng Would you mind taking another look?

[GitHub] [spark] zhengruifeng commented on pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-12 Thread GitBox
zhengruifeng commented on PR #38984: URL: https://github.com/apache/spark/pull/38984#issuecomment-1347659727 if `test_fill_na` is irrelevant to this PR, I guess rebasing may help

[GitHub] [spark] dongjoon-hyun opened a new pull request, #39043: [SPARK-41502][K8S][TESTS] Upgrade the minimum Minikube version to 1.28.0

2022-12-12 Thread GitBox
dongjoon-hyun opened a new pull request, #39043: URL: https://github.com/apache/spark/pull/39043 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on pull request #39038: [SPARK-41378][SQL][FOLLOWUP] Use toAttributeMap before comparison

2022-12-12 Thread GitBox
cloud-fan commented on PR #39038: URL: https://github.com/apache/spark/pull/39038#issuecomment-1347635987 thanks for the quick fix!

[GitHub] [spark] beliefer commented on a diff in pull request #39017: [SPARK-41440][CONNECT][PYTHON] Implement `DataFrame.randomSplit`

2022-12-12 Thread GitBox
beliefer commented on code in PR #39017: URL: https://github.com/apache/spark/pull/39017#discussion_r1046585027 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -601,3 +602,10 @@ message Unpivot { // (Required) Name of the value column.

[GitHub] [spark] gengliangwang commented on pull request #39042: [SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11

2022-12-12 Thread GitBox
gengliangwang commented on PR #39042: URL: https://github.com/apache/spark/pull/39042#issuecomment-1347617840 Actually I got this error when running SBT locally ``` [error] file:/Users/gengliang.wang/.m2/repository/com/google/protobuf/protoc/3.21.9/protoc-3.21.9-osx-x86_64.exe: not

[GitHub] [spark] gengliangwang opened a new pull request, #39042: [SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11

2022-12-12 Thread GitBox
gengliangwang opened a new pull request, #39042: URL: https://github.com/apache/spark/pull/39042 ### What changes were proposed in this pull request? Upgrade Protobuf version to 3.21.11 ### Why are the changes needed? There are some bug fixes in the latest

[GitHub] [spark] beliefer commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-12 Thread GitBox
beliefer commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1046572899 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,30 @@ case class ArrayExcept(left:

[GitHub] [spark] zhengruifeng commented on pull request #39033: [SPARK-41495][CONNECT][PYTHON] Implement `collection` functions: P~Z

2022-12-12 Thread GitBox
zhengruifeng commented on PR #39033: URL: https://github.com/apache/spark/pull/39033#issuecomment-1347612464 @HyukjinKwon thank you for the reviews

[GitHub] [spark] HyukjinKwon closed pull request #39033: [SPARK-41495][CONNECT][PYTHON] Implement `collection` functions: P~Z

2022-12-12 Thread GitBox
HyukjinKwon closed pull request #39033: [SPARK-41495][CONNECT][PYTHON] Implement `collection` functions: P~Z URL: https://github.com/apache/spark/pull/39033

[GitHub] [spark] HyukjinKwon commented on pull request #39033: [SPARK-41495][CONNECT][PYTHON] Implement `collection` functions: P~Z

2022-12-12 Thread GitBox
HyukjinKwon commented on PR #39033: URL: https://github.com/apache/spark/pull/39033#issuecomment-1347608857 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #39038: [SPARK-41378][SQL][FOLLOWUP] Use toAttributeMap before comparison

2022-12-12 Thread GitBox
HyukjinKwon commented on PR #39038: URL: https://github.com/apache/spark/pull/39038#issuecomment-1347586999 Thx!

[GitHub] [spark] HyukjinKwon opened a new pull request, #39041: [WIP][DO-NOT-MERGE][CONNECT] Merge namespace of Spark Connect and PySpark API

2022-12-12 Thread GitBox
HyukjinKwon opened a new pull request, #39041: URL: https://github.com/apache/spark/pull/39041 ### What changes were proposed in this pull request? This PR proposes to merge namespaces between Spark Connect and PySpark. ### Why are the changes needed? TBD ### Does

[GitHub] [spark] otterc commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-12 Thread GitBox
otterc commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1046545776 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -593,6 +607,9 @@ public void onData(String streamId,

[GitHub] [spark] github-actions[bot] closed pull request #35667: [SPARK-38425][K8S] Avoid possible errors due to incorrect file size or type supplied in hadoop conf

2022-12-12 Thread GitBox
github-actions[bot] closed pull request #35667: [SPARK-38425][K8S] Avoid possible errors due to incorrect file size or type supplied in hadoop conf URL: https://github.com/apache/spark/pull/35667

[GitHub] [spark] github-actions[bot] closed pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer

2022-12-12 Thread GitBox
github-actions[bot] closed pull request #37404: [SPARK-39866][SQL] Memory leak when closing a session of Spark ThriftServer URL: https://github.com/apache/spark/pull/37404

[GitHub] [spark] anchovYu opened a new pull request, #39040: [WIP][SPARK-27561][SQL] Support implicit lateral column alias resolution on Aggregate

2022-12-12 Thread GitBox
anchovYu opened a new pull request, #39040: URL: https://github.com/apache/spark/pull/39040 ### What changes were proposed in this pull request? This PR is based on https://github.com/apache/spark/pull/38776. To view difference,

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-12 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1046495068 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1197,15 +1230,15 @@ public void onData(String streamId,

[GitHub] [spark] SandishKumarHN commented on pull request #39039: [SPARK-40776][SQL][PROTOBUF][DOCS] Spark-Protobuf docs

2022-12-12 Thread GitBox
SandishKumarHN commented on PR #39039: URL: https://github.com/apache/spark/pull/39039#issuecomment-1347493102 @rangadi please let me know if I have missed anything. Two things we can add are - shading requirements - an example of how to use options parameters for recursive fields.

[GitHub] [spark] SandishKumarHN opened a new pull request, #39039: [SPARK-40776][SQL][PROTOBUF][DOCS] Spark-Protobuf docs

2022-12-12 Thread GitBox
SandishKumarHN opened a new pull request, #39039: URL: https://github.com/apache/spark/pull/39039 ### What changes were proposed in this pull request? The goal of this PR is to document protobuf-protobuf usage. ### Why are the changes needed? added new

[GitHub] [spark] sadikovi commented on pull request #38784: [SPARK-41248][SQL] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results parsing added in SPARK-40646

2022-12-12 Thread GitBox
sadikovi commented on PR #38784: URL: https://github.com/apache/spark/pull/38784#issuecomment-1347431942 Benchmark results for when the config `spark.sql.json.enablePartialResults` is enabled and disabled (txt files). [JsonBenchmark results with config as

[GitHub] [spark] sadikovi commented on a diff in pull request #38784: [SPARK-41248][SQL] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results parsing added in SPARK-40646

2022-12-12 Thread GitBox
sadikovi commented on code in PR #38784: URL: https://github.com/apache/spark/pull/38784#discussion_r1046441435 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3629,6 +3629,15 @@ object SQLConf { .booleanConf .createWithDefault(true)

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38983: [SPARK-41447][CORE] Clean up expired event log files that don't exist in listing db

2022-12-12 Thread GitBox
dongjoon-hyun commented on code in PR #38983: URL: https://github.com/apache/spark/pull/38983#discussion_r1046450220 ## core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala: ## @@ -1705,6 +1705,45 @@ abstract class FsHistoryProviderSuite extends

[GitHub] [spark] amaliujia commented on pull request #39028: [SPARK-41484][CONNECT][PYTHON] Implement `collection` functions: E~M

2022-12-12 Thread GitBox
amaliujia commented on PR #39028: URL: https://github.com/apache/spark/pull/39028#issuecomment-1347259428 late LGTM!

[GitHub] [spark] amaliujia commented on a diff in pull request #39017: [SPARK-41440][CONNECT][PYTHON] Implement `DataFrame.randomSplit`

2022-12-12 Thread GitBox
amaliujia commented on code in PR #39017: URL: https://github.com/apache/spark/pull/39017#discussion_r1046358161 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -601,3 +602,10 @@ message Unpivot { // (Required) Name of the value column.

[GitHub] [spark] amaliujia commented on pull request #38984: [SPARK-41349][CONNECT][PYTHON] Implement DataFrame.hint

2022-12-12 Thread GitBox
amaliujia commented on PR #38984: URL: https://github.com/apache/spark/pull/38984#issuecomment-1347253379 Looks like the CI is failing on 1. python/pyspark/sql/tests/connect/test_connect_basic.py.test_fill_na does not pass 2. cannot pass ./dev/lint-scala

[GitHub] [spark] amaliujia commented on pull request #39034: [SPARK-41412][CONNECT][FOLLOW-UP] Fix test_cast to pass with ANSI mode on

2022-12-12 Thread GitBox
amaliujia commented on PR #39034: URL: https://github.com/apache/spark/pull/39034#issuecomment-1347250746 late LGTM!

[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-12 Thread GitBox
otterc commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1046319200 ## core/src/main/scala/org/apache/spark/util/JsonProtocol.scala: ## @@ -1105,11 +1133,17 @@ private[spark] object JsonProtocol { case None =>

[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-12 Thread GitBox
otterc commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1046313315 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -726,6 +736,56 @@ final class ShuffleBlockFetcherIterator( } } + //

[GitHub] [spark] mridulm commented on a diff in pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-12 Thread GitBox
mridulm commented on code in PR #39011: URL: https://github.com/apache/spark/pull/39011#discussion_r1046308698 ## core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala: ## @@ -51,6 +52,7 @@ import org.apache.spark.rdd.RDD * at the same time for

[GitHub] [spark] otterc commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-12 Thread GitBox
otterc commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1046299412 ## core/src/main/scala/org/apache/spark/status/LiveEntity.scala: ## @@ -843,6 +917,25 @@ private[spark] object LiveEntityHelpers {
