[GitHub] [spark] panbingkun commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox
panbingkun commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022446300 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, +

[GitHub] [spark] itholic opened a new pull request, #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-14 Thread GitBox
itholic opened a new pull request, #38664: URL: https://github.com/apache/spark/pull/38664 ### What changes were proposed in this pull request? This PR proposes to assign a name to `_LEGACY_ERROR_TEMP_1042` as `INVALID_FUNCTION_ARGUMENT`. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon closed pull request #38642: [SPARK-41127][CONNECT][PYTHON] Implement DataFrame.CreateGlobalView in Python client

2022-11-14 Thread GitBox
HyukjinKwon closed pull request #38642: [SPARK-41127][CONNECT][PYTHON] Implement DataFrame.CreateGlobalView in Python client URL: https://github.com/apache/spark/pull/38642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on pull request #38642: [SPARK-41127][CONNECT][PYTHON] Implement DataFrame.CreateGlobalView in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38642: URL: https://github.com/apache/spark/pull/38642#issuecomment-1314896976 Merged to master.

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-14 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1314875312 Looks like TPCDS is still dying ... Can we try @HyukjinKwon's suggestion @liuzqt - it will ensure we are testing this during local builds and release votes, while not causing GA to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38630: [SPARK-41115][CONNECT] Add ClientType to proto to indicate which client sends a request

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38630: URL: https://github.com/apache/spark/pull/38630#discussion_r1022393958 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -48,6 +48,11 @@ message Request { // The logical plan to be executed / analyzed. Plan

[GitHub] [spark] mridulm commented on pull request #38441: [SPARK-40979][CORE] Keep removed executor info due to decommission

2022-11-14 Thread GitBox
mridulm commented on PR #38441: URL: https://github.com/apache/spark/pull/38441#issuecomment-1314871690 Can you address the remaining comment @warrenzhu25 ? Thx. Once done, and tests pass, will wait for a few days to give Dongjoon a chance to review.

[GitHub] [spark] MaxGekk closed pull request #38652: [SPARK-41137][SQL] Rename `LATERAL_JOIN_OF_TYPE` to `INVALID_LATERAL_JOIN_TYPE`

2022-11-14 Thread GitBox
MaxGekk closed pull request #38652: [SPARK-41137][SQL] Rename `LATERAL_JOIN_OF_TYPE` to `INVALID_LATERAL_JOIN_TYPE` URL: https://github.com/apache/spark/pull/38652

[GitHub] [spark] MaxGekk commented on pull request #38652: [SPARK-41137][SQL] Rename `LATERAL_JOIN_OF_TYPE` to `INVALID_LATERAL_JOIN_TYPE`

2022-11-14 Thread GitBox
MaxGekk commented on PR #38652: URL: https://github.com/apache/spark/pull/38652#issuecomment-1314839543 +1, LGTM. Merging to master. Thank you, @itholic and @LuciferYang for review.

[GitHub] [spark] zhengchenyu commented on pull request #33674: [Spark-36328][CORE][SQL] Reuse the FileSystem delegation token while querying partitioned hive table.

2022-11-14 Thread GitBox
zhengchenyu commented on PR #33674: URL: https://github.com/apache/spark/pull/33674#issuecomment-1314836881 > This is slowing down the query as each delegation token has to go through KDC and SSL handshake on Secure Clusters. Can you fix this comment? It makes someone confused. each

[GitHub] [spark] HeartSaVioR closed pull request #38528: [SPARK-41025][SS] Introduce ValidateOffsetRange/ComparableOffset to support offset range validation

2022-11-14 Thread GitBox
HeartSaVioR closed pull request #38528: [SPARK-41025][SS] Introduce ValidateOffsetRange/ComparableOffset to support offset range validation URL: https://github.com/apache/spark/pull/38528

[GitHub] [spark] HeartSaVioR commented on pull request #38528: [SPARK-41025][SS] Introduce ValidateOffsetRange/ComparableOffset to support offset range validation

2022-11-14 Thread GitBox
HeartSaVioR commented on PR #38528: URL: https://github.com/apache/spark/pull/38528#issuecomment-1314833214 Let me just deal with each data source - I got some feedback internally that it seems to be over-engineering.

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-14 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1021870148 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] MaxGekk commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox
MaxGekk commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022349552 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, +

[GitHub] [spark] srielau commented on a diff in pull request #38656: [SPARK-41140][SQL] Rename the error class `_LEGACY_ERROR_TEMP_2440` to `INVALID_WHERE_CONDITION`

2022-11-14 Thread GitBox
srielau commented on code in PR #38656: URL: https://github.com/apache/spark/pull/38656#discussion_r1022346987 ## core/src/main/resources/error/error-classes.json: ## @@ -699,6 +699,13 @@ } } }, + "INVALID_WHERE_CONDITION" : { +"message" : [ + "Found

[GitHub] [spark] amaliujia commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-14 Thread GitBox
amaliujia commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1022344836 ## python/pyspark/sql/connect/dataframe.py: ## @@ -667,12 +668,70 @@ def schema(self) -> StructType: else: return self._schema -def

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022344761 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] srielau commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox
srielau commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022343441 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, +

[GitHub] [spark] MaxGekk closed pull request #38629: [SPARK-41072][SQL][SS] Add the error class `STREAM_FAILED` to `StreamingQueryException`

2022-11-14 Thread GitBox
MaxGekk closed pull request #38629: [SPARK-41072][SQL][SS] Add the error class `STREAM_FAILED` to `StreamingQueryException` URL: https://github.com/apache/spark/pull/38629

[GitHub] [spark] amaliujia commented on pull request #38605: [SPARK-41103][CONNECT][DOC] Document how to add a new proto field of messages

2022-11-14 Thread GitBox
amaliujia commented on PR #38605: URL: https://github.com/apache/spark/pull/38605#issuecomment-1314804133 @zhengruifeng @HyukjinKwon @cloud-fan please take a look at this updated version of the proto style guide.

[GitHub] [spark] MaxGekk commented on pull request #38629: [SPARK-41072][SQL][SS] Add the error class `STREAM_FAILED` to `StreamingQueryException`

2022-11-14 Thread GitBox
MaxGekk commented on PR #38629: URL: https://github.com/apache/spark/pull/38629#issuecomment-1314804246 Merging to master. Thank you, @HeartSaVioR @cloud-fan @LuciferYang for review.

[GitHub] [spark] aokolnychyi commented on pull request #38005: [SPARK-40550][SQL] DataSource V2: Handle DELETE commands for delta-based sources

2022-11-14 Thread GitBox
aokolnychyi commented on PR #38005: URL: https://github.com/apache/spark/pull/38005#issuecomment-1314802801 I don't think the test failure is related. Let me re-trigger. ``` SPARK-33084: Add jar support Ivy URI in SQL *** FAILED *** ```

[GitHub] [spark] Yaohua628 opened a new pull request, #38663: [SPARK-41143][SQL] Add named argument function syntax support

2022-11-14 Thread GitBox
Yaohua628 opened a new pull request, #38663: URL: https://github.com/apache/spark/pull/38663 ### What changes were proposed in this pull request? Support named arguments functions in Spark SQL: General usage: `_FUNC_(arg0, arg1, arg2, arg5 => value5, arg8 => value8)`

[GitHub] [spark] cloud-fan commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-11-14 Thread GitBox
cloud-fan commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r1022327315 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/DeltaWriter.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] cloud-fan commented on a diff in pull request #38404: [SPARK-40956] SQL Equivalent for Dataframe overwrite command

2022-11-14 Thread GitBox
cloud-fan commented on code in PR #38404: URL: https://github.com/apache/spark/pull/38404#discussion_r1022326063 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -261,6 +261,7 @@ class AstBuilder extends

[GitHub] [spark] LuciferYang commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-14 Thread GitBox
LuciferYang commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1022325876 ## connector/connect/README.md: ## @@ -24,7 +24,30 @@ or ```bash ./build/sbt -Phive clean package ``` - + +### Build with user-defined `protoc` and

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022320039 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] cloud-fan commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
cloud-fan commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022318950 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022318227 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] amaliujia commented on pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
amaliujia commented on PR #38653: URL: https://github.com/apache/spark/pull/38653#issuecomment-1314771256 LGTM

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022317152 ## python/pyspark/sql/tests/connect/test_connect_column_expressions.py: ## @@ -134,6 +134,16 @@ def test_list_to_literal(self): lit_list_plan =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022317274 ## python/pyspark/sql/connect/dataframe.py: ## @@ -44,7 +44,7 @@ from pyspark.sql.connect.typing import ColumnOrString, ExpressionOrString from

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022317075 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] cloud-fan commented on pull request #38619: [SPARK-41112][SQL] RuntimeFilter should apply ColumnPruning eagerly with in-subquery filter

2022-11-14 Thread GitBox
cloud-fan commented on PR #38619: URL: https://github.com/apache/spark/pull/38619#issuecomment-1314770201 thanks, merging to master!

[GitHub] [spark] ulysses-you commented on pull request #38662: [SPARK-41144][SQL] Unresolved hint should not cause query failure

2022-11-14 Thread GitBox
ulysses-you commented on PR #38662: URL: https://github.com/apache/spark/pull/38662#issuecomment-1314769928 cc @cloud-fan @cfmcgrady @viirya

[GitHub] [spark] cloud-fan commented on a diff in pull request #38640: [SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites

2022-11-14 Thread GitBox
cloud-fan commented on code in PR #38640: URL: https://github.com/apache/spark/pull/38640#discussion_r1022316324 ## sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala: ## @@ -351,6 +353,62 @@ class TPCDSModifiedPlanStabilityWithStatsSuite extends

[GitHub] [spark] ulysses-you opened a new pull request, #38662: [SPARK-41144][SQL] Unresolved hint should not cause query failure

2022-11-14 Thread GitBox
ulysses-you opened a new pull request, #38662: URL: https://github.com/apache/spark/pull/38662 ### What changes were proposed in this pull request? Skip `UnresolvedHint` in rule `AddMetadataColumns` to avoid call exprId on `UnresolvedAttribute`. ### Why are the changes

[GitHub] [spark] cloud-fan closed pull request #38648: [SPARK-41134][SQL] Improve error message of internal errors

2022-11-14 Thread GitBox
cloud-fan closed pull request #38648: [SPARK-41134][SQL] Improve error message of internal errors URL: https://github.com/apache/spark/pull/38648

[GitHub] [spark] LuciferYang commented on pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-14 Thread GitBox
LuciferYang commented on PR #38620: URL: https://github.com/apache/spark/pull/38620#issuecomment-1314767698 @LinhongLiu Please help double check. Thanks ~ I don't know what you need. I just checked `dev/sbt-checkstyle` manually

[GitHub] [spark] cloud-fan commented on pull request #38648: [SPARK-41134][SQL] Improve error message of internal errors

2022-11-14 Thread GitBox
cloud-fan commented on PR #38648: URL: https://github.com/apache/spark/pull/38648#issuecomment-1314767537 thanks for review, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #38648: [SPARK-41134][SQL] Improve error message of internal errors

2022-11-14 Thread GitBox
cloud-fan commented on code in PR #38648: URL: https://github.com/apache/spark/pull/38648#discussion_r1022314524 ## sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala: ## @@ -494,7 +494,8 @@ object QueryExecution { private[sql] def

[GitHub] [spark] pan3793 commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread GitBox
pan3793 commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1022312510 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -94,7 +95,9 @@ private[spark]

[GitHub] [spark] HyukjinKwon commented on pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38620: URL: https://github.com/apache/spark/pull/38620#issuecomment-1314763939 cc @linhongliu-db too

[GitHub] [spark] dongjoon-hyun commented on pull request #38620: [SPARK-41113][BUILD] Upgrade sbt to 1.8.0

2022-11-14 Thread GitBox
dongjoon-hyun commented on PR #38620: URL: https://github.com/apache/spark/pull/38620#issuecomment-1314760693 cc @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38642: [SPARK-41127][CONNECT][PYTHON] Implement DataFrame.CreateGlobalView in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38642: URL: https://github.com/apache/spark/pull/38642#discussion_r1022306947 ## python/pyspark/sql/connect/dataframe.py: ## @@ -633,6 +633,48 @@ def explain(self) -> str: else: return "" +def

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38651: [SPARK-41136][K8S] Shorten graceful shutdown time of ExecutorPodsSnapshotsStoreImpl to prevent blocking shutdown process

2022-11-14 Thread GitBox
dongjoon-hyun commented on code in PR #38651: URL: https://github.com/apache/spark/pull/38651#discussion_r1022306648 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala: ## @@ -94,7 +95,9 @@

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38642: [SPARK-41127][CONNECT][PYTHON] Implement DataFrame.CreateGlobalView in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38642: URL: https://github.com/apache/spark/pull/38642#discussion_r1022306530 ## python/pyspark/sql/connect/dataframe.py: ## @@ -633,6 +633,48 @@ def explain(self) -> str: else: return "" +def

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37616: [SPARK-40178][PYTHON][SQL] Fix partitioning hint parameters in PySpark

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #37616: URL: https://github.com/apache/spark/pull/37616#discussion_r1022305164 ## python/pyspark/sql/dataframe.py: ## @@ -968,7 +968,7 @@ def hint( if not isinstance(name, str): raise TypeError("name should be provided

[GitHub] [spark] HyukjinKwon closed pull request #38603: [SPARK-41101][PYTHON][PROTOBUF] Message classname support for PYSPARK-PROTOBUF

2022-11-14 Thread GitBox
HyukjinKwon closed pull request #38603: [SPARK-41101][PYTHON][PROTOBUF] Message classname support for PYSPARK-PROTOBUF URL: https://github.com/apache/spark/pull/38603

[GitHub] [spark] HyukjinKwon commented on pull request #38603: [SPARK-41101][PYTHON][PROTOBUF] Message classname support for PYSPARK-PROTOBUF

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38603: URL: https://github.com/apache/spark/pull/38603#issuecomment-1314751610 Merged to master.

[GitHub] [spark] zhengruifeng closed pull request #38655: [SPARK-41138][PYTHON] `DataFrame.na.fill` should have the same augment types as `DataFrame.fillna`

2022-11-14 Thread GitBox
zhengruifeng closed pull request #38655: [SPARK-41138][PYTHON] `DataFrame.na.fill` should have the same augment types as `DataFrame.fillna` URL: https://github.com/apache/spark/pull/38655

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022299234 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -141,10 +149,35 @@ class SparkConnectProtoSuite

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022298812 ## python/pyspark/sql/tests/connect/test_connect_column_expressions.py: ## @@ -134,6 +134,16 @@ def test_list_to_literal(self): lit_list_plan =

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022298464 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -248,6 +248,20 @@ def test_simple_datasource_read(self) -> None: actualResult =

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38631: [SPARK-40809] [CONNECT] [FOLLOW] Support `alias()` in Python client

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38631: URL: https://github.com/apache/spark/pull/38631#discussion_r1022297572 ## python/pyspark/sql/connect/column.py: ## @@ -82,6 +82,73 @@ def to_plan(self, session: "RemoteSparkSession") -> "proto.Expression": def __str__(self) ->

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022297166 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] zhengchenyu commented on pull request #33674: [Spark-36328][CORE][SQL] Reuse the FileSystem delegation token while querying partitioned hive table.

2022-11-14 Thread GitBox
zhengchenyu commented on PR #33674: URL: https://github.com/apache/spark/pull/33674#issuecomment-1314742800 What is the status of this PR? I encountered this problem and reported a duplicate issue, SPARK-41073. Can we continue this? > Note: This bug happens on spark-thriftserver. In my

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022295741 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022295525 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] zhengchenyu closed pull request #37949: [SPARK-40504][YARN] Make yarn appmaster load config from client

2022-11-14 Thread GitBox
zhengchenyu closed pull request #37949: [SPARK-40504][YARN] Make yarn appmaster load config from client URL: https://github.com/apache/spark/pull/37949

[GitHub] [spark] zhengchenyu commented on pull request #37949: [SPARK-40504][YARN] Make yarn appmaster load config from client

2022-11-14 Thread GitBox
zhengchenyu commented on PR #37949: URL: https://github.com/apache/spark/pull/37949#issuecomment-1314728385 @xkrogen Sorry for missing the configuration 'spark.yarn.populateHadoopClasspath'. Thank you very much! When spark.yarn.populateHadoopClasspath is false, HADOOP_CONF_DIR will be removed in

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1022286561 ## python/pyspark/sql/connect/dataframe.py: ## @@ -667,12 +668,70 @@ def schema(self) -> StructType: else: return self._schema -def

[GitHub] [spark] HyukjinKwon commented on pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1314714912 I actually don't think we can have a test with 2GB because GA is already using a lot of memory in fact. One way would be to add a test, and mark it as `ignore` so other people can

[GitHub] [spark] HyukjinKwon closed pull request #38639: [SPARK-41123][BUILD] Upgrade mysql-connector-java from 8.0.30 to 8.0.31

2022-11-14 Thread GitBox
HyukjinKwon closed pull request #38639: [SPARK-41123][BUILD] Upgrade mysql-connector-java from 8.0.30 to 8.0.31 URL: https://github.com/apache/spark/pull/38639

[GitHub] [spark] HyukjinKwon closed pull request #38636: [SPARK-41120][BUILD] Upgrade joda-time from 2.12.0 to 2.12.1

2022-11-14 Thread GitBox
HyukjinKwon closed pull request #38636: [SPARK-41120][BUILD] Upgrade joda-time from 2.12.0 to 2.12.1 URL: https://github.com/apache/spark/pull/38636

[GitHub] [spark] HyukjinKwon commented on pull request #38639: [SPARK-41123][BUILD] Upgrade mysql-connector-java from 8.0.30 to 8.0.31

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38639: URL: https://github.com/apache/spark/pull/38639#issuecomment-1314711135 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #38636: [SPARK-41120][BUILD] Upgrade joda-time from 2.12.0 to 2.12.1

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38636: URL: https://github.com/apache/spark/pull/38636#issuecomment-1314709991 Merged to master.

[GitHub] [spark] bersprockets commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-14 Thread GitBox
bersprockets commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1022273802 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] amaliujia commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
amaliujia commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022273256 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -70,6 +70,37 @@ def test_filter(self):

[GitHub] [spark] panbingkun commented on a diff in pull request #38531: [SPARK-40755][SQL] Migrate type check failures of number formatting onto error classes

2022-11-14 Thread GitBox
panbingkun commented on code in PR #38531: URL: https://github.com/apache/spark/pull/38531#discussion_r1022273230 ## core/src/main/resources/error/error-classes.json: ## @@ -290,6 +290,46 @@ "Null typed values cannot be used as arguments of ." ] }, +

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022272524 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -70,6 +70,37 @@ def test_filter(self):

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022271292 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-14 Thread GitBox
zhengruifeng commented on code in PR #38638: URL: https://github.com/apache/spark/pull/38638#discussion_r1022266985 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -48,6 +72,9 @@ message Request { // The logical plan to be executed / analyzed. Plan

[GitHub] [spark] LuciferYang commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-14 Thread GitBox
LuciferYang commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1022266640 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] LuciferYang commented on a diff in pull request #38635: [SPARK-41118][SQL] `to_number`/`try_to_number` should return `null` when format is `null`

2022-11-14 Thread GitBox
LuciferYang commented on code in PR #38635: URL: https://github.com/apache/spark/pull/38635#discussion_r1022266469 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -26,6 +26,62 @@ import

[GitHub] [spark] vinodkc opened a new pull request, #38661: [SPARK-41085][SQL] Support Bit manipulation function COUNTSET

2022-11-14 Thread GitBox
vinodkc opened a new pull request, #38661: URL: https://github.com/apache/spark/pull/38661 ### What changes were proposed in this pull request? Support the bit manipulation function COUNTSET. The function will return the number of 1 bits in the specified integer value. If the
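
The bit-counting behaviour described in this PR summary can be sketched in a few lines. This is an illustration only, not the PR's implementation: the name `count_set_bits` and the `width` parameter are hypothetical, and treating negative inputs as fixed-width two's complement is an assumption, since the snippet above is cut off before it says how special cases are handled.

```python
def count_set_bits(value: int, width: int = 32) -> int:
    """Return the number of 1 bits in `value`.

    Assumption: `value` is interpreted as a `width`-bit two's-complement
    integer, so negative inputs are masked to `width` bits before counting.
    """
    mask = (1 << width) - 1          # e.g. 0xFFFFFFFF for width=32
    return bin(value & mask).count("1")


if __name__ == "__main__":
    for sample in (0, 7, 8, -1):
        print(sample, count_set_bits(sample))
```

Masking first keeps the count well defined for Python's unbounded integers; a fixed-width language could count bits on the native type directly.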

[GitHub] [spark] itholic commented on pull request #38647: [SPARK-41133][SQL] Integrate `UNSCALED_VALUE_TOO_LARGE_FOR_PRECISION` into `NUMERIC_VALUE_OUT_OF_RANGE`

2022-11-14 Thread GitBox
itholic commented on PR #38647: URL: https://github.com/apache/spark/pull/38647#issuecomment-1314683996 cc @MaxGekk @srielau I believe this is ready for review, PTAL when you find some time -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] LuciferYang commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-14 Thread GitBox
LuciferYang commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1022256460 ## project/SparkBuild.scala: ## @@ -109,6 +109,16 @@ object SparkBuild extends PomBuild { if (profiles.contains("jdwp-test-debug")) {

[GitHub] [spark] Yikun commented on a diff in pull request #38611: [SPARK-41107][PYTHON][INFRA][TEST] Install memory-profiler in the CI

2022-11-14 Thread GitBox
Yikun commented on code in PR #38611: URL: https://github.com/apache/spark/pull/38611#discussion_r1022254122 ## dev/infra/Dockerfile: ## @@ -32,7 +32,7 @@ RUN $APT_INSTALL software-properties-common git libxml2-dev pkg-config curl wget RUN update-alternatives --set java

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-14 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1022251949 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
amaliujia commented on code in PR #38653: URL: https://github.com/apache/spark/pull/38653#discussion_r1022249117 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -316,6 +319,36 @@ message StatCrosstab { string col2 = 3; } +// Replaces null

[GitHub] [spark] dengziming commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-14 Thread GitBox
dengziming commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1314664845 Thank you all for your reviews @zhengruifeng @amaliujia @grundprinzip, there may be some delay since I need some time to get familiar with Arrow. -- This is an automated message

[GitHub] [spark] zhengruifeng commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-14 Thread GitBox
zhengruifeng commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1314662296 @dengziming thanks for the contributions! I think we'd better apply Arrow batch instead of structs in this proto message. you may refer to

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-14 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1021870148 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala: ## @@ -190,20 +190,25 @@ class StreamingDeduplicationSuite extends

[GitHub] [spark] yaooqinn commented on pull request #37355: [SPARK-39930][SQL] Introduce Cache Hints

2022-11-14 Thread GitBox
yaooqinn commented on PR #37355: URL: https://github.com/apache/spark/pull/37355#issuecomment-1314645862 For a multi-tenant scenario like Thrift Server, if we explicitly cache/uncache some plans or relations, it will affect others' requests. -- This is an automated message from the Apache

[GitHub] [spark] liuzqt commented on pull request #38064: [SPARK-40622][SQL][CORE]Remove the limitation that single task result must fit in 2GB

2022-11-14 Thread GitBox
liuzqt commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1314642626 Increasing JVM memory to 6G causes the TPCDS test to get killed: https://github.com/liuzqt/spark/actions/runs/3464533302/jobs/5786183673 Let me try 5GB... -- This is an automated

[GitHub] [spark] zhengruifeng commented on pull request #38654: [SPARK-41005][CONNECT][DOC][FOLLOW-UP] Document the reason of sending batch in main thread

2022-11-14 Thread GitBox
zhengruifeng commented on PR #38654: URL: https://github.com/apache/spark/pull/38654#issuecomment-1314634891 Thank you for the reviews. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on pull request #38653: [SPARK-41128][CONNECT][PYTHON] Implement `DataFrame.fillna ` and `DataFrame.na.fill `

2022-11-14 Thread GitBox
zhengruifeng commented on PR #38653: URL: https://github.com/apache/spark/pull/38653#issuecomment-1314630040 cc @HyukjinKwon @cloud-fan @amaliujia @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38611: [SPARK-41107][PYTHON][INFRA][TEST] Install memory-profiler in the CI

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38611: URL: https://github.com/apache/spark/pull/38611#discussion_r102629 ## dev/infra/Dockerfile: ## @@ -32,7 +32,7 @@ RUN $APT_INSTALL software-properties-common git libxml2-dev pkg-config curl wget RUN update-alternatives --set java

[GitHub] [spark] HyukjinKwon closed pull request #38654: [SPARK-41005][CONNECT][DOC][FOLLOW-UP] Document the reason of sending batch in main thread

2022-11-14 Thread GitBox
HyukjinKwon closed pull request #38654: [SPARK-41005][CONNECT][DOC][FOLLOW-UP] Document the reason of sending batch in main thread URL: https://github.com/apache/spark/pull/38654 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HyukjinKwon commented on pull request #38654: [SPARK-41005][CONNECT][DOC][FOLLOW-UP] Document the reason of sending batch in main thread

2022-11-14 Thread GitBox
HyukjinKwon commented on PR #38654: URL: https://github.com/apache/spark/pull/38654#issuecomment-1314619055 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia commented on pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc-gen-grpc-ja

2022-11-14 Thread GitBox
amaliujia commented on PR #38609: URL: https://github.com/apache/spark/pull/38609#issuecomment-1314615345 LGTM, assuming this is tested manually (it seems hard to have a UT). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38609: [SPARK-40593][BUILD][CONNECT] Make user can build and test `connect` module by specifying the user-defined `protoc` and `protoc

2022-11-14 Thread GitBox
HyukjinKwon commented on code in PR #38609: URL: https://github.com/apache/spark/pull/38609#discussion_r1022213304 ## connector/connect/README.md: ## @@ -24,7 +24,30 @@ or ```bash ./build/sbt -Phive clean package ``` - + +### Build with user-defined `protoc` and

[GitHub] [spark] AmplabJenkins commented on pull request #38638: [SPARK-41122][CONNECT] Explain API can support different modes

2022-11-14 Thread GitBox
AmplabJenkins commented on PR #38638: URL: https://github.com/apache/spark/pull/38638#issuecomment-1314598771 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38639: [SPARK-41123][BUILD] Upgrade mysql-connector-java from 8.0.30 to 8.0.31

2022-11-14 Thread GitBox
AmplabJenkins commented on PR #38639: URL: https://github.com/apache/spark/pull/38639#issuecomment-1314598737 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to
