[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39098: [SPARK-41553][PS][PYTHON][CORE] Change `num_files` to `repartition`

2022-12-27 Thread GitBox
HyukjinKwon commented on code in PR #39098: URL: https://github.com/apache/spark/pull/39098#discussion_r1058130837 ## python/pyspark/pandas/generic.py: ## @@ -748,7 +748,7 @@ def to_csv( 2012-02-29 12:00:00,US,2 2012-03-31 12:00:00,JP,3 ->>>

[GitHub] [spark] itholic opened a new pull request, #39260: [SPARK-41579][SQL] Assign name to _LEGACY_ERROR_TEMP_1249

2022-12-27 Thread GitBox
itholic opened a new pull request, #39260: URL: https://github.com/apache/spark/pull/39260 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_1249, "NOT_A_PARTITIONED_TABLE". ### Why are the changes needed?

[GitHub] [spark] Ngone51 commented on pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-27 Thread GitBox
Ngone51 commented on PR #39011: URL: https://github.com/apache/spark/pull/39011#issuecomment-1366436959 Thanks @mridulm @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] beliefer commented on pull request #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType`

2022-12-27 Thread GitBox
beliefer commented on PR #39251: URL: https://github.com/apache/spark/pull/39251#issuecomment-1366435201 @HyukjinKwon @zhengruifeng @grundprinzip Thank you!

[GitHub] [spark] AngersZhuuuu commented on pull request #39259: [SPARK-41739][SQL] CheckRule should not be executed when analyze view child

2022-12-27 Thread GitBox
AngersZhuuuu commented on PR #39259: URL: https://github.com/apache/spark/pull/39259#issuecomment-1366432024 ping @cloud-fan @HyukjinKwon

[GitHub] [spark] cloud-fan commented on pull request #39097: [SPARK-41049][SQL] Make to_csv function deterministic

2022-12-27 Thread GitBox
cloud-fan commented on PR #39097: URL: https://github.com/apache/spark/pull/39097#issuecomment-1366430660 I'm fixing the root cause at https://github.com/apache/spark/pull/39248

[GitHub] [spark] cloud-fan commented on pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-27 Thread GitBox
cloud-fan commented on PR #39248: URL: https://github.com/apache/spark/pull/39248#issuecomment-1366430564 cc @viirya @HyukjinKwon @gengliangwang @allisonwang-db

[GitHub] [spark] AngersZhuuuu opened a new pull request, #39259: [SPARK-41739][SQL] CheckRule should not be executed when analyze view child

2022-12-27 Thread GitBox
AngersZhuuuu opened a new pull request, #39259: URL: https://github.com/apache/spark/pull/39259 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058115431 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala: ## @@ -117,10 +111,6 @@ object

[GitHub] [spark] LuciferYang commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
LuciferYang commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366426696 for example https://github.com/apache/spark/blob/87a235c2143449bd8da0acee4ec3cd3155bb/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java#L168

[GitHub] [spark] cloud-fan commented on a diff in pull request #39248: [SPARK-41049][SQL] Revisit stateful expression handling

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39248: URL: https://github.com/apache/spark/pull/39248#discussion_r1058114484 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala: ## @@ -36,18 +36,12 @@ class

[GitHub] [spark] itholic opened a new pull request, #39258: [SPARK-41572][SQL] Assign name to _LEGACY_ERROR_TEMP_2149

2022-12-27 Thread GitBox
itholic opened a new pull request, #39258: URL: https://github.com/apache/spark/pull/39258 ### What changes were proposed in this pull request? This PR proposes to assign name to _LEGACY_ERROR_TEMP_2149, "MALFORMED_CSV_RECORD". ### Why are the changes needed?

[GitHub] [spark] HeartSaVioR closed pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR closed pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime URL: https://github.com/apache/spark/pull/39247

[GitHub] [spark] HeartSaVioR commented on pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39247: URL: https://github.com/apache/spark/pull/39247#issuecomment-1366416569 Thanks! Merging to master.

[GitHub] [spark] HeartSaVioR commented on pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39247: URL: https://github.com/apache/spark/pull/39247#issuecomment-1366416497 https://github.com/HeartSaVioR/spark/runs/10328690654 Looks like GA fails to pull the test result. Build succeeded.

[GitHub] [spark] HyukjinKwon closed pull request #39257: [MINOR][CONNECT] Regenerate Protobuf for Python

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39257: [MINOR][CONNECT] Regenerate Protobuf for Python URL: https://github.com/apache/spark/pull/39257

[GitHub] [spark] HyukjinKwon commented on pull request #39257: [MINOR][CONNECT] Regenerate Protobuf for Python

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39257: URL: https://github.com/apache/spark/pull/39257#issuecomment-1366414887 Sorry, it's my bad. Merged to master to fix up the build.

[GitHub] [spark] HyukjinKwon opened a new pull request, #39257: [MINOR][CONNECT] Regenerate Protobuf for Python

2022-12-27 Thread GitBox
HyukjinKwon opened a new pull request, #39257: URL: https://github.com/apache/spark/pull/39257 ### What changes were proposed in this pull request? There is unsynced Python side of Protobuf. This PR regenerates ### Why are the changes needed? To fix the build.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39246: URL: https://github.com/apache/spark/pull/39246#discussion_r1058097335 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -328,6 +329,16 @@ class

[GitHub] [spark] ulysses-you commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
ulysses-you commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058097111 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -143,29 +141,11 @@ case class

[GitHub] [spark] zhengruifeng commented on pull request #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType`

2022-12-27 Thread GitBox
zhengruifeng commented on PR #39251: URL: https://github.com/apache/spark/pull/39251#issuecomment-1366411747 merged into master, thank you @beliefer !

[GitHub] [spark] zhengruifeng closed pull request #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType`

2022-12-27 Thread GitBox
zhengruifeng closed pull request #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType` URL: https://github.com/apache/spark/pull/39251

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058094822 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] HyukjinKwon closed pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog URL: https://github.com/apache/spark/pull/39252

[GitHub] [spark] HyukjinKwon commented on pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39252: URL: https://github.com/apache/spark/pull/39252#issuecomment-1366407948 Merged to master.

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058092052 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] LuciferYang commented on pull request #39235: [SPARK-41729][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0011` to `UNSUPPORTED_FEATURE.COMBINATION_QUERY_RESULT_CLAUSES`

2022-12-27 Thread GitBox
LuciferYang commented on PR #39235: URL: https://github.com/apache/spark/pull/39235#issuecomment-1366407361 Thanks @MaxGekk

[GitHub] [spark] LuciferYang commented on a diff in pull request #39255: [DON'T MERGE][BUILD] Switch default protobuf-java version to 3.x

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39255: URL: https://github.com/apache/spark/pull/39255#discussion_r1058091610 ## pom.xml: ## @@ -827,8 +827,7 @@ com.google.protobuf protobuf-java -${protobuf.hadoopDependency.version} -

[GitHub] [spark] MaxGekk closed pull request #39235: [SPARK-41729][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0011` to `UNSUPPORTED_FEATURE.COMBINATION_QUERY_RESULT_CLAUSES`

2022-12-27 Thread GitBox
MaxGekk closed pull request #39235: [SPARK-41729][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0011` to `UNSUPPORTED_FEATURE.COMBINATION_QUERY_RESULT_CLAUSES` URL: https://github.com/apache/spark/pull/39235

[GitHub] [spark] MaxGekk commented on pull request #39235: [SPARK-41729][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0011` to `UNSUPPORTED_FEATURE.COMBINATION_QUERY_RESULT_CLAUSES`

2022-12-27 Thread GitBox
MaxGekk commented on PR #39235: URL: https://github.com/apache/spark/pull/39235#issuecomment-1366406803 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058089769 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058089550 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058089392 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] LuciferYang commented on a diff in pull request #39255: [DON'T MERGE][BUILD] Switch default protobuf-java version to 3.x

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39255: URL: https://github.com/apache/spark/pull/39255#discussion_r1058080832 ## pom.xml: ## @@ -827,8 +827,7 @@ com.google.protobuf protobuf-java -${protobuf.hadoopDependency.version} -

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058086587 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -143,29 +141,11 @@ case class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058086316 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058085725 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1061,6 +1068,81 @@ class

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058085692 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1061,6 +1068,81 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058084926 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1061,6 +1068,81 @@ class

[GitHub] [spark] grundprinzip commented on pull request #39256: [SPARK-41738][CONNECT] Mix ClientId in SparkSession cache

2022-12-27 Thread GitBox
grundprinzip commented on PR #39256: URL: https://github.com/apache/spark/pull/39256#issuecomment-1366397023 R: @HyukjinKwon @zhengruifeng

[GitHub] [spark] grundprinzip opened a new pull request, #39256: [SPARK-41738][CONNECT] Mix ClientId in SparkSession cache

2022-12-27 Thread GitBox
grundprinzip opened a new pull request, #39256: URL: https://github.com/apache/spark/pull/39256 ### What changes were proposed in this pull request? This PR mixes the client ID into the cache for the SparkSessions on the server. This is necessary to allow to concurrent SparkSessions of

[GitHub] [spark] ulysses-you commented on pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
ulysses-you commented on PR #39220: URL: https://github.com/apache/spark/pull/39220#issuecomment-1366395718 @cloud-fan addressed all comments

[GitHub] [spark] LuciferYang opened a new pull request, #39255: [DON'T MERGE][BUILD] Switch default protobuf-java version to 3.x

2022-12-27 Thread GitBox
LuciferYang opened a new pull request, #39255: URL: https://github.com/apache/spark/pull/39255 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39099: URL: https://github.com/apache/spark/pull/39099#discussion_r1058077238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -374,7 +374,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] LuciferYang commented on a diff in pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39250: URL: https://github.com/apache/spark/pull/39250#discussion_r1058073382 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaBeanDeserializationSuite.java: ## @@ -590,9 +590,9 @@ public Item call(Item item1, Item item2) throws

[GitHub] [spark] srowen commented on pull request #39215: [WIP][SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from protobuf objects fo

2022-12-27 Thread GitBox
srowen commented on PR #39215: URL: https://github.com/apache/spark/pull/39215#issuecomment-1366386151 Seems OK to me, un-mark it as Draft to let it test

[GitHub] [spark] LuciferYang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1058072078 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,38 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] techaddict commented on pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
techaddict commented on PR #39110: URL: https://github.com/apache/spark/pull/39110#issuecomment-1366384358 @gengliangwang addressed all the comments

[GitHub] [spark] grundprinzip commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1058070997 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -907,6 +907,38 @@ def test_random_split(self):

[GitHub] [spark] grundprinzip commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1058070139 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -181,6 +184,18 @@ message ExecutePlanResponse { string metric_type = 3;

[GitHub] [spark] grundprinzip commented on a diff in pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39246: URL: https://github.com/apache/spark/pull/39246#discussion_r1058068989 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -328,6 +329,16 @@ class

[GitHub] [spark] grundprinzip commented on a diff in pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39252: URL: https://github.com/apache/spark/pull/39252#discussion_r1058067646 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -70,35 +70,7 @@ message Relation { StatDescribe describe = 102; //

[GitHub] [spark] grundprinzip commented on a diff in pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39252: URL: https://github.com/apache/spark/pull/39252#discussion_r1058067556 ## connector/connect/common/src/main/protobuf/spark/connect/catalog.proto: ## @@ -24,6 +24,41 @@ import "spark/connect/types.proto"; option java_multiple_files =

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058067328 ## python/pyspark/sql/connect/group.py: ## @@ -97,36 +97,46 @@ def agg(self, *exprs: Union[Column, Dict[str, str]]) -> "DataFrame": ),

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058066260 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1061,6 +1068,81 @@ class

[GitHub] [spark] grundprinzip commented on a diff in pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39254: URL: https://github.com/apache/spark/pull/39254#discussion_r1058066012 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1061,6 +1068,81 @@ class

[GitHub] [spark] LuciferYang commented on a diff in pull request #39215: [WIP][SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from prot

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39215: URL: https://github.com/apache/spark/pull/39215#discussion_r1058061532 ## project/MimaExcludes.scala: ## @@ -129,7 +129,16 @@ object MimaExcludes {

[GitHub] [spark] zhengruifeng commented on pull request #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng commented on PR #39254: URL: https://github.com/apache/spark/pull/39254#issuecomment-1366372126 cc @HyukjinKwon @grundprinzip

[GitHub] [spark] zhengruifeng opened a new pull request, #39254: [SPARK-41333][SPARK-41737] Implement `GroupedData.{min, max, avg, sum}`

2022-12-27 Thread GitBox
zhengruifeng opened a new pull request, #39254: URL: https://github.com/apache/spark/pull/39254 ### What changes were proposed in this pull request? Implement `GroupedData.{min, max, avg, sum}` ### Why are the changes needed? TLDR, `df.groupby().min` != `df.groupby().agg(min)`

[GitHub] [spark] LuciferYang commented on a diff in pull request #39215: [WIP][SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from prot

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39215: URL: https://github.com/apache/spark/pull/39215#discussion_r1058059669 ## project/MimaExcludes.scala: ## @@ -129,7 +129,27 @@ object MimaExcludes {

[GitHub] [spark] beliefer commented on pull request #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType`

2022-12-27 Thread GitBox
beliefer commented on PR #39251: URL: https://github.com/apache/spark/pull/39251#issuecomment-1366364200 ping @zhengruifeng @grundprinzip @amaliujia

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058051128 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -143,29 +141,10 @@ case class

[GitHub] [spark] packyan commented on pull request #39021: [SPARK-41483][CORE] Last metrics system report should have a timeout, avoid to lead shutdown hook timeout

2022-12-27 Thread GitBox
packyan commented on PR #39021: URL: https://github.com/apache/spark/pull/39021#issuecomment-1366356581 Can anyone give me some suggestions?

[GitHub] [spark] HeartSaVioR commented on pull request #39253: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39253: URL: https://github.com/apache/spark/pull/39253#issuecomment-1366355228 The change is identical to #39245 except import (one line diff.). I'll go merging this once CI passes.

[GitHub] [spark] HeartSaVioR opened a new pull request, #39253: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR opened a new pull request, #39253: URL: https://github.com/apache/spark/pull/39253 This PR ports back #39245 to branch-3.3. ### What changes were proposed in this pull request? This PR proposes to apply tree-pattern based pruning for the rule SessionWindowing, to

[GitHub] [spark] HeartSaVioR commented on pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39247: URL: https://github.com/apache/spark/pull/39247#issuecomment-1366352458 The change is only to rebase the fix due to the conflict from #39245. I'm going to merge once the CI passes.

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058042355 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala: ## @@ -143,29 +141,10 @@ case class

[GitHub] [spark] HeartSaVioR commented on pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39245: URL: https://github.com/apache/spark/pull/39245#issuecomment-1366349270 It conflicts with branch-3.3 (probably 3.2 as well). I'll create a new PR for backport.

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058042058 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala: ## @@ -102,21 +102,15 @@ class HiveExplainSuite extends QueryTest with

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058041588 ## sql/core/src/test/scala/org/apache/spark/sql/util/DataFrameCallbackSuite.scala: ## @@ -217,10 +217,10 @@ class DataFrameCallbackSuite extends QueryTest

[GitHub] [spark] HeartSaVioR closed pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR closed pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing URL: https://github.com/apache/spark/pull/39245

[GitHub] [spark] HeartSaVioR commented on pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39245: URL: https://github.com/apache/spark/pull/39245#issuecomment-1366348248 Thanks, merging to master/3.3/3.2!

[GitHub] [spark] cloud-fan commented on a diff in pull request #39220: [SPARK-41713][SQL] Make CTAS hold a nested execution for data writing

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39220: URL: https://github.com/apache/spark/pull/39220#discussion_r1058041258 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -1217,7 +1230,7 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] cloud-fan commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1058037999 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -379,9 +379,10 @@ message Sample { // (Optional) The random seed. optional

[GitHub] [spark] LuciferYang commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058036782 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] HyukjinKwon commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1366342520 Let me take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on pull request #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39252: URL: https://github.com/apache/spark/pull/39252#issuecomment-1366341521 cc @zhengruifeng and @grundprinzip FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon opened a new pull request, #39252: [SPARK-41734][CONNECT] Add a parent message for Catalog

2022-12-27 Thread GitBox
HyukjinKwon opened a new pull request, #39252: URL: https://github.com/apache/spark/pull/39252 ### What changes were proposed in this pull request? This PR proposes to add a parent Protobuf message for Catalog (see https://github.com/apache/spark/pull/39214#discussion_r1057439608).

[GitHub] [spark] LuciferYang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1058034241 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #39133: [SPARK-41595][SQL] Support generator function explode/explode_outer in the FROM clause

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39133: URL: https://github.com/apache/spark/pull/39133#discussion_r1058032279 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala: ## @@ -136,6 +136,30 @@ object UnresolvedTableValuedFunction { } } +/**

[GitHub] [spark] LuciferYang commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1058030818 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -733,6 +734,15 @@ private[spark] class AppStatusStore( def close(): Unit = {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39202: URL: https://github.com/apache/spark/pull/39202#discussion_r1058030669 ## core/src/main/scala/org/apache/spark/internal/config/History.scala: ## @@ -79,6 +79,21 @@ private[spark] object History { .stringConf .createOptional

[GitHub] [spark] beliefer opened a new pull request, #39251: [SPARK-41736][CONNECT][PYTHON] `pyspark_types_to_proto_types` should supports `ArrayType`

2022-12-27 Thread GitBox
beliefer opened a new pull request, #39251: URL: https://github.com/apache/spark/pull/39251 ### What changes were proposed in this pull request? Currently, `pyspark_types_to_proto_types` is used to transform PySpark data types to Protobuf data types, but it does not support the array type

[GitHub] [spark] LuciferYang commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1058024767 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -733,6 +734,15 @@ private[spark] class AppStatusStore( def close(): Unit = {

[GitHub] [spark] thejdeep commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-27 Thread GitBox
thejdeep commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1058019046 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -100,11 +100,21 @@ message TaskDataWrapper { int64

[GitHub] [spark] LuciferYang commented on pull request #39235: [SPARK-41729][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0011` to `UNSUPPORTED_FEATURE.COMBINATION_QUERY_RESULT_CLAUSES`

2022-12-27 Thread GitBox
LuciferYang commented on PR #39235: URL: https://github.com/apache/spark/pull/39235#issuecomment-1366320342 [9d74522](https://github.com/apache/spark/pull/39235/commits/9d7452237e3febaf3ba6e8384db30aebb325b34b) remove `_LEGACY_ERROR_TEMP_0011` from `error-classes.json`. -- This is an

[GitHub] [spark] panbingkun commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
panbingkun commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058009462 ## core/src/main/scala/org/apache/spark/status/protobuf/StageDataWrapperSerializer.scala: ## @@ -0,0 +1,622 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] tedyu commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
tedyu commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366305206 @srowen I have covered `JavaBeanDeserializationSuite` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon closed pull request #39243: [SPARK-41697][CONNECT][TESTS][FOLLOW-UP] Disable test_toDF_with_schema_string back, and fix test_freqItems

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39243: [SPARK-41697][CONNECT][TESTS][FOLLOW-UP] Disable test_toDF_with_schema_string back, and fix test_freqItems URL: https://github.com/apache/spark/pull/39243 -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] HyukjinKwon commented on pull request #39243: [SPARK-41697][CONNECT][TESTS][FOLLOW-UP] Disable test_toDF_with_schema_string back, and fix test_freqItems

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39243: URL: https://github.com/apache/spark/pull/39243#issuecomment-1366301882 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
panbingkun commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058004492 ## core/src/main/scala/org/apache/spark/status/protobuf/StageDataWrapperSerializer.scala: ## @@ -0,0 +1,622 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] HyukjinKwon closed pull request #39225: [SPARK-41654][CONNECT][TESTS] Enable doctests for pyspark.sql.connect.window

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39225: [SPARK-41654][CONNECT][TESTS] Enable doctests for pyspark.sql.connect.window URL: https://github.com/apache/spark/pull/39225 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #39225: [SPARK-41654][CONNECT][TESTS] Enable doctests for pyspark.sql.connect.window

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39225: URL: https://github.com/apache/spark/pull/39225#issuecomment-1366301196 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] panbingkun commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
panbingkun commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058002785 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;
