[GitHub] [spark] panbingkun commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
panbingkun commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058002785 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] srowen commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
srowen commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366299627 OK yeah I ran IJ inspections too. That's the only collection. There is one other related one I think we could fix, in JavaBeanDeserializationSuite : ``` List> expectedRecords

[GitHub] [spark] panbingkun commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
panbingkun commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1058002256 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] HeartSaVioR commented on pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39245: URL: https://github.com/apache/spark/pull/39245#issuecomment-1366297616 cc. @viirya as well

[GitHub] [spark] HeartSaVioR commented on pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39247: URL: https://github.com/apache/spark/pull/39247#issuecomment-1366297569 cc. @viirya as well

[GitHub] [spark] beliefer commented on pull request #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

2022-12-27 Thread GitBox
beliefer commented on PR #39246: URL: https://github.com/apache/spark/pull/39246#issuecomment-1366297511 ping @HyukjinKwon @zhengruifeng @grundprinzip @amaliujia

[GitHub] [spark] gengliangwang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1058000402 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,38 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] tedyu commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
tedyu commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366289684 I searched for `LinkedList`, `HashSet`, `ArrayList` and `HashMap` among the java files. This is the only one I found.

[GitHub] [spark] srowen commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
srowen commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366289099 Are there more instances? please fix all at once, if there are more

[GitHub] [spark] tedyu commented on pull request #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
tedyu commented on PR #39250: URL: https://github.com/apache/spark/pull/39250#issuecomment-1366283040 cc @srowen

[GitHub] [spark] tedyu opened a new pull request, #39250: [SQL][MINOR] Use Diamond operator for constructing HashMap

2022-12-27 Thread GitBox
tedyu opened a new pull request, #39250: URL: https://github.com/apache/spark/pull/39250 ### What changes were proposed in this pull request? This PR uses Diamond operator for constructing HashMap for type inference. ### Why are the changes needed? The change follows Java

[GitHub] [spark] Tagar commented on pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement

2022-12-27 Thread GitBox
Tagar commented on PR #39115: URL: https://github.com/apache/spark/pull/39115#issuecomment-1366276253 I had a customer that runs MSCK on really huge tables, and it takes them hours to complete that operation. So this looks the same as the 2nd bullet point in @wecharyu's "why are the

[GitHub] [spark] github-actions[bot] closed pull request #37910: [SPARK-40469][CORE] Avoid creating directory failures

2022-12-27 Thread GitBox
github-actions[bot] closed pull request #37910: [SPARK-40469][CORE] Avoid creating directory failures URL: https://github.com/apache/spark/pull/37910

[GitHub] [spark] github-actions[bot] closed pull request #37625: [SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b

2022-12-27 Thread GitBox
github-actions[bot] closed pull request #37625: [SPARK-40177][SQL] Simplify condition of form (a==b) || (a==null&&b==null) to a<=>b URL: https://github.com/apache/spark/pull/37625

[GitHub] [spark] techaddict opened a new pull request, #39249: [WIP] [SPARK-41655][CONNECT] Enable doctests in pyspark.sql.connect.column

2022-12-27 Thread GitBox
techaddict opened a new pull request, #39249: URL: https://github.com/apache/spark/pull/39249 ### What changes were proposed in this pull request? This PR proposes to enable doctests in pyspark.sql.connect.column that is virtually the same as pyspark.sql.column. ### Why are the
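As a rough sketch of what "enabling doctests" means here (the wiring below is illustrative and not the PR's actual diff): the `>>>` examples embedded in the module's docstrings are executed as tests, the same way the classic `pyspark.sql.column` module runs its own.
```python
import doctest
import pyspark.sql.connect.column

# Run every `>>>` example found in the module's docstrings and report failures.
# (PySpark's own test harness additionally injects a SparkSession into the doctest globals.)
results = doctest.testmod(pyspark.sql.connect.column, verbose=False)
print(f"attempted={results.attempted}, failed={results.failed}")
```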

[GitHub] [spark] techaddict commented on a diff in pull request #39225: [WIP][SPARK-41654][CONNECT][TESTS] doctests for pyspark.sql.connect.window

2022-12-27 Thread GitBox
techaddict commented on code in PR #39225: URL: https://github.com/apache/spark/pull/39225#discussion_r1057958961 ## python/pyspark/sql/connect/window.py: ## @@ -242,3 +243,46 @@ def rangeBetween(start: int, end: int) -> "WindowSpec": Window.__doc__ = PySparkWindow.__doc__

[GitHub] [spark] gengliangwang commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1057955531 ## core/src/main/scala/org/apache/spark/status/protobuf/StageDataWrapperSerializer.scala: ## @@ -0,0 +1,622 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] gengliangwang commented on pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
gengliangwang commented on PR #39192: URL: https://github.com/apache/spark/pull/39192#issuecomment-1366229183 This is a big one. @panbingkun Thanks for working on it!

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39225: [WIP][SPARK-41654][CONNECT][TESTS] doctests for pyspark.sql.connect.window

2022-12-27 Thread GitBox
HyukjinKwon commented on code in PR #39225: URL: https://github.com/apache/spark/pull/39225#discussion_r1057953902 ## python/pyspark/sql/connect/window.py: ## @@ -242,3 +243,46 @@ def rangeBetween(start: int, end: int) -> "WindowSpec": Window.__doc__ =

[GitHub] [spark] gengliangwang commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1057953273 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages =

[GitHub] [spark] techaddict commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
techaddict commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057953155 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,38 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] gengliangwang commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1057948247 ## docs/configuration.md: ## @@ -1388,6 +1388,14 @@ Apart from these, the following properties are also available, and may be useful 3.4.0 + +

[GitHub] [spark] gengliangwang commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1057947130 ## core/src/main/scala/org/apache/spark/internal/config/Status.scala: ## @@ -77,4 +77,12 @@ private[spark] object Status { .version("3.4.0")

[GitHub] [spark] gengliangwang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057942669 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057939206 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057938753 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,38 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] gengliangwang commented on pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-27 Thread GitBox
gengliangwang commented on PR #39202: URL: https://github.com/apache/spark/pull/39202#issuecomment-1366200977 I think SHS can read the written RocksDB after this PR, if the file path/DB backend/serializer configurations are set properly.

[GitHub] [spark] gengliangwang commented on a diff in pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39202: URL: https://github.com/apache/spark/pull/39202#discussion_r1057932574 ## core/src/main/scala/org/apache/spark/internal/config/History.scala: ## @@ -79,6 +79,21 @@ private[spark] object History { .stringConf

[GitHub] [spark] gengliangwang commented on a diff in pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39202: URL: https://github.com/apache/spark/pull/39202#discussion_r1057932164 ## core/src/main/scala/org/apache/spark/status/KVUtils.scala: ## @@ -111,7 +122,7 @@ private[spark] object KVUtils extends Logging { // The default

[GitHub] [spark] gengliangwang commented on a diff in pull request #39202: [SPARK-41685][UI] Support Protobuf serializer for the KVStore in History server

2022-12-27 Thread GitBox
gengliangwang commented on code in PR #39202: URL: https://github.com/apache/spark/pull/39202#discussion_r1057931838 ## docs/monitoring.md: ## @@ -341,6 +341,16 @@ Security options for the Spark History Server are covered more detail in the 2.3.0 + +

[GitHub] [spark] grundprinzip commented on a diff in pull request #39212: [SPARK-41533][CONNECT] Proper Error Handling for Spark Connect Server / Client

2022-12-27 Thread GitBox
grundprinzip commented on code in PR #39212: URL: https://github.com/apache/spark/pull/39212#discussion_r1057696164 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -49,6 +53,67 @@ class SparkConnectService(debug:

[GitHub] [spark] mridulm commented on pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-27 Thread GitBox
mridulm commented on PR #39011: URL: https://github.com/apache/spark/pull/39011#issuecomment-1366193206 Merged to master. Thanks for working on this @Ngone51 ! Thanks for review @dongjoon-hyun :-)

[GitHub] [spark] mridulm closed pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2022-12-27 Thread GitBox
mridulm closed pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated URL: https://github.com/apache/spark/pull/39011

[GitHub] [spark] mridulm commented on pull request #39200: [CORE][MINOR] Correct spelling for RPC in log

2022-12-27 Thread GitBox
mridulm commented on PR #39200: URL: https://github.com/apache/spark/pull/39200#issuecomment-1366192182 Thanks for fixing this @tedyu, and thanks for merging during holidays @srowen :-)

[GitHub] [spark] mridulm commented on pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-12-27 Thread GitBox
mridulm commented on PR #37922: URL: https://github.com/apache/spark/pull/37922#issuecomment-1366191335 +CC @otterc, can you take a look at this PR ?

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2022-12-27 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1057911122 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -396,6 +403,67 @@ public void applicationRemoved(String

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-12-27 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1057888476 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -100,11 +100,21 @@ message TaskDataWrapper { int64

[GitHub] [spark] AmplabJenkins commented on pull request #39212: [SPARK-41533][CONNECT] Proper Error Handling for Spark Connect Server / Client

2022-12-27 Thread GitBox
AmplabJenkins commented on PR #39212: URL: https://github.com/apache/spark/pull/39212#issuecomment-1366150205 Can one of the admins verify this patch?

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-27 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1057887707 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1904,4 +1951,52 @@ long getPos() { return pos;

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-27 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1057886779 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java: ## @@ -235,6 +251,7 @@ public void

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-27 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1057885497 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java: ## @@ -257,6 +274,7 @@ public void

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-12-27 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r1057878241 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1220,6 +1260,7 @@ public void onData(String streamId,

[GitHub] [spark] techaddict commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
techaddict commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057867101 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] techaddict commented on a diff in pull request #39110: [SPARK-41429][UI] Protobuf serializer for RDDOperationGraphWrapper

2022-12-27 Thread GitBox
techaddict commented on code in PR #39110: URL: https://github.com/apache/spark/pull/39110#discussion_r1057866682 ## core/src/main/scala/org/apache/spark/status/protobuf/RDDOperationGraphWrapperSerializer.scala: ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] srowen commented on pull request #39218: [SPARK-41714][BUILD] Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread GitBox
srowen commented on PR #39218: URL: https://github.com/apache/spark/pull/39218#issuecomment-1366105135 I checked manually and it's fine. Merged to master

[GitHub] [spark] srowen closed pull request #39218: [SPARK-41714][BUILD] Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread GitBox
srowen closed pull request #39218: [SPARK-41714][BUILD] Update maven-checkstyle-plugin from 3.1.2 to 3.2.0 URL: https://github.com/apache/spark/pull/39218

[GitHub] [spark] mridulm commented on pull request #39190: [SPARK-41683][CORE] Fix issue of getting incorrect property numActiveStages in jobs API

2022-12-27 Thread GitBox
mridulm commented on PR #39190: URL: https://github.com/apache/spark/pull/39190#issuecomment-1366095161 +CC @thejdeep

[GitHub] [spark] mridulm commented on a diff in pull request #39226: [SPARK-41694][CORE] Add new config to clean up `spark.ui.store.path` directory when `SparkContext.stop()`

2022-12-27 Thread GitBox
mridulm commented on code in PR #39226: URL: https://github.com/apache/spark/pull/39226#discussion_r1057840697 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -733,6 +734,15 @@ private[spark] class AppStatusStore( def close(): Unit = {

[GitHub] [spark] fe2s commented on a diff in pull request #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-27 Thread GitBox
fe2s commented on code in PR #39099: URL: https://github.com/apache/spark/pull/39099#discussion_r1057810986 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -374,7 +374,7 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] fe2s commented on a diff in pull request #39099: [SPARK-41554] fix changing of Decimal scale when scale decreased by m…

2022-12-27 Thread GitBox
fe2s commented on code in PR #39099: URL: https://github.com/apache/spark/pull/39099#discussion_r1057810842 ## sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala: ## @@ -384,4 +387,51 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester

[GitHub] [spark] allisonwang-db commented on a diff in pull request #39133: [SPARK-41595][SQL] Support generator function explode/explode_outer in the FROM clause

2022-12-27 Thread GitBox
allisonwang-db commented on code in PR #39133: URL: https://github.com/apache/spark/pull/39133#discussion_r1057781800 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala: ## @@ -134,6 +134,7 @@ object TreePattern extends Enumeration { val

[GitHub] [spark] MaxGekk commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2022-12-27 Thread GitBox
MaxGekk commented on PR #39239: URL: https://github.com/apache/spark/pull/39239#issuecomment-1366021084 @HyukjinKwon @itholic All 4 failed tests are related to Pandas, and it seems Pandas code suffers from the issue too but I cannot find where the conversion happens. Could you point me out
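To illustrate the underlying pitfall this PR addresses (a sketch of general Python behaviour, not the PR's code): converting an epoch value without an explicit timezone produces a naive datetime in the local timezone, while passing `timezone.utc` pins the wall-clock fields to UTC.
```python
from datetime import datetime, timezone

epoch_seconds = 0
local_naive = datetime.fromtimestamp(epoch_seconds)                 # local time, tzinfo=None
utc_aware = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)  # 1970-01-01 00:00:00+00:00

print(local_naive)
print(utc_aware)
```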

[GitHub] [spark] AmplabJenkins commented on pull request #39218: [SPARK-41714][BUILD] Update maven-checkstyle-plugin from 3.1.2 to 3.2.0

2022-12-27 Thread GitBox
AmplabJenkins commented on PR #39218: URL: https://github.com/apache/spark/pull/39218#issuecomment-1366010944 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39219: [WIP][SPARK-41277] Auto infer bucketing info for shuffled actions

2022-12-27 Thread GitBox
AmplabJenkins commented on PR #39219: URL: https://github.com/apache/spark/pull/39219#issuecomment-1366010904 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan opened a new pull request, #39248: [WIP] revisit stateful expression handling

2022-12-27 Thread GitBox
cloud-fan opened a new pull request, #39248: URL: https://github.com/apache/spark/pull/39248 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] srowen commented on a diff in pull request #39215: [SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from protobuf objec

2022-12-27 Thread GitBox
srowen commented on code in PR #39215: URL: https://github.com/apache/spark/pull/39215#discussion_r1057719199 ## project/MimaExcludes.scala: ## @@ -129,7 +129,16 @@ object MimaExcludes {

[GitHub] [spark] peter-toth commented on pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2022-12-27 Thread GitBox
peter-toth commented on PR #38034: URL: https://github.com/apache/spark/pull/38034#issuecomment-1365948997 @cloud-fan, @sigmod I still think `multiTransform()` can be a useful helper. Please find a few examples in a previous comment. I've rebased the PRs on the latest `master` once more. Let me

[GitHub] [spark] AmplabJenkins commented on pull request #39221: [SPARK-41719] [CORE]: SSLOptions sub settings should be set only when ssl is enabled

2022-12-27 Thread GitBox
AmplabJenkins commented on PR #39221: URL: https://github.com/apache/spark/pull/39221#issuecomment-1365921603 Can one of the admins verify this patch?

[GitHub] [spark] HyukjinKwon closed pull request #39244: [SPARK-41643][CONNECT][PYTHON][FOLLOWUP] Deduplicate docstrings of `Column.over`

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39244: [SPARK-41643][CONNECT][PYTHON][FOLLOWUP] Deduplicate docstrings of `Column.over` URL: https://github.com/apache/spark/pull/39244

[GitHub] [spark] HyukjinKwon commented on pull request #39244: [SPARK-41643][CONNECT][PYTHON][FOLLOWUP] Deduplicate docstrings of `Column.over`

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39244: URL: https://github.com/apache/spark/pull/39244#issuecomment-1365879611 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #39241: [SPARK-41731][CONNECT][PYTHON] Implement the column accessor

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39241: [SPARK-41731][CONNECT][PYTHON] Implement the column accessor URL: https://github.com/apache/spark/pull/39241

[GitHub] [spark] HyukjinKwon commented on pull request #39241: [SPARK-41731][CONNECT][PYTHON] Implement the column accessor

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39241: URL: https://github.com/apache/spark/pull/39241#issuecomment-1365879294 Merged to master.

[GitHub] [spark] LuciferYang commented on a diff in pull request #39192: [SPARK-41423][CORE] Protobuf serializer for StageDataWrapper

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39192: URL: https://github.com/apache/spark/pull/39192#discussion_r1057610747 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -390,3 +390,214 @@ message SQLExecutionUIData { repeated int64 stages = 11;

[GitHub] [spark] beliefer commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
beliefer commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1057644855 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -381,7 +381,7 @@ message Sample { // (Optional) Explicitly sort the underlying

[GitHub] [spark] bjornjorgensen commented on pull request #39242: [SPARK-41649][CONNECT][FOLLOW-UP] Change `PySparkWindowSpec.Window` to `PySparkWindow`

2022-12-27 Thread GitBox
bjornjorgensen commented on PR #39242: URL: https://github.com/apache/spark/pull/39242#issuecomment-1365849228 @HyukjinKwon Thank you :)

[GitHub] [spark] beliefer commented on pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2022-12-27 Thread GitBox
beliefer commented on PR #39091: URL: https://github.com/apache/spark/pull/39091#issuecomment-1365847958 ping @grundprinzip

[GitHub] [spark] HeartSaVioR commented on pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39245: URL: https://github.com/apache/spark/pull/39245#issuecomment-1365842134 We may need to port back to 3.3/3.2 version lines as well.

[GitHub] [spark] HeartSaVioR commented on pull request #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39247: URL: https://github.com/apache/spark/pull/39247#issuecomment-1365841711 cc. @cloud-fan @rxin

[GitHub] [spark] HeartSaVioR commented on pull request #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #39245: URL: https://github.com/apache/spark/pull/39245#issuecomment-1365841672 cc. @cloud-fan @rxin

[GitHub] [spark] HeartSaVioR commented on pull request #38288: [SPARK-40821][SQL][CORE][PYTHON][SS] Introduce window_time function to extract event time from the window column

2022-12-27 Thread GitBox
HeartSaVioR commented on PR #38288: URL: https://github.com/apache/spark/pull/38288#issuecomment-1365840718 I'm sorry I was on vacation - you're right we seem to miss pruning, and we also seem to miss the same for session window. My bad. I've submitted PRs separately for both cases.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39214: [SPARK-41707][CONNECT] Implement Catalog API in Spark Connect

2022-12-27 Thread GitBox
HyukjinKwon commented on code in PR #39214: URL: https://github.com/apache/spark/pull/39214#discussion_r1057627227 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -68,6 +69,37 @@ message Relation { StatCrosstab crosstab = 101;

[GitHub] [spark] HeartSaVioR opened a new pull request, #39247: [SPARK-41733][SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime

2022-12-27 Thread GitBox
HeartSaVioR opened a new pull request, #39247: URL: https://github.com/apache/spark/pull/39247 ### What changes were proposed in this pull request? This PR proposes to apply tree-pattern based pruning for the rule ResolveWindowTime, to minimize the evaluation of rule with WindowTime

[GitHub] [spark] beliefer opened a new pull request, #39246: [SPARK-41067][CONNECT][PYTHON] Implement `DataFrame.stat.cov`

2022-12-27 Thread GitBox
beliefer opened a new pull request, #39246: URL: https://github.com/apache/spark/pull/39246 ### What changes were proposed in this pull request? Implement `DataFrame.stat.cov` with a proto message Implement `DataFrame.stat.cov` for scala API Implement `DataFrame.stat.cov` for
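For reference, a minimal usage sketch of the API being implemented for the Python client, assuming the Connect implementation mirrors `DataFrame.stat.cov` in classic PySpark; the column names and data below are made up.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 10.0), (2, 14.0), (3, 17.0)], ["x", "y"])

# Sample covariance of two numeric columns, returned as a Python float.
print(df.stat.cov("x", "y"))
```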

[GitHub] [spark] HeartSaVioR opened a new pull request, #39245: [SPARK-41732][SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing

2022-12-27 Thread GitBox
HeartSaVioR opened a new pull request, #39245: URL: https://github.com/apache/spark/pull/39245 ### What changes were proposed in this pull request? This PR proposes to apply tree-pattern based pruning for the rule SessionWindowing, to minimize the evaluation of rule with

[GitHub] [spark] zhengruifeng opened a new pull request, #39244: [SPARK-41643][CONNECT][PYTHON][FOLLOWUP] Deduplicate docstrings of `Column.over`

2022-12-27 Thread GitBox
zhengruifeng opened a new pull request, #39244: URL: https://github.com/apache/spark/pull/39244 ### What changes were proposed in this pull request? Deduplicate docstrings of `Column.over` ### Why are the changes needed? For easier maintenance ### Does
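The deduplication follows the pattern visible in the window.py diffs above (e.g. `Window.__doc__ = PySparkWindow.__doc__`): instead of keeping a second copy of the docstring, the Connect class reuses the one from its classic PySpark counterpart. Below is a hedged sketch of that pattern applied to `Column.over`; the import aliases are illustrative, not the exact lines of this PR.
```python
from pyspark.sql.column import Column as PySparkColumn   # classic API
from pyspark.sql.connect.column import Column            # Spark Connect client

# Reuse the classic docstring so there is only one copy to maintain.
Column.over.__doc__ = PySparkColumn.over.__doc__
```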

[GitHub] [spark] LuciferYang commented on a diff in pull request #39164: [SPARK-41432][UI][SQL] Protobuf serializer for SparkPlanGraphWrapper

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39164: URL: https://github.com/apache/spark/pull/39164#discussion_r1057609250 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SparkPlanGraphWrapperSerializer.scala: ## @@ -0,0 +1,152 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on pull request #39243: [SPARK-41697][CONNECT][TESTS][FOLLOW-UP] Disable test_toDF_with_schema_string back, and fix test_freqItems

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39243: URL: https://github.com/apache/spark/pull/39243#issuecomment-1365801216 cc @zhengruifeng

[GitHub] [spark] cloud-fan commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1057599857 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -381,7 +381,7 @@ message Sample { // (Optional) Explicitly sort the

[GitHub] [spark] cloud-fan commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
cloud-fan commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1057598895 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -381,7 +381,7 @@ message Sample { // (Optional) Explicitly sort the

[GitHub] [spark] HyukjinKwon opened a new pull request, #39243: [SPARK-41697][CONNECT][TESTS][FOLLOW-UP] Disable test_toDF_with_schema_string back, and fix test_freqItems

2022-12-27 Thread GitBox
HyukjinKwon opened a new pull request, #39243: URL: https://github.com/apache/spark/pull/39243 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/39193 that: 1. Disables `test_toDF_with_schema_string` back because it

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1057594069 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -244,21 +244,21 @@ def checkRelations(relations: List["DataFrame"]):

[GitHub] [spark] HyukjinKwon closed pull request #39242: [SPARK-41649][CONNECT][FOLLOW-UP] Change `PySparkWindowSpec.Window` to `PySparkWindow`

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39242: [SPARK-41649][CONNECT][FOLLOW-UP] Change `PySparkWindowSpec.Window` to `PySparkWindow` URL: https://github.com/apache/spark/pull/39242

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
zhengruifeng commented on code in PR #39240: URL: https://github.com/apache/spark/pull/39240#discussion_r1057593107 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -178,8 +178,9 @@ class

[GitHub] [spark] HyukjinKwon commented on pull request #39242: [SPARK-41649][CONNECT][FOLLOW-UP] Change `PySparkWindowSpec.Window` to `PySparkWindow`

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39242: URL: https://github.com/apache/spark/pull/39242#issuecomment-1365795311 I manually verified this. I am merging this to unblock other PRs. Merged to master.

[GitHub] [spark] HyukjinKwon opened a new pull request, #39242: [SPARK-41649][CONNECT][FOLLOW-UP] Change `PySparkWindowSpec.Window` to `PySparkWindow`

2022-12-27 Thread GitBox
HyukjinKwon opened a new pull request, #39242: URL: https://github.com/apache/spark/pull/39242 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/39238 that fixes the mistake on the deduplication of docs in Window.

[GitHub] [spark] beliefer commented on a diff in pull request #39236: [SPARK-41068][CONNECT][PYTHON] Implement `DataFrame.stat.corr`

2022-12-27 Thread GitBox
beliefer commented on code in PR #39236: URL: https://github.com/apache/spark/pull/39236#discussion_r1057591817 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -970,6 +972,11 @@ def test_show(self): expected = "+---+---+\n| X| Y|\n+---+---+\n| 1|

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39238: [SPARK-41649][CONNECT][FOLLOWUP] Deduplicate docstrings in pyspark.sql.connect.window

2022-12-27 Thread GitBox
HyukjinKwon commented on code in PR #39238: URL: https://github.com/apache/spark/pull/39238#discussion_r1057590815 ## python/pyspark/sql/connect/window.py: ## @@ -241,4 +241,4 @@ def rangeBetween(start: int, end: int) -> "WindowSpec": rangeBetween.__doc__ =

[GitHub] [spark] LuciferYang commented on a diff in pull request #39215: [SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from protobuf

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39215: URL: https://github.com/apache/spark/pull/39215#discussion_r1057590714 ## sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala: ## @@ -64,7 +64,7 @@ class SQLExecutionUIDataSerializer extends

[GitHub] [spark] zhengruifeng opened a new pull request, #39241: [SPARK-41731][CONNECT][PYTHON] Implement the column accessor

2022-12-27 Thread GitBox
zhengruifeng opened a new pull request, #39241: URL: https://github.com/apache/spark/pull/39241 ### What changes were proposed in this pull request? Implement the column accessor: 1. getItem 2. getField 3. __getattr__ 4. __getitem__ ### Why are the changes needed?
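A short usage sketch of the four accessors listed above, written against classic PySpark semantics that the Connect column is expected to mirror; the schema and values are illustrative only.
```python
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([Row(tags=["a", "b"], info=Row(height=80))])

df.select(df.tags.getItem(0)).show()          # getItem: element at array index 0
df.select(df.info.getField("height")).show()  # getField: struct field by name
df.select(df.tags[0]).show()                  # __getitem__ on a Column (same as getItem)
df.select(df.info.height).show()              # __getattr__ on a Column (same as getField)
```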

[GitHub] [spark] LuciferYang commented on a diff in pull request #39215: [SPARK-41709][CORE][SQL][UI] Explicitly define `Seq` as `collection.Seq` to avoid `toSeq` when create ui objects from protobuf

2022-12-27 Thread GitBox
LuciferYang commented on code in PR #39215: URL: https://github.com/apache/spark/pull/39215#discussion_r1057583961 ## project/MimaExcludes.scala: ## @@ -129,7 +129,16 @@ object MimaExcludes {

[GitHub] [spark] HyukjinKwon commented on pull request #39232: [SPARK-41378][SQL][FOLLOWUP] Fix compilation warning about `[unchecked] unchecked conversion`

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39232: URL: https://github.com/apache/spark/pull/39232#issuecomment-1365783152 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #39232: [SPARK-41378][SQL][FOLLOWUP] Fix compilation warning about `[unchecked] unchecked conversion`

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39232: [SPARK-41378][SQL][FOLLOWUP] Fix compilation warning about `[unchecked] unchecked conversion` URL: https://github.com/apache/spark/pull/39232

[GitHub] [spark] HyukjinKwon commented on pull request #39240: [SPARK-41440][CONNECT][PYTHON] Avoid the cache operator for general Sample.

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39240: URL: https://github.com/apache/spark/pull/39240#issuecomment-1365782920 cc @zhengruifeng

[GitHub] [spark] beliefer commented on pull request #38867: [SPARK-41234][SQL][PYTHON] Add `array_insert` function

2022-12-27 Thread GitBox
beliefer commented on PR #38867: URL: https://github.com/apache/spark/pull/38867#issuecomment-1365779642 @Daniel-Davies Could you add the syntax, arguments, examples, and the list of mainstream databases that support `array_insert`? Refer to https://github.com/apache/spark/pull/38865

[GitHub] [spark] HyukjinKwon closed pull request #39234: [SPARK-41728][CONNECT][PYTHON] Implement `unwrap_udt` function

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39234: [SPARK-41728][CONNECT][PYTHON] Implement `unwrap_udt` function URL: https://github.com/apache/spark/pull/39234

[GitHub] [spark] HyukjinKwon commented on pull request #39234: [SPARK-41728][CONNECT][PYTHON] Implement `unwrap_udt` function

2022-12-27 Thread GitBox
HyukjinKwon commented on PR #39234: URL: https://github.com/apache/spark/pull/39234#issuecomment-1365779307 Merged to master.

[GitHub] [spark] beliefer commented on pull request #38947: [SPARK-41233][SQL] Add `array_prepend` function

2022-12-27 Thread GitBox
beliefer commented on PR #38947: URL: https://github.com/apache/spark/pull/38947#issuecomment-1365775810 Could we wait until `array_append` is merged, so this PR can follow its conventions and build on a better abstraction?

[GitHub] [spark] HyukjinKwon closed pull request #39237: [SPARK-41717][CONNECT][DOCS][FOLLOW-UP] Add docstrings for _parameters_to_print, print and _repr_html_ at LogicalPlan

2022-12-27 Thread GitBox
HyukjinKwon closed pull request #39237: [SPARK-41717][CONNECT][DOCS][FOLLOW-UP] Add docstrings for _parameters_to_print, print and _repr_html_ at LogicalPlan URL: https://github.com/apache/spark/pull/39237
