[GitHub] [spark] HyukjinKwon closed pull request #40673: [SPARK-41537][INFRA][CONNECT][FOLLOW-UP] Removes breaking changes within master branch

2023-04-08 Thread via GitHub
HyukjinKwon closed pull request #40673: [SPARK-41537][INFRA][CONNECT][FOLLOW-UP] Removes breaking changes within master branch URL: https://github.com/apache/spark/pull/40673 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [spark] HyukjinKwon commented on pull request #40673: [SPARK-41537][INFRA][CONNECT][FOLLOW-UP] Removes breaking changes within master branch

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40673: URL: https://github.com/apache/spark/pull/40673#issuecomment-1501048604 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1501048286 For example, you can't just import `from pyspark.sql.connect.column import Column as ConnectColumn` at `pyspark/pandas/_typing.py`. Even if you don't use Spark Connect, it will check the
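
The concern in this comment is a module-level import that pulls in an optional dependency for every user. One common way to avoid that for type-only imports is a `typing.TYPE_CHECKING` guard. The sketch below is an illustration only and is not necessarily the approach taken in the PR; `optional_backend` and `BackendColumn` are hypothetical stand-ins for the Connect module and its column class.

```python
from __future__ import annotations  # annotations are stored as strings, not evaluated

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime, so the
    # optional package (hypothetical name) does not have to be installed.
    from optional_backend.column import Column as BackendColumn


class LocalColumn:
    """Stand-in for a column type that is always available."""

    def __init__(self, name: str) -> None:
        self.name = name


def column_name(col: Union[LocalColumn, "BackendColumn"]) -> str:
    # Works at runtime without importing the optional backend.
    return col.name
```

Importing a module written this way does not trigger any check for the optional package, while type checkers still see both column types.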

[GitHub] [spark] HyukjinKwon commented on pull request #40716: [SPARK-43075][CONNECT] Change `gRPC` to `grpcio` when it is not installed.

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40716: URL: https://github.com/apache/spark/pull/40716#issuecomment-1501048066 Merged to master and branch-3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] HyukjinKwon closed pull request #40716: [SPARK-43075][CONNECT] Change `gRPC` to `grpcio` when it is not installed.

2023-04-08 Thread via GitHub
HyukjinKwon closed pull request #40716: [SPARK-43075][CONNECT] Change `gRPC` to `grpcio` when it is not installed. URL: https://github.com/apache/spark/pull/40716 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [spark] HyukjinKwon commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1501047901 The problem is that you don't use Spark Connect but it complains that it needs `grpcio` -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [spark] itholic commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
itholic commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1501047078 Thank you for the feedback, @bjornjorgensen ! IMHO, it seems more reasonable to add `grpcio` as a dependency for the Pandas API on Spark instead of reverting all this change back (O

[GitHub] [spark] grundprinzip commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-08 Thread via GitHub
grundprinzip commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1161204815 ## python/pyspark/sql/connect/client.py: ## @@ -867,6 +878,8 @@ def _analyze(self, method: str, **kwargs: Any) -> AnalyzeResult: req.unpersist.bl

[GitHub] [spark] zzzzming95 commented on a diff in pull request #40714: [SPARK-43074] Add the function without constant parameters of `SessionState#executePlan`

2023-04-08 Thread via GitHub
zzzzming95 commented on code in PR #40714: URL: https://github.com/apache/spark/pull/40714#discussion_r1161191880 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala: ## @@ -125,6 +125,9 @@ private[sql] class SessionState( plan: LogicalPlan,

[GitHub] [spark] AngersZhuuuu commented on pull request #40701: [SPARK-43064][SQL] Spark SQL CLI SQL tab should only show once statement once

2023-04-08 Thread via GitHub
AngersZhuuuu commented on PR #40701: URL: https://github.com/apache/spark/pull/40701#issuecomment-1501012548 ping @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] github-actions[bot] closed pull request #39021: [SPARK-41483][CORE] Last metrics system report should have a timeout, avoid to lead shutdown hook timeout

2023-04-08 Thread via GitHub
github-actions[bot] closed pull request #39021: [SPARK-41483][CORE] Last metrics system report should have a timeout, avoid to lead shutdown hook timeout URL: https://github.com/apache/spark/pull/39021 -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] github-actions[bot] closed pull request #39259: [SPARK-41739][SQL] CheckRule should not be executed when analyze view child

2023-04-08 Thread via GitHub
github-actions[bot] closed pull request #39259: [SPARK-41739][SQL] CheckRule should not be executed when analyze view child URL: https://github.com/apache/spark/pull/39259 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-08 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1161169706 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -915,9 +916,9 @@ case class Cast( case TimestampType => buildCa

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-08 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1161167873 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -915,9 +916,9 @@ case class Cast( case TimestampType => buildCa

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #40715: [SPARK-43007][TESTS][FOLLOWUP] Regenerate benchmark results of `StateStoreBasicOperationsBenchmark`

2023-04-08 Thread via GitHub
HeartSaVioR commented on code in PR #40715: URL: https://github.com/apache/spark/pull/40715#discussion_r1161163980 ## sql/core/benchmarks/StateStoreBasicOperationsBenchmark-jdk17-results.txt: ## Review Comment: I first skimmed through relative and thought it might be regres

[GitHub] [spark] HeartSaVioR closed pull request #40705: [SPARK-43067][SS] Correct the location of error class resource file in Kafka connector

2023-04-08 Thread via GitHub
HeartSaVioR closed pull request #40705: [SPARK-43067][SS] Correct the location of error class resource file in Kafka connector URL: https://github.com/apache/spark/pull/40705 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [spark] HeartSaVioR commented on pull request #40705: [SPARK-43067][SS] Correct the location of error class resource file in Kafka connector

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40705: URL: https://github.com/apache/spark/pull/40705#issuecomment-1500982175 I'll just merge this since the fix is super straightforward and CI passed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [spark] bjornjorgensen opened a new pull request, #40716: Change `gRPC` to `grpcio` when it is not installed.

2023-04-08 Thread via GitHub
bjornjorgensen opened a new pull request, #40716: URL: https://github.com/apache/spark/pull/40716 ### What changes were proposed in this pull request? Change `gRPC` to `grpcio`. This is ONLY in the printed message, for users who haven't installed `gRPC` ### Why are the changes needed
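
The motivation is that the Python gRPC library is imported as `grpc` but installed from PyPI as `grpcio`, so an error message is most useful when it names the installable package. A minimal sketch of such a check follows; `require_package` is a hypothetical helper written for illustration, not Spark's actual function, and the message wording is assumed.

```python
import importlib.util


def require_package(import_name: str, pip_name: str) -> None:
    """Raise ImportError naming the pip package if `import_name` is missing.

    Hypothetical helper: `import_name` must be a top-level module name.
    """
    if importlib.util.find_spec(import_name) is None:
        raise ImportError(
            f"{pip_name} must be installed; try `pip install {pip_name}`."
        )
```

A caller would use it as `require_package("grpc", "grpcio")`, so that a user who lacks the library is told to install `grpcio` rather than `gRPC`.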

[GitHub] [spark] bjornjorgensen commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
bjornjorgensen commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1500961292 @itholic Thank you, great work :) After this PR `from pyspark import pandas as ps ` ModuleNotFoundError Traceback (most recent call last)

[GitHub] [spark] LuciferYang commented on pull request #40715: [SPARK-43007][TESTS][FOLLOWUP] Regenerate benchmark results of `StateStoreBasicOperationsBenchmark`

2023-04-08 Thread via GitHub
LuciferYang commented on PR #40715: URL: https://github.com/apache/spark/pull/40715#issuecomment-1500901102 cc @dongjoon-hyun @HeartSaVioR FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang opened a new pull request, #40715: [SPARK-43007][TESTS][FOLLOWUP] Regenerate benchmark results of `StateStoreBasicOperationsBenchmark`

2023-04-08 Thread via GitHub
LuciferYang opened a new pull request, #40715: URL: https://github.com/apache/spark/pull/40715 ### What changes were proposed in this pull request? This PR regenerates the benchmark results of `StateStoreBasicOperationsBenchmark`. ### Why are the changes needed? https://github.com

[GitHub] [spark] zzzzming95 opened a new pull request, #40714: [SPARK-43074] Add the function without constant parameters of `SessionState#executePlan`

2023-04-08 Thread via GitHub
zzzzming95 opened a new pull request, #40714: URL: https://github.com/apache/spark/pull/40714 ### What changes were proposed in this pull request? Add the function without constant parameters of `SessionState#executePlan` ### Why are the changes needed? Before

[GitHub] [spark] HeartSaVioR closed pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-08 Thread via GitHub
HeartSaVioR closed pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark URL: https://github.com/apache/spark/pull/40561 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1500894916 Thanks all for reviewing! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] HeartSaVioR commented on pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40561: URL: https://github.com/apache/spark/pull/40561#issuecomment-1500894854 Confirmed CI passed for last commit. https://github.com/HeartSaVioR/spark/runs/12606973127 -- This is an automated message from the Apache Git Service. To respond to the message, pl

[GitHub] [spark] zhouyifan279 commented on pull request #40645: [SPARK-43014] Do not overwrite `spark.app.submitTime` in k8s cluster mode driver

2023-04-08 Thread via GitHub
zhouyifan279 commented on PR #40645: URL: https://github.com/apache/spark/pull/40645#issuecomment-1500887774 > Any updates, @zhouyifan279 ? @dongjoon-hyun sorry for the late response. Speaking from my limited experience, I think very few users will encounter the case you mentioned.

[GitHub] [spark] wankunde opened a new pull request, #40713: [WIP][SPARK-42551][SQL] Support more subexpression elimination cases

2023-04-08 Thread via GitHub
wankunde opened a new pull request, #40713: URL: https://github.com/apache/spark/pull/40713 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

[GitHub] [spark] HeartSaVioR commented on pull request #40702: [SPARK-43066][SQL] Add test for dropDuplicates in JavaDatasetSuite

2023-04-08 Thread via GitHub
HeartSaVioR commented on PR #40702: URL: https://github.com/apache/spark/pull/40702#issuecomment-1500869567 Thanks for reviewing and merging! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [spark] hvanhovell commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-08 Thread via GitHub
hvanhovell commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1161097318 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -915,9 +916,9 @@ case class Cast( case TimestampType => buildC

[GitHub] [spark] LuciferYang commented on pull request #40699: [SPARK-43063][SQL] `df.show` handle null should print NULL instead of null

2023-04-08 Thread via GitHub
LuciferYang commented on PR #40699: URL: https://github.com/apache/spark/pull/40699#issuecomment-1500823598 @Yikf Can you re-trigger GA? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] zhengruifeng closed pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-04-08 Thread via GitHub
zhengruifeng closed pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations URL: https://github.com/apache/spark/pull/40263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] zhengruifeng commented on pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-08 Thread via GitHub
zhengruifeng commented on PR #40695: URL: https://github.com/apache/spark/pull/40695#issuecomment-1500819117 cc @WeichenXu123 @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] zhengruifeng opened a new pull request, #40712: [SPARK-43073][CONNECT] Add proto data types constants

2023-04-08 Thread via GitHub
zhengruifeng opened a new pull request, #40712: URL: https://github.com/apache/spark/pull/40712 ### What changes were proposed in this pull request? Add constants for un-parameterized proto data types ### Why are the changes needed? avoid recreating them ### Does this PR i

[GitHub] [spark] HyukjinKwon closed pull request #40702: [SPARK-43066][SQL] Add test for dropDuplicates in JavaDatasetSuite

2023-04-08 Thread via GitHub
HyukjinKwon closed pull request #40702: [SPARK-43066][SQL] Add test for dropDuplicates in JavaDatasetSuite URL: https://github.com/apache/spark/pull/40702 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [spark] HyukjinKwon commented on pull request #40702: [SPARK-43066][SQL] Add test for dropDuplicates in JavaDatasetSuite

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40702: URL: https://github.com/apache/spark/pull/40702#issuecomment-1500814524 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon closed pull request #40698: [SPARK-43062][INFRA][PYTHON][TESTS] Add options to lint-python to run each test separately

2023-04-08 Thread via GitHub
HyukjinKwon closed pull request #40698: [SPARK-43062][INFRA][PYTHON][TESTS] Add options to lint-python to run each test separately URL: https://github.com/apache/spark/pull/40698 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [spark] HyukjinKwon commented on pull request #40698: [SPARK-43062][INFRA][PYTHON][TESTS] Add options to lint-python to run each test separately

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40698: URL: https://github.com/apache/spark/pull/40698#issuecomment-1500814404 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon closed pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
HyukjinKwon closed pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect URL: https://github.com/apache/spark/pull/40525 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[GitHub] [spark] HyukjinKwon commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-04-08 Thread via GitHub
HyukjinKwon commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1500814281 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] mridulm commented on a diff in pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-04-08 Thread via GitHub
mridulm commented on code in PR #40707: URL: https://github.com/apache/spark/pull/40707#discussion_r1161075409 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2795,7 +2795,9 @@ private[sql] object QueryExecutionErrors extends QueryE

[GitHub] [spark] mridulm commented on pull request #40663: [SPARK-39696][CORE] Fix data race in access to TaskMetrics.externalAccums

2023-04-08 Thread via GitHub
mridulm commented on PR #40663: URL: https://github.com/apache/spark/pull/40663#issuecomment-1500810809 Thanks for checking @dongjoon-hyun and @LuciferYang ! Great to finally have this issue fixed :-) -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [spark] mridulm commented on pull request #40690: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput

2023-04-08 Thread via GitHub
mridulm commented on PR #40690: URL: https://github.com/apache/spark/pull/40690#issuecomment-1500810568 You are right about the null -> Int, should have checked better :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us