[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034404863 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -775,16 +779,14 @@ private[spark] class ExecutorAllocationManager( }

[GitHub] [spark] zhengruifeng commented on pull request #38834: [MINOR][PYTHON][DOCS] Fix types and docstring in DataFrame.toDF

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38834: URL: https://github.com/apache/spark/pull/38834#issuecomment-1330223964 late lgtm, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [spark] HyukjinKwon closed pull request #38834: [MINOR][PYTHON][DOCS] Fix types and docstring in DataFrame.toDF

2022-11-28 Thread GitBox
HyukjinKwon closed pull request #38834: [MINOR][PYTHON][DOCS] Fix types and docstring in DataFrame.toDF URL: https://github.com/apache/spark/pull/38834

[GitHub] [spark] HyukjinKwon commented on pull request #38834: [MINOR][PYTHON][DOCS] Fix types and docstring in DataFrame.toDF

2022-11-28 Thread GitBox
HyukjinKwon commented on PR #38834: URL: https://github.com/apache/spark/pull/38834#issuecomment-1330223524 Merged to master.

[GitHub] [spark] LuciferYang opened a new pull request, #38835: [SPARK-41316][SQL] Enable tail-recursion wherever possible

2022-11-28 Thread GitBox
LuciferYang opened a new pull request, #38835: URL: https://github.com/apache/spark/pull/38835 ### What changes were proposed in this pull request? Similar to SPARK-37783, this pr adds `@scala.annotation.tailrec` inspected by IDE (IntelliJ), these are new cases after Spark 3.3.

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034403916 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034403079 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] amaliujia commented on pull request #38834: [MINOR][PYTHON][DOCS] Fix types and docstring in DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia commented on PR #38834: URL: https://github.com/apache/spark/pull/38834#issuecomment-1330222089 LGTM

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
HyukjinKwon commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034397626 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034397331 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,12 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] HyukjinKwon opened a new pull request, #38834: [MINOR][DOCS] Fix types and docstring in DataFrame.toDF

2022-11-28 Thread GitBox
HyukjinKwon opened a new pull request, #38834: URL: https://github.com/apache/spark/pull/38834 ### What changes were proposed in this pull request? `df.toDF` cannot take `Column`s: ```python >>> df.toDF(df.id) ``` ``` Traceback (most recent call last): File "",

[GitHub] [spark] zhengruifeng commented on pull request #38831: [SPARK-41312][CONNECT][PYTHON] Implement DataFrame.withColumnRenamed

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38831: URL: https://github.com/apache/spark/pull/38831#issuecomment-1330213301 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38831: [SPARK-41312][CONNECT][PYTHON] Implement DataFrame.withColumnRenamed

2022-11-28 Thread GitBox
zhengruifeng closed pull request #38831: [SPARK-41312][CONNECT][PYTHON] Implement DataFrame.withColumnRenamed URL: https://github.com/apache/spark/pull/38831

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
HyukjinKwon commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034394129 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] HyukjinKwon commented on pull request #38796: [SPARK-41260][PYTHON][SS] Cast NumPy instances to Python primitive types in GroupState update

2022-11-28 Thread GitBox
HyukjinKwon commented on PR #38796: URL: https://github.com/apache/spark/pull/38796#issuecomment-1330200925 Thanks!

[GitHub] [spark] Ngone51 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
Ngone51 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034375202 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -810,9 +812,10 @@ private[spark] class ExecutorAllocationManager( val stageId = sp

[GitHub] [spark] dongjoon-hyun closed pull request #38833: Test Apache ORC 1.8.1 SNAPSHOT

2022-11-28 Thread GitBox
dongjoon-hyun closed pull request #38833: Test Apache ORC 1.8.1 SNAPSHOT URL: https://github.com/apache/spark/pull/38833

[GitHub] [spark] grundprinzip commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
grundprinzip commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034373263 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,62 @@ def to_plan(self, session: "RemoteSparkSession") -> proto.Expression: def __str__(self) ->

[GitHub] [spark] HeartSaVioR closed pull request #38796: [SPARK-41260][PYTHON][SS] Cast NumPy instances to Python primitive types in GroupState update

2022-11-28 Thread GitBox
HeartSaVioR closed pull request #38796: [SPARK-41260][PYTHON][SS] Cast NumPy instances to Python primitive types in GroupState update URL: https://github.com/apache/spark/pull/38796

[GitHub] [spark] HeartSaVioR commented on pull request #38796: [SPARK-41260][PYTHON][SS] Cast NumPy instances to Python primitive types in GroupState update

2022-11-28 Thread GitBox
HeartSaVioR commented on PR #38796: URL: https://github.com/apache/spark/pull/38796#issuecomment-1330183042 Thanks! Merging to master.

[GitHub] [spark] pkudinov commented on pull request #37536: [WIP][SPARK-40100][SQL] Add DataType class for Int128 type

2022-11-28 Thread GitBox
pkudinov commented on PR #37536: URL: https://github.com/apache/spark/pull/37536#issuecomment-1330178829 Hi @beliefer, are there any plans to complete this PR?

[GitHub] [spark] cloud-fan commented on pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-11-28 Thread GitBox
cloud-fan commented on PR #38799: URL: https://github.com/apache/spark/pull/38799#issuecomment-1330155848 > since new logical and physical plans are added, should also consider how it affect existing rules. How about we do this optimization as a planner rule? then logical plan won't

[GitHub] [spark] zhengruifeng commented on pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38800: URL: https://github.com/apache/spark/pull/38800#issuecomment-1330153837 since I have other PRs depending on this work, let me merge it into master now, thanks for reviews

[GitHub] [spark] zhengruifeng closed pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes

2022-11-28 Thread GitBox
zhengruifeng closed pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes URL: https://github.com/apache/spark/pull/38800

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38799: URL: https://github.com/apache/spark/pull/38799#discussion_r1034340755 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowGroupLimitExec.scala: ## @@ -0,0 +1,236 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
HyukjinKwon commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034335214 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,62 @@ def to_plan(self, session: "RemoteSparkSession") -> proto.Expression: def __str__(self) ->

[GitHub] [spark] srielau commented on pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-28 Thread GitBox
srielau commented on PR #38728: URL: https://github.com/apache/spark/pull/38728#issuecomment-1330129095 General comment. You use CONNECT as error class, everything else is a sub error class. Many of these are INVALID_PLAN. How many errors total do you expect? Note that at present we su

[GitHub] [spark] zhengruifeng commented on pull request #38799: [SPARK-37099][SQL] Introduce the group limit of Window for rank-based filter to optimize top-k computation

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38799: URL: https://github.com/apache/spark/pull/38799#issuecomment-1330127603 +1 on add a separate config as the threshold to trigger this optimization. since new logical and physical plans are added, should also consider how it affect existing rules.

[GitHub] [spark] srielau commented on a diff in pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-28 Thread GitBox
srielau commented on code in PR #38728: URL: https://github.com/apache/spark/pull/38728#discussion_r1034330247 ## core/src/main/resources/error/error-classes.json: ## @@ -132,7 +132,97 @@ }, "INTERCEPTOR_RUNTIME_ERROR" : { "message" : [ - "Error i

[GitHub] [spark] grundprinzip commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
grundprinzip commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034329584 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,70 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: def __str__(self) ->

[GitHub] [spark] xinglin opened a new pull request, #38832: SPARK-41313 Combine fixes for SPARK-3900 and SPARK-21138

2022-11-28 Thread GitBox
xinglin opened a new pull request, #38832: URL: https://github.com/apache/spark/pull/38832 ### What changes were proposed in this pull request? This PR combines fixes for SPARK-3900 and SPARK-21138. Spark-3900 introduced a fix for illegalStateException when creating fs obj

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034324598 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] amaliujia commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034323760 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,70 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: def __str__(self) -> st

[GitHub] [spark] MaxGekk commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

2022-11-28 Thread GitBox
MaxGekk commented on PR #38712: URL: https://github.com/apache/spark/pull/38712#issuecomment-1330110664 > Can you provide more information ... of other popular SQL dialects @xkrogen Named parameters are supported by: 1. Redshift: https://docs.aws.amazon.com/redshift/latest/mgmt/dat

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
HyukjinKwon commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034320960 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,70 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: def __str__(self) ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034320893 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034318046 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034317543 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-28 Thread GitBox
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1034309933 ## core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala: ## @@ -1849,6 +1849,68 @@ abstract class AppStatusListenerSuite extends SparkFunSuite

[GitHub] [spark] amaliujia commented on pull request #38831: [SPARK-41312][CONNECT][PYTHON] Implement DataFrame.withColumnRenamed

2022-11-28 Thread GitBox
amaliujia commented on PR #38831: URL: https://github.com/apache/spark/pull/38831#issuecomment-1330097696 @zhengruifeng

[GitHub] [spark] zhengruifeng commented on pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38819: URL: https://github.com/apache/spark/pull/38819#issuecomment-1330097738 merged into master, thanks all for the reviews

[GitHub] [spark] amaliujia opened a new pull request, #38831: [SPARK-41312][CONNECT][PYTHON] Implement DataFrame.withColumnRenamed

2022-11-28 Thread GitBox
amaliujia opened a new pull request, #38831: URL: https://github.com/apache/spark/pull/38831 ### What changes were proposed in this pull request? Implement DataFrame.withColumnRenamed by reusing existing Connect proto `RenameColumnsNameByName`. ### Why are the changes

[GitHub] [spark] zhengruifeng closed pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng closed pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop` URL: https://github.com/apache/spark/pull/38819

[GitHub] [spark] zhengruifeng commented on pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38819: URL: https://github.com/apache/spark/pull/38819#issuecomment-1330096394 the last commit only change the number of `min_non_nulls` from 4 to 3, I'm going to merge this PR.
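
For readers unfamiliar with the setting named in this comment: `min_non_nulls` appears to be the Connect-side counterpart of the `thresh` argument of `DataFrame.dropna`, which keeps a row only if it has at least that many non-null values. A minimal plain-Python sketch of that rule follows; `drop_rows` and the sample rows are hypothetical and are not taken from the PR.

```python
from typing import List, Optional, Sequence


def drop_rows(rows: Sequence[Sequence[Optional[object]]],
              min_non_nulls: int) -> List[Sequence[Optional[object]]]:
    """Keep only rows that have at least `min_non_nulls` non-None values."""
    kept = []
    for row in rows:
        # Count the values in this row that are not null (None).
        non_nulls = sum(1 for value in row if value is not None)
        if non_nulls >= min_non_nulls:
            kept.append(row)
    return kept


# Hypothetical sample data: four columns per row.
rows = [
    (1, "a", 2.0, True),     # 4 non-null values
    (2, None, 3.0, False),   # 3 non-null values
    (3, None, None, None),   # 1 non-null value
]

strict = drop_rows(rows, min_non_nulls=4)   # threshold of 4
relaxed = drop_rows(rows, min_non_nulls=3)  # threshold of 3
```

Lowering the threshold from 4 to 3 lets rows with a single null through, which is the kind of difference a test expectation would be sensitive to.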

[GitHub] [spark] amaliujia commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034313340 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034311728 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] wineternity commented on a diff in pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-28 Thread GitBox
wineternity commented on code in PR #38702: URL: https://github.com/apache/spark/pull/38702#discussion_r1034309587 ## core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala: ## @@ -1849,6 +1849,68 @@ abstract class AppStatusListenerSuite extends SparkFunSuite

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034305923 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] LuciferYang commented on pull request #38830: [SPARK-41309][SQL] Reuse `INVALID_SCHEMA.NON_STRING_LITERAL` instead of `_LEGACY_ERROR_TEMP_1093`

2022-11-28 Thread GitBox
LuciferYang commented on PR #38830: URL: https://github.com/apache/spark/pull/38830#issuecomment-1330081500 Test first

[GitHub] [spark] ibuder commented on a diff in pull request #38782: [SPARK-38728][SQL] Test the error class: FAILED_RENAME_PATH

2022-11-28 Thread GitBox
ibuder commented on code in PR #38782: URL: https://github.com/apache/spark/pull/38782#discussion_r1034302547 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -638,6 +638,37 @@ class QueryExecutionErrorsSuite sqlState = "0A000"

[GitHub] [spark] LuciferYang opened a new pull request, #38830: [SPARK-41309][SQL] Reuse `INVALID_SCHEMA.NON_STRING_LITERAL` instead of `_LEGACY_ERROR_TEMP_1093`

2022-11-28 Thread GitBox
LuciferYang opened a new pull request, #38830: URL: https://github.com/apache/spark/pull/38830 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was thi

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034298659 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] grundprinzip commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
grundprinzip commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034295175 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] amaliujia commented on a diff in pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38829: URL: https://github.com/apache/spark/pull/38829#discussion_r1034294597 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1068,6 +1068,23 @@ def inputFiles(self) -> List[str]: query = self._plan.to_proto(self._session.client)

[GitHub] [spark] amaliujia commented on pull request #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia commented on PR #38829: URL: https://github.com/apache/spark/pull/38829#issuecomment-1330070732 @zhengruifeng

[GitHub] [spark] amaliujia opened a new pull request, #38829: [SPARK-41310][CONNECT][PYTHON] Implement DataFrame.toDF

2022-11-28 Thread GitBox
amaliujia opened a new pull request, #38829: URL: https://github.com/apache/spark/pull/38829 ### What changes were proposed in this pull request? Implement DataFrame.toDF by reusing existing `RenameColumnsBySameLengthNames` proto. ### Why are the changes needed?

[GitHub] [spark] zhengruifeng commented on pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38800: URL: https://github.com/apache/spark/pull/38800#issuecomment-1330064553 > A big left work is testing coverage over literals. For example does NaN, +inf, -inf can pass through Connect proto and any test case? I also have tons of questions about value rang
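
The quoted question about NaN, +inf and -inf can be checked in isolation from Spark. Protocol Buffers encode a `double` field as an 8-byte little-endian IEEE 754 value, so the sketch below round-trips the special values through that byte layout using Python's `struct` module. It only illustrates the encoding, on the assumption that the literal is carried in a `double` field; it does not exercise the Connect proto or any PySpark API, and `roundtrip_double` is a hypothetical helper.

```python
import math
import struct


def roundtrip_double(value: float) -> float:
    """Encode a float as 8 little-endian IEEE 754 bytes and decode it again."""
    encoded = struct.pack("<d", value)
    return struct.unpack("<d", encoded)[0]


# The special values named in the review comment.
specials = [float("nan"), float("inf"), float("-inf")]
results = [roundtrip_double(v) for v in specials]

# NaN never compares equal to itself, so it needs math.isnan rather than ==.
nan_survives = math.isnan(results[0])
infs_survive = results[1] == math.inf and results[2] == -math.inf
```

A real test for the PR would still need to send such literals through the client and server, since the byte encoding is only one of the places where a special value could be lost.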

[GitHub] [spark] SparksFyz commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-28 Thread GitBox
SparksFyz commented on PR #38171: URL: https://github.com/apache/spark/pull/38171#issuecomment-1330064443 https://user-images.githubusercontent.com/8748814/204439049-53f0bd4f-9ea0-4289-8268-d16aef5b4334.png Would you share the test sql pattern? I test some cases and haven't seen such i

[GitHub] [spark] SparksFyz commented on a diff in pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2022-11-28 Thread GitBox
SparksFyz commented on code in PR #38171: URL: https://github.com/apache/spark/pull/38171#discussion_r1034286023 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressionsJoni.scala: ## @@ -0,0 +1,471 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034277068 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] amaliujia commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034278581 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,62 @@ def to_plan(self, session: "RemoteSparkSession") -> proto.Expression: def __str__(self) -> st

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034274139 ## python/pyspark/sql/connect/dataframe.py: ## @@ -503,6 +503,63 @@ def _show_string( assert pdf is not None return pdf["show_string"][0] +def

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034274241 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,11 @@ private[spark] class ExecutorAllocationManager( // Should be 0

[GitHub] [spark] zhengruifeng commented on pull request #38827: [SPARK-41308][CONNECT][PYTHON] Improve DataFrame.count()

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38827: URL: https://github.com/apache/spark/pull/38827#issuecomment-1330039092 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38827: [SPARK-41308][CONNECT][PYTHON] Improve DataFrame.count()

2022-11-28 Thread GitBox
zhengruifeng closed pull request #38827: [SPARK-41308][CONNECT][PYTHON] Improve DataFrame.count() URL: https://github.com/apache/spark/pull/38827

[GitHub] [spark] amaliujia commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
amaliujia commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034273829 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -506,6 +506,12 @@ class SparkConnectProtoSuite extends Pl

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034273628 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -722,9 +723,8 @@ private[spark] class ExecutorAllocationManager( // because t

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034271797 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -383,8 +383,8 @@ private[spark] class DAGScheduler( /** * Called by the TaskSetMan

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38800: URL: https://github.com/apache/spark/pull/38800#discussion_r1034266638 ## connector/connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -83,59 +80,39 @@ message Expression { // directly declare the type variation).

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38800: [SPARK-41264][CONNECT][PYTHON] Make Literal support more datatypes

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38800: URL: https://github.com/apache/spark/pull/38800#discussion_r1034262370 ## python/pyspark/sql/tests/connect/test_connect_column_expressions.py: ## @@ -134,6 +138,37 @@ def test_list_to_literal(self): lit_list_plan = fun.lit([f

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1034252202 ## core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala: ## @@ -643,10 +643,11 @@ private[spark] class ExecutorAllocationManager( // Should be 0 wh

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034253147 ## python/pyspark/sql/connect/column.py: ## @@ -30,53 +30,34 @@ def _bin_op( name: str, doc: str = "binary function", reverse: bool = False -) -> Callable[

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38819: URL: https://github.com/apache/spark/pull/38819#discussion_r1034251475 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -440,6 +441,26 @@ message NAFill { repeated Expression.Literal values = 3; } + +/

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38819: URL: https://github.com/apache/spark/pull/38819#discussion_r1034250363 ## python/pyspark/sql/connect/dataframe.py: ## @@ -727,6 +727,77 @@ def fillna( session=self._session, ) +def dropna( +self, +

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-11-28 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1034238236 ## core/src/main/scala/org/apache/spark/executor/Executor.scala: ## @@ -791,6 +770,53 @@ private[spark] class Executor( } } +private def incrementShuf

[GitHub] [spark] zhengruifeng commented on pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement DataFrame cross join

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38778: URL: https://github.com/apache/spark/pull/38778#issuecomment-1330002115 let's also add an e2e test in `test_connect_basic.py`, otherwise LGTM

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement DataFrame cross join

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38778: URL: https://github.com/apache/spark/pull/38778#discussion_r1034246368 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -58,6 +58,13 @@ def test_join_condition(self): )._plan.to_proto(self.connect)

[GitHub] [spark] cloud-fan commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
cloud-fan commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034230127 ## python/pyspark/sql/connect/dataframe.py: ## @@ -503,6 +503,63 @@ def _show_string( assert pdf is not None return pdf["show_string"][0] +def

[GitHub] [spark] zhengruifeng commented on pull request #38827: [SPARK-41308][CONNECT][PYTHON] Improve DataFrame.count()

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38827: URL: https://github.com/apache/spark/pull/38827#issuecomment-1329980321 lgtm

[GitHub] [spark] cloud-fan commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
cloud-fan commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034229626 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -506,6 +506,12 @@ class SparkConnectProtoSuite extends Pl

[GitHub] [spark] cloud-fan commented on a diff in pull request #38793: [SPARK-41256][CONNECT] Implement DataFrame.withColumn(s)

2022-11-28 Thread GitBox
cloud-fan commented on code in PR #38793: URL: https://github.com/apache/spark/pull/38793#discussion_r1034229052 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -242,6 +243,17 @@ class SparkConnectPlanner(session: SparkSe

[GitHub] [spark] toujours33 commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-28 Thread GitBox
toujours33 commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1329974305 ping @mridulm Could you help take a look?

[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-28 Thread GitBox
ulysses-you commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1034221066 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala: ## @@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends Analys

[GitHub] [spark] cloud-fan commented on a diff in pull request #38806: [SPARK-41268][CONNECT][PYTHON] Refactor "Column" for API Compatibility

2022-11-28 Thread GitBox
cloud-fan commented on code in PR #38806: URL: https://github.com/apache/spark/pull/38806#discussion_r1034218435 ## python/pyspark/sql/connect/column.py: ## @@ -314,3 +323,62 @@ def to_plan(self, session: "RemoteSparkSession") -> proto.Expression: def __str__(self) -> st

[GitHub] [spark] zhengruifeng commented on pull request #38818: [SPARK-41238][CONNECT][PYTHON][FOLLOWUP] Support `DayTimeIntervalType` in the client

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38818: URL: https://github.com/apache/spark/pull/38818#issuecomment-1329966529 thanks for reviews, merged into master

[GitHub] [spark] zhengruifeng closed pull request #38818: [SPARK-41238][CONNECT][PYTHON][FOLLOWUP] Support `DayTimeIntervalType` in the client

2022-11-28 Thread GitBox
zhengruifeng closed pull request #38818: [SPARK-41238][CONNECT][PYTHON][FOLLOWUP] Support `DayTimeIntervalType` in the client URL: https://github.com/apache/spark/pull/38818

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38819: [SPARK-41148][CONNECT][PYTHON] Implement `DataFrame.dropna` and `DataFrame.na.drop`

2022-11-28 Thread GitBox
zhengruifeng commented on code in PR #38819: URL: https://github.com/apache/spark/pull/38819#discussion_r1034217228 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -472,6 +472,46 @@ def test_fill_na(self): self.spark.sql(query).na.fill({"a": True, "

[GitHub] [spark] Yikun commented on pull request #38789: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-11-28 Thread GitBox
Yikun commented on PR #38789: URL: https://github.com/apache/spark/pull/38789#issuecomment-1329961854 > Although we already have some instances, is there a more general way than having this code pattern? Or maybe a configuration like `spark.kubernetes.volcano.maxConcurrencyJobNum`, d

[GitHub] [spark] LuciferYang commented on pull request #38754: [SPARK-41180][SQL] Reuse `INVALID_SCHEMA` instead of `_LEGACY_ERROR_TEMP_1227`

2022-11-28 Thread GitBox
LuciferYang commented on PR #38754: URL: https://github.com/apache/spark/pull/38754#issuecomment-1329958569 Thanks @MaxGekk

[GitHub] [spark] zhengruifeng commented on pull request #38824: [SPARK-41304][CONNECT][PYTHON][DOCS] Add missing docs for DataFrame API

2022-11-28 Thread GitBox
zhengruifeng commented on PR #38824: URL: https://github.com/apache/spark/pull/38824#issuecomment-1329957729 Late LGTM

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-11-28 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1034209167 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -722,6 +718,63 @@ final class ShuffleBlockFetcherIterator( } } + //

[GitHub] [spark] mridulm commented on pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-11-28 Thread GitBox
mridulm commented on PR #36165: URL: https://github.com/apache/spark/pull/36165#issuecomment-1329952299 Please don't force push the code, unless it is to resolve conflicts/etc which can be handled otherwise - it makes reviewing what changed much harder

[GitHub] [spark] cloud-fan commented on pull request #38825: [SPARK-41306][CONNECT] Improve Connect Expression proto documentation

2022-11-28 Thread GitBox
cloud-fan commented on PR #38825: URL: https://github.com/apache/spark/pull/38825#issuecomment-1329940842 thanks, merging to master!
