[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1039264107 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableProvider.java: ## @@ -93,4 +93,18 @@ default Transform[] inferPartitioning(CaseInsensitiveS

[GitHub] [spark] cloud-fan commented on a diff in pull request #38823: [SPARK-41290][SQL] Support GENERATED ALWAYS AS expressions for columns in create/replace table statements

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38823: URL: https://github.com/apache/spark/pull/38823#discussion_r1039260520 ## core/src/main/resources/error/error-classes.json: ## @@ -1266,6 +1266,11 @@ "DISTRIBUTE BY clause." ] }, + "GENERATED_COLUMN_UNSUPP

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039256946 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select get(array(

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38912: [SPARK-41388][K8S] getReusablePVCs should ignore recently created PVCs in the previous batch

2022-12-04 Thread GitBox
dongjoon-hyun opened a new pull request, #38912: URL: https://github.com/apache/spark/pull/38912 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] zwangsheng commented on pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
zwangsheng commented on PR #38202: URL: https://github.com/apache/spark/pull/38202#issuecomment-1336876549 In order to keep `KubernetesConf.SparkConf` read-only, a lazy variable `driverServiceName` is added to `KubernetesDriverConf`, which is available to all FeatureSteps that can A

[GitHub] [spark] HeartSaVioR commented on pull request #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38911: URL: https://github.com/apache/spark/pull/38911#issuecomment-1336865300 cc. @zsxwing @viirya @jerrypeng Please take a look. Thanks in advance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HeartSaVioR opened a new pull request, #38911: [SPARK-41387][SS] Add defensive assertions to Kafka data source for Trigger.AvailableNow

2022-12-04 Thread GitBox
HeartSaVioR opened a new pull request, #38911: URL: https://github.com/apache/spark/pull/38911 ### What changes were proposed in this pull request? This PR proposes to add defensive assertions to Kafka data source for Trigger.AvailableNow, so that the query will rather fail fast inste

[GitHub] [spark] wankunde commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-12-04 Thread GitBox
wankunde commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1336852190 Retest this please
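The title of PR #38682 says it improves performance for many LIKE patterns by building a balanced predicate tree. The Java sketch below only illustrates why a balanced tree helps. If n predicates are OR-ed in a left-deep chain, the tree is n - 1 levels deep. If they are combined pairwise by recursive halving, the depth is about log2(n), which matters for any code that recurses over the tree. `Expr`, `Leaf`, and `Or` are hypothetical stand-ins, not Spark's Catalyst classes, and the PR's actual implementation may differ. The sketch uses records, so it requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

public class BalancedOr {
    // Hypothetical minimal expression tree (not Spark's Catalyst classes).
    interface Expr { int depth(); }

    record Leaf(String pattern) implements Expr {
        public int depth() { return 0; }
    }

    record Or(Expr left, Expr right) implements Expr {
        public int depth() { return 1 + Math.max(left.depth(), right.depth()); }
    }

    // Left-deep chain ((p1 OR p2) OR p3) OR ...: depth grows linearly.
    static Expr leftDeep(List<Expr> preds) {
        Expr acc = preds.get(0);
        for (int i = 1; i < preds.size(); i++) {
            acc = new Or(acc, preds.get(i));
        }
        return acc;
    }

    // Balanced: split [from, to) in half recursively; depth is ceil(log2(n)).
    static Expr balanced(List<Expr> preds, int from, int to) {
        if (to - from == 1) return preds.get(from);
        int mid = (from + to) >>> 1;
        return new Or(balanced(preds, from, mid), balanced(preds, mid, to));
    }

    public static void main(String[] args) {
        List<Expr> preds = new ArrayList<>();
        for (int i = 0; i < 1000; i++) preds.add(new Leaf("%pat" + i + "%"));
        System.out.println(leftDeep(preds).depth());                  // 999
        System.out.println(balanced(preds, 0, preds.size()).depth()); // 10
    }
}
```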

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039219748 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
HyukjinKwon commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039219061 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols:

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039216766 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( }

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039216402 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select get(array(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039215759 ## sql/core/src/test/resources/sql-tests/inputs/array.sql: ## @@ -119,3 +119,21 @@ select get(array(1, 2, 3), 0); select get(array(1, 2, 3), 3); select get(array(

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1336843298 also cc @cloud-fan

[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336834908 @LuciferYang I added SQL tests; could you please review again?

[GitHub] [spark] sandeep-katta commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039205150 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,33 @@ class CollectionExpressionsSu

[GitHub] [spark] Ngone51 commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
Ngone51 commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039195161 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint( val time = System.cur

[GitHub] [spark] cloud-fan commented on a diff in pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38795: URL: https://github.com/apache/spark/pull/38795#discussion_r1039187199 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala: ## @@ -50,8 +52,21 @@ private[hive] class SparkSQLDriver(val cont

[GitHub] [spark] Yaohua628 commented on pull request #38910: [SPARK-41151][FOLLOW-UP][SQL][3.3] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
Yaohua628 commented on PR #38910: URL: https://github.com/apache/spark/pull/38910#issuecomment-1336816189 @cloud-fan Here's the 3.3 cherry-pick, thanks!

[GitHub] [spark] Yaohua628 opened a new pull request, #38910: [SPARK-41151][FOLLOW-UP][SQL][3.3] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
Yaohua628 opened a new pull request, #38910: URL: https://github.com/apache/spark/pull/38910 ### What changes were proposed in this pull request? Cherry-pick https://github.com/apache/spark/pull/38777. Resolved conflicts in https://github.com/apache/spark/commit/ac2d027a768f50e27

[GitHub] [spark] dongjoon-hyun closed pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun closed pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module URL: https://github.com/apache/spark/pull/38909

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336810296 All tests (except documentation generation) are finished. This PR is unrelated to the doc generation. Merged to master/3.3.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38877: [SPARK-41361] [SQL] Invalid call toAttribute on unresolved object exception caused by WidenSetOperationTypes

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38877: URL: https://github.com/apache/spark/pull/38877#discussion_r1039178253 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ScriptTransformation.scala: ## @@ -32,7 +32,13 @@ case class ScriptTransformation( chi

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039177239 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( } }

[GitHub] [spark] HeartSaVioR commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1336801690 Also cc. @mridulm since he reviewed the design doc in detail.

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336800566 Thank you for review and approval, @Yikun .

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039174225 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -273,7 +275,24 @@ abstract class InMemoryBaseTable( }

[GitHub] [spark] dongjoon-hyun commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1336797980 Thank you for updates.

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336792961 > > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql` > > Thanks @LuciferYang for the review, I wi

[GitHub] [spark] Yikf commented on a diff in pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
Yikf commented on code in PR #38795: URL: https://github.com/apache/spark/pull/38795#discussion_r1039170262 ## sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala: ## @@ -50,8 +52,21 @@ private[hive] class SparkSQLDriver(val context:

[GitHub] [spark] zhengruifeng commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336789658 merged into master, thanks for reviews!

[GitHub] [spark] sandeep-katta commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
sandeep-katta commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336789676 > Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql` Thanks @LuciferYang for the review, I will a

[GitHub] [spark] Yikf commented on pull request #38795: [SPARK-41259][SQL] Spark-sql cli query results should correspond to schema

2022-12-04 Thread GitBox
Yikf commented on PR #38795: URL: https://github.com/apache/spark/pull/38795#issuecomment-1336789289 > Would be easier to follow if you post before/after results in the PR description. Yea, updated the PR description, thanks

[GitHub] [spark] zhengruifeng closed pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng closed pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions URL: https://github.com/apache/spark/pull/38905

[GitHub] [spark] HeartSaVioR commented on pull request #38517: [SPARK-39591][SS] Async Progress Tracking

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38517: URL: https://github.com/apache/spark/pull/38517#issuecomment-1336784998 cc. @zsxwing @viirya @xuanyuanking to seek a chance for getting help on reviewing. I'll look into the PR sooner as well.

[GitHub] [spark] dongjoon-hyun commented on pull request #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38909: URL: https://github.com/apache/spark/pull/38909#issuecomment-1336784356 Could you review this, @Yikun ? I missed this at SPARK-37145.

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039161731 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -30,6 +30,7 @@ import org.apache.spark.sql.connector.expressions.{Expression

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039161555 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -2772,6 +2773,39 @@ class DataSourceV2SQLSuiteV1Filter } } + te

[GitHub] [spark] HeartSaVioR closed pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR closed pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark URL: https://github.com/apache/spark/pull/38906

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336777465 Thanks! Merging to master/3.3.

[GitHub] [spark] amaliujia commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039159115 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols: "

[GitHub] [spark] vinodkc commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-12-04 Thread GitBox
vinodkc commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1336776081 @cloud-fan , Can you please review it?
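PR #38419 proposes a built-in function to truncate numbers. The snippet does not show the function's name, arguments, or edge-case semantics. As an assumption, the Java sketch below shows the usual meaning of truncation: dropping digits past a given scale, rounding toward zero, via `BigDecimal.setScale` with `RoundingMode.DOWN`. The sketch allows a negative scale, which truncates digits to the left of the decimal point. Whether the Spark function accepts a negative scale is not confirmed here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class TruncateNumber {
    // Keeps `scale` digits after the decimal point, rounding toward zero.
    // A negative scale zeroes out digits to the left of the decimal point.
    static BigDecimal truncate(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        System.out.println(truncate(new BigDecimal("1234.5678"), 2).toPlainString());  // 1234.56
        System.out.println(truncate(new BigDecimal("-1234.5678"), 2).toPlainString()); // -1234.56
        System.out.println(truncate(new BigDecimal("1234.5678"), -2).toPlainString()); // 1200
    }
}
```

`RoundingMode.DOWN` (toward zero) is used rather than `FLOOR`, so negative inputs are truncated symmetrically instead of being pushed further from zero.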

[GitHub] [spark] LuciferYang commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1039157936 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends S

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38909: [SPARK-41385][K8S] Replace deprecated `.newInstance()` in K8s module

2022-12-04 Thread GitBox
dongjoon-hyun opened a new pull request, #38909: URL: https://github.com/apache/spark/pull/38909 ### What changes were proposed in this pull request? This PR aims to replace the deprecated `Class.newInstance` with `Class.getConstructor.newInstance`. ### Why are the changes need
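The replacement pattern from PR #38909 can be sketched in plain Java as follows. The `Greeter` class is hypothetical. `Class.newInstance` can propagate checked exceptions thrown by the constructor without declaring them. `getConstructor().newInstance()` instead wraps such exceptions in `InvocationTargetException`, so every reflective failure below surfaces as a `ReflectiveOperationException`. Note that `getConstructor()` only finds public constructors.

```java
public class ReflectiveInstantiation {
    // Hypothetical class to instantiate; it needs a public no-arg constructor
    // because getConstructor() only looks up public constructors.
    public static class Greeter {
        public Greeter() {}
        public String greet() { return "hello"; }
    }

    static <T> T create(Class<T> cls) {
        try {
            // Non-deprecated replacement for cls.newInstance().
            return cls.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers NoSuchMethodException, InstantiationException,
            // IllegalAccessException and InvocationTargetException.
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(Greeter.class).greet()); // hello
    }
}
```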

[GitHub] [spark] amaliujia commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039156011 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols: "

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336772413 > I personally think the output `[[1, null, 3], [null, 2, 3]]` is expected, let me confirm it. If this case is correct, I think the Scala part is OK except for the lack of `invalid inp
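The thread does not show the test input behind the output quoted above. A Java sketch of the reading under discussion may still help: `array_compact` removes null elements at the top level only, so nulls inside nested arrays survive. The input in `main` is hypothetical and was chosen only to reproduce the quoted output. The null-input behavior is also an assumption of this sketch, not something confirmed by the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArrayCompact {
    // Drops top-level null elements only; nested lists are kept untouched,
    // so nulls inside them survive.
    static List<Object> compact(List<Object> array) {
        // Assumption for this sketch: a null input array yields null.
        if (array == null) return null;
        List<Object> out = new ArrayList<>();
        for (Object e : array) {
            if (e != null) out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: two nested arrays with a null between them.
        List<Object> nested = Arrays.asList(
            Arrays.asList(1, null, 3), null, Arrays.asList(null, 2, 3));
        System.out.println(compact(nested)); // [[1, null, 3], [null, 2, 3]]
    }
}
```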

[GitHub] [spark] grundprinzip commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
grundprinzip commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336772089 > shall we consider sharing code between pyspark and spark connect python client? Yes, as part of the packaging we have to merge this code back with the PySpark code. However,

[GitHub] [spark] cloud-fan commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
cloud-fan commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336771046 shall we consider sharing code between pyspark and spark connect python client?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039153768 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols: "

[GitHub] [spark] cloud-fan commented on a diff in pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
cloud-fan commented on code in PR #38908: URL: https://github.com/apache/spark/pull/38908#discussion_r1039153585 ## python/pyspark/sql/connect/dataframe.py: ## @@ -137,7 +137,13 @@ def isEmpty(self) -> bool: return len(self.take(1)) == 0 def select(self, *cols: "

[GitHub] [spark] wangyum commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
wangyum commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336767774 Merged to master

[GitHub] [spark] dongjoon-hyun commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336767713 Thank you so much, @wangyum !

[GitHub] [spark] wangyum closed pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
wangyum closed pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure URL: https://github.com/apache/spark/pull/38907

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1336766817 Please add some invalid input test cases @sandeep-katta and add some sql test to `src/test/resources/sql-tests/inputs/array.sql`

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON]High-order function: array_compact implementation

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1039150323 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,33 @@ class CollectionExpressionsSuit

[GitHub] [spark] mridulm commented on pull request #38882: [SPARK-41365][UI] Stages UI page fails to load for proxy in specific yarn environment

2022-12-04 Thread GitBox
mridulm commented on PR #38882: URL: https://github.com/apache/spark/pull/38882#issuecomment-1336760340 +CC @gengliangwang

[GitHub] [spark] amaliujia commented on pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on PR #38908: URL: https://github.com/apache/spark/pull/38908#issuecomment-1336759697 cc @cloud-fan

[GitHub] [spark] LuciferYang commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1336759147 @infoankitp Would you mind adding some sql related tests to `sql-tests/inputs/array.sql`?
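Judging from its name, the `array_append` function in PR #38865 adds an element to the end of an array. The Java sketch below shows that behavior as a non-mutating append. The null handling is an assumption of the sketch and is not confirmed by the PR snippets: a null array yields null, and a null element is appended like any other value. These are the kinds of edge cases the requested `array.sql` tests would pin down.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayAppend {
    // Returns a new list with `element` added at the end; the input is not mutated.
    // Assumptions: a null array yields null; a null element is appended as-is.
    static <T> List<T> append(List<T> array, T element) {
        if (array == null) return null;
        List<T> out = new ArrayList<>(array);
        out.add(element);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(append(List.of(1, 2, 3), 4)); // [1, 2, 3, 4]
        System.out.println(append(List.of("a"), null));  // [a, null]
    }
}
```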

[GitHub] [spark] amaliujia commented on pull request #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia commented on PR #38908: URL: https://github.com/apache/spark/pull/38908#issuecomment-1336759090 cc @HyukjinKwon @zhengruifeng @grundprinzip @xinrong-meng

[GitHub] [spark] mridulm commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-12-04 Thread GitBox
mridulm commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1039144282 ## core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] amaliujia opened a new pull request, #38908: [SPARK-41384][CONNECT] Should use SQLExpression for str arguments in Projection

2022-12-04 Thread GitBox
amaliujia opened a new pull request, #38908: URL: https://github.com/apache/spark/pull/38908 ### What changes were proposed in this pull request? We can depend on the server-side SQL parser to parse the strings in projection so that the client side does not need to reason about w

[GitHub] [spark] LuciferYang commented on pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on PR #38865: URL: https://github.com/apache/spark/pull/38865#issuecomment-1336752983 @infoankitp ``` 2022-12-03T13:00:36.5812875Z [info] ExpressionsSchemaSuite: 2022-12-03T13:00:37.5439485Z [info] 

[GitHub] [spark] dongjoon-hyun commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336752127 Hi, @cloud-fan . Could you review this PR to recover the master branch?

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039143150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left: Express

[GitHub] [spark] mridulm commented on pull request #38901: [SPARK-41376][CORE] Correct the Netty preferDirectBufs check logic on executor start

2022-12-04 Thread GitBox
mridulm commented on PR #38901: URL: https://github.com/apache/spark/pull/38901#issuecomment-1336751517 +CC @Ngone51 (who last updated this) and @cloud-fan (who merged the commit).

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039142575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,69 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039141790 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -30,6 +30,7 @@ import org.apache.spark.sql.connector.expressions.{Expres

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336749793 I just added one example (which is actually the same as the test case) to show the problem.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38873: [SPARK-41358][SQL] Refactor `ColumnVectorUtils#populate` method to use `PhysicalDataType` instead of `DataType`

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38873: URL: https://github.com/apache/spark/pull/38873#discussion_r1039140895 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java: ## @@ -125,32 +125,45 @@ public static Map toJavaIntMap(ColumnarMap map

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1039140899 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -2772,6 +2773,39 @@ class DataSourceV2SQLSuiteV1Filter } } +

[GitHub] [spark] cloud-fan closed pull request #38873: [SPARK-41358][SQL] Refactor `ColumnVectorUtils#populate` method to use `PhysicalDataType` instead of `DataType`

2022-12-04 Thread GitBox
cloud-fan closed pull request #38873: [SPARK-41358][SQL] Refactor `ColumnVectorUtils#populate` method to use `PhysicalDataType` instead of `DataType` URL: https://github.com/apache/spark/pull/38873

[GitHub] [spark] cloud-fan commented on pull request #38873: [SPARK-41358][SQL] Refactor `ColumnVectorUtils#populate` method to use `PhysicalDataType` instead of `DataType`

2022-12-04 Thread GitBox
cloud-fan commented on PR #38873: URL: https://github.com/apache/spark/pull/38873#issuecomment-1336749026 thanks, merging to master!

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039140220 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5237,6 +5237,59 @@ class DataFrameFunctionsSuite extends QueryTest with Share

[GitHub] [spark] mridulm commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-12-04 Thread GitBox
mridulm commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1039139958 ## core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala: ## @@ -55,7 +55,8 @@ case class SparkListenerTaskGettingResult(taskInfo: TaskInfo) extends Spark

[GitHub] [spark] cloud-fan commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
cloud-fan commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1336748392 oh it conflicts with 3.3, @Yaohua628 can you open a backport PR? thanks!

[GitHub] [spark] cloud-fan closed pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
cloud-fan closed pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent URL: https://github.com/apache/spark/pull/38777

[GitHub] [spark] cloud-fan commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-12-04 Thread GitBox
cloud-fan commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1336747654 thanks, merging to master/3.3!

[GitHub] [spark] grundprinzip commented on a diff in pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
grundprinzip commented on code in PR #38905: URL: https://github.com/apache/spark/pull/38905#discussion_r1039138661 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -356,6 +356,96 @@ def test_math_functions(self): sdf.select(SF.shiftrightunsigned(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039137824 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5237,6 +5237,59 @@ class DataFrameFunctionsSuite extends QueryTest with Share

[GitHub] [spark] cloud-fan commented on pull request #38862: [SPARK-41350][SQL] Allow simple name access of join hidden columns after subquery alias

2022-12-04 Thread GitBox
cloud-fan commented on PR #38862: URL: https://github.com/apache/spark/pull/38862#issuecomment-1336744721 @HyukjinKwon this is a bug fix and needs to go to 3.3 as well, can you help to backport via local git operation?

[GitHub] [spark] HeartSaVioR commented on pull request #38906: [SPARK-41379][SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark

2022-12-04 Thread GitBox
HeartSaVioR commented on PR #38906: URL: https://github.com/apache/spark/pull/38906#issuecomment-1336743969 Yes, as long as they use `DF.sparkSession`. Although this is still not 100% covering the case as there is no way to prevent end users to access sparkSession outside of user function (

[GitHub] [spark] dongjoon-hyun commented on pull request #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #38907: URL: https://github.com/apache/spark/pull/38907#issuecomment-1336742706 Could you review this, @HyukjinKwon ?

[GitHub] [spark] mridulm commented on a diff in pull request #38876: [SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost

2022-12-04 Thread GitBox
mridulm commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r1039131641 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint( val time = System.cur

[GitHub] [spark] zwangsheng commented on a diff in pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
zwangsheng commented on code in PR #38202: URL: https://github.com/apache/spark/pull/38202#discussion_r1039131007 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala: ## @@ -50,6 +50,8 @@ private[spark] class Dr

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039127876 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -2596,4 +2596,82 @@ class CollectionExpressionsSuit

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1336725688 My bad. I created a follow-up to make it sure. - https://github.com/apache/spark/pull/38907

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38907: [SPARK-40379][K8S][FOLLOWUP] Fix scalastyle failure

2022-12-04 Thread GitBox
dongjoon-hyun opened a new pull request, #38907: URL: https://github.com/apache/spark/pull/38907 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1336723918 Oh, the fixed indentation causes scalastyle failure. :(

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng commented on code in PR #38905: URL: https://github.com/apache/spark/pull/38905#discussion_r1039124841 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -356,6 +356,96 @@ def test_math_functions(self): sdf.select(SF.shiftrightunsigned(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-04 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1039114830 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,118 @@ case class ArrayExcept(left: Express

[GitHub] [spark] ulysses-you commented on a diff in pull request #38875: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value

2022-12-04 Thread GitBox
ulysses-you commented on code in PR #38875: URL: https://github.com/apache/spark/pull/38875#discussion_r1039123347 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -2313,6 +2313,33 @@ class InsertSuite extends DataSourceTest with SharedSparkSessi

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38202: [SPARK-40763][K8S] Should expose driver service name to config for user features

2022-12-04 Thread GitBox
dongjoon-hyun commented on code in PR #38202: URL: https://github.com/apache/spark/pull/38202#discussion_r1039122836 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverServiceFeatureStep.scala: ## @@ -50,6 +50,8 @@ private[spark] class

[GitHub] [spark] zhengruifeng commented on pull request #38905: [SPARK-41380][CONNECT][PYTHON] Implement aggregation functions

2022-12-04 Thread GitBox
zhengruifeng commented on PR #38905: URL: https://github.com/apache/spark/pull/38905#issuecomment-1336711429 also cc @cloud-fan @grundprinzip @xinrong-meng @amaliujia

[GitHub] [spark] dongjoon-hyun closed pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun closed pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s URL: https://github.com/apache/spark/pull/37821

[GitHub] [spark] dongjoon-hyun commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-12-04 Thread GitBox
dongjoon-hyun commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1336706941 Let me fix the indentation and merge this, @holdenk .
