[GitHub] [spark] HyukjinKwon closed pull request #35064: [SPARK-37783][SS][SQL][CORE] Enable tail-recursion wherever possible

2021-12-29 Thread GitBox
HyukjinKwon closed pull request #35064: URL: https://github.com/apache/spark/pull/35064 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-un

[GitHub] [spark] HyukjinKwon commented on pull request #35064: [SPARK-37783][SS][SQL][CORE] Enable tail-recursion wherever possible

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35064: URL: https://github.com/apache/spark/pull/35064#issuecomment-1002910891 Merged to master.

[GitHub] [spark] c21 commented on pull request #34999: [SPARK-37726][SQL] Add spill size metrics for sort merge join

2021-12-29 Thread GitBox
c21 commented on pull request #34999: URL: https://github.com/apache/spark/pull/34999#issuecomment-1002899182 Thank you @cloud-fan for review!

[GitHub] [spark] dchvn commented on a change in pull request #35007: [SPARK-37478][SQL][TESTS][FOLLOWUP] Unify v1 and v2 DROP NAMESPACE error

2021-12-29 Thread GitBox
dchvn commented on a change in pull request #35007: URL: https://github.com/apache/spark/pull/35007#discussion_r776591166 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ## @@ -196,7 +196,15 @@ private[spark] class HiveExternalCatalog(

[GitHub] [spark] dcoliversun commented on pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dcoliversun commented on pull request #34983: URL: https://github.com/apache/spark/pull/34983#issuecomment-1002894808 Sorry, CI has some error. I'm fixing it.

[GitHub] [spark] Yaohua628 commented on pull request #35068: [SPARK-37770][SQL][FOLLOWUP] Implement putByteArrays for WritableColumnVector

2021-12-29 Thread GitBox
Yaohua628 commented on pull request #35068: URL: https://github.com/apache/spark/pull/35068#issuecomment-1002893613 Hi @cloud-fan, a small performance improvement PR adding a new method `putByteArrays` in `WritableColumnVector` - avoid copying the same byte array for every row. please take

[GitHub] [spark] Yaohua628 opened a new pull request #35068: [SPARK-37770][SQL][FOLLOWUP] Implement putByteArrays for WritableColumnVector

2021-12-29 Thread GitBox
Yaohua628 opened a new pull request #35068: URL: https://github.com/apache/spark/pull/35068 ### What changes were proposed in this pull request? Implement a new method in `WritableColumnVector` named `putByteArrays`, which avoids copying the same byte array over and over again for all ro
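
The idea behind `putByteArrays` can be illustrated with a toy model. This is a hypothetical, simplified Python sketch, not Spark's `WritableColumnVector`: it assumes a variable-length column stored as one shared byte buffer plus per-row offset and length arrays, and shows why writing the same value for many rows can append the bytes once and point every row at that single copy, instead of copying them once per row. The class name, method names, and the `b"file.parquet"` value are all illustrative.

```python
from typing import List


class ToyBytesColumn:
    """Hypothetical toy model of a variable-length binary column:
    all values live in one shared buffer; each row stores (offset, length)."""

    def __init__(self, capacity: int) -> None:
        self.data = bytearray()                 # shared child byte buffer
        self.offsets: List[int] = [0] * capacity  # per-row start offset into data
        self.lengths: List[int] = [0] * capacity  # per-row value length

    def put_byte_array(self, row_id: int, value: bytes) -> None:
        # Per-row write: appends a fresh copy of the bytes for this row.
        self.offsets[row_id] = len(self.data)
        self.lengths[row_id] = len(value)
        self.data.extend(value)

    def put_byte_arrays(self, row_id: int, count: int, value: bytes) -> None:
        # Bulk write of one repeated value: append the bytes once and point
        # `count` consecutive rows, starting at row_id, at that single copy.
        start = len(self.data)
        self.data.extend(value)
        for r in range(row_id, row_id + count):
            self.offsets[r] = start
            self.lengths[r] = len(value)

    def get(self, row_id: int) -> bytes:
        start = self.offsets[row_id]
        return bytes(self.data[start:start + self.lengths[row_id]])


col = ToyBytesColumn(4)
col.put_byte_arrays(0, 4, b"file.parquet")  # one copy shared by rows 0..3
print(col.get(3), len(col.data))  # b'file.parquet' 12
```

With four separate `put_byte_array` calls the buffer would hold 48 bytes; the bulk call stores the 12-byte value once while every row still reads it back intact.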

[GitHub] [spark] dcoliversun commented on a change in pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dcoliversun commented on a change in pull request #34983: URL: https://github.com/apache/spark/pull/34983#discussion_r776587503 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtilsSuite.scala ## @@ -82,8 +81,9

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #34983: URL: https://github.com/apache/spark/pull/34983#discussion_r776586751 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtilsSuite.scala ## @@ -82,8 +81,

[GitHub] [spark] dcoliversun commented on a change in pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dcoliversun commented on a change in pull request #34983: URL: https://github.com/apache/spark/pull/34983#discussion_r776582962 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala ## @@ -87,11 +87,15 @@

[GitHub] [spark] dcoliversun commented on a change in pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dcoliversun commented on a change in pull request #34983: URL: https://github.com/apache/spark/pull/34983#discussion_r776580634 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtilsSuite.scala ## @@ -76,4 +80,26

[GitHub] [spark] beliefer commented on pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
beliefer commented on pull request #31847: URL: https://github.com/apache/spark/pull/31847#issuecomment-1002886939 @cloud-fan Thanks for the review. I will update these change into https://github.com/apache/spark/pull/35060

[GitHub] [spark] linhongliu-db commented on pull request #35061: [SPARK-37369][SQL][FOLLOWUP] Override supportsRowBased in UnionExec

2021-12-29 Thread GitBox
linhongliu-db commented on pull request #35061: URL: https://github.com/apache/spark/pull/35061#issuecomment-1002885396 cc @cloud-fan @viirya

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #34983: URL: https://github.com/apache/spark/pull/34983#discussion_r776576171 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtilsSuite.scala ## @@ -76,4 +80,

[GitHub] [spark] HyukjinKwon closed pull request #35063: [SPARK-37657][FOLLOWUP][PYTHON] Separate the tests for pandas < 1.1.0

2021-12-29 Thread GitBox
HyukjinKwon closed pull request #35063: URL: https://github.com/apache/spark/pull/35063

[GitHub] [spark] HyukjinKwon commented on pull request #35063: [SPARK-37657][FOLLOWUP][PYTHON] Separate the tests for pandas < 1.1.0

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35063: URL: https://github.com/apache/spark/pull/35063#issuecomment-1002875914 Merged to master.

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776570281 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala ## @@ -36,16 +36,27 @@ object Di

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776570070 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala ## @@ -36,16 +36,27 @@ object Di

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776569698 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala ## @@ -36,16 +36,27 @@ object Di

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r761718218 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala ## @@ -37,15 +37,16 @@ object Di

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776569528 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -1459,15 +1471,10 @@ object Rep

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776569425 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -1395,6 +1395,33 @@ case class

[GitHub] [spark] cloud-fan commented on a change in pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #34785: URL: https://github.com/apache/spark/pull/34785#discussion_r776569326 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ## @@ -1395,6 +1395,33 @@ case class

[GitHub] [spark] cloud-fan commented on a change in pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #35055: URL: https://github.com/apache/spark/pull/35055#discussion_r776569081 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ## @@ -71,8 +73,34 @@ abstract class

[GitHub] [spark] Yikun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
Yikun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776568818 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class Kubernet

[GitHub] [spark] Yikun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
Yikun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776566539 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class Kubernet

[GitHub] [spark] Yaohua628 commented on a change in pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2021-12-29 Thread GitBox
Yaohua628 commented on a change in pull request #35055: URL: https://github.com/apache/spark/pull/35055#discussion_r776566204 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ## @@ -71,8 +73,34 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776565729 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -49,9 +51,10 @@ * Whether the da

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776565052 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776564976 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776564748 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776564599 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] AngersZhuuuu commented on pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support use SQLConf.get to get corresponding SessionState's SQLConf

2021-12-29 Thread GitBox
AngersZhuuuu commented on pull request #35059: URL: https://github.com/apache/spark/pull/35059#issuecomment-1002869370 > LGTM, can you update the PR title and description? Done

[GitHub] [spark] huaxingao commented on pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
huaxingao commented on pull request #35052: URL: https://github.com/apache/spark/pull/35052#issuecomment-1002869211 cc @cloud-fan

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776564338 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776564183 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776563893 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776563713 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] cloud-fan commented on a change in pull request #31847: [SPARK-34755][SQL] Support the utils for transform number format

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #31847: URL: https://github.com/apache/spark/pull/31847#discussion_r776563258 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberUtils.scala ## @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776563101 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776563051 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class

[GitHub] [spark] williamhyun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
williamhyun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776562906 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class Ku

[GitHub] [spark] cloud-fan closed pull request #34999: [SPARK-37726][SQL] Add spill size metrics for sort merge join

2021-12-29 Thread GitBox
cloud-fan closed pull request #34999: URL: https://github.com/apache/spark/pull/34999

[GitHub] [spark] cloud-fan commented on pull request #34999: [SPARK-37726][SQL] Add spill size metrics for sort merge join

2021-12-29 Thread GitBox
cloud-fan commented on pull request #34999: URL: https://github.com/apache/spark/pull/34999#issuecomment-1002864777 thanks, merging to master!

[GitHub] [spark] huaxingao commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
huaxingao commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776561992 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -49,9 +51,10 @@ * Whether the da

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
AngersZhuuuu commented on a change in pull request #35059: URL: https://github.com/apache/spark/pull/35059#discussion_r776560488 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ## @@ -74,15 +75,19 @@ trait QueryExecutionListener {

[GitHub] [spark] dchvn commented on pull request #35067: [SPARK-37423][PYTHON] Inline type hints for fpm.py in python/pyspark/mllib

2021-12-29 Thread GitBox
dchvn commented on pull request #35067: URL: https://github.com/apache/spark/pull/35067#issuecomment-1002861576 CC @zero323 @HyukjinKwon @ueshin, please take a look if you have time. Thanks!

[GitHub] [spark] dchvn opened a new pull request #35067: [SPARK-37423][PYTHON] Inline type hints for fpm.py in python/pyspark/mllib

2021-12-29 Thread GitBox
dchvn opened a new pull request #35067: URL: https://github.com/apache/spark/pull/35067 ### What changes were proposed in this pull request? Inline type hints for fpm.py, test.py in python/pyspark/mllib/ ### Why are the changes needed? We can take advantage of static type ch

[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35058: URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859683 Merged to master, branch-3.2, branch-3.1 and branch-3.0.

[GitHub] [spark] HyukjinKwon closed pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

2021-12-29 Thread GitBox
HyukjinKwon closed pull request #35058: URL: https://github.com/apache/spark/pull/35058

[GitHub] [spark] cloud-fan commented on a change in pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #35055: URL: https://github.com/apache/spark/pull/35055#discussion_r776557948 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ## @@ -71,8 +73,34 @@ abstract class

[GitHub] [spark] HyukjinKwon commented on pull request #35058: [SPARK-37779][SQL] Make ColumnarToRowExec plan canonicalizable after (de)serialization

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35058: URL: https://github.com/apache/spark/pull/35058#issuecomment-1002859018 Thanks! Merged to master.

[GitHub] [spark] cloud-fan commented on a change in pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #35059: URL: https://github.com/apache/spark/pull/35059#discussion_r776557616 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ## @@ -74,15 +75,19 @@ trait QueryExecutionListener { //

[GitHub] [spark] JoshRosen commented on pull request #35066: [SPARK-37784][SQL] Correctly handle UDTs in CodeGenerator.addBufferedState()

2021-12-29 Thread GitBox
JoshRosen commented on pull request #35066: URL: https://github.com/apache/spark/pull/35066#issuecomment-1002856449 Please let me know if you have suggestions for good ways to write a regression test for this bug. So far I've been unable to adapt my existing reproduction into something wh

[GitHub] [spark] cloud-fan commented on pull request #35056: [SPARK-37777][SQL] Update the SQL syntax of SHOW FUNCTIONS

2021-12-29 Thread GitBox
cloud-fan commented on pull request #35056: URL: https://github.com/apache/spark/pull/35056#issuecomment-1002854853 > Do we need to add new syntax to migration guide? IIUC migration guide is for breaking changes, not new features. Since this PR doesn't break anything, we don't need t

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
AngersZhuuuu commented on a change in pull request #35059: URL: https://github.com/apache/spark/pull/35059#discussion_r776554972 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ## @@ -74,15 +75,19 @@ trait QueryExecutionListener {

[GitHub] [spark] JoshRosen opened a new pull request #35066: [SPARK-37784] Correctly handle UDTs in CodeGenerator.addBufferedState()

2021-12-29 Thread GitBox
JoshRosen opened a new pull request #35066: URL: https://github.com/apache/spark/pull/35066 ### What changes were proposed in this pull request? This PR fixes a correctness issue in the CodeGenerator.addBufferedState() helper method (which is used by the SortMergeJoinExec operator).

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35064: [SPARK-37783][SS][SQL][CORE] Enable tail-recursive wherever possible

2021-12-29 Thread GitBox
HyukjinKwon commented on a change in pull request #35064: URL: https://github.com/apache/spark/pull/35064#discussion_r776554749 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ## @@ -76,23 +77,29 @@ case class DecimalType(precision: Int,

[GitHub] [spark] cloud-fan commented on a change in pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
cloud-fan commented on a change in pull request #35059: URL: https://github.com/apache/spark/pull/35059#discussion_r776554837 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ## @@ -74,15 +75,19 @@ trait QueryExecutionListener { //

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35064: [SPARK-37783][SS][SQL][CORE] Enable tail-recursive wherever possible

2021-12-29 Thread GitBox
HyukjinKwon commented on a change in pull request #35064: URL: https://github.com/apache/spark/pull/35064#discussion_r776553602 ## File path: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ## @@ -751,6 +751,14 @@ private[history] class FsHistoryPro

[GitHub] [spark] beliefer commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
beliefer commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776552450 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -49,9 +51,10 @@ * Whether the dat

[GitHub] [spark] HyukjinKwon opened a new pull request #35065: [SPARK-37785][SQL][CORE] Add Utils.isAtExecutorSide

2021-12-29 Thread GitBox
HyukjinKwon opened a new pull request #35065: URL: https://github.com/apache/spark/pull/35065 ### What changes were proposed in this pull request? This PR proposes to add `Utils.isAtExecutorSide` to see if the codes are running on Executor (or Driver). ### Why are the changes

[GitHub] [spark] HyukjinKwon commented on pull request #35064: [SPARK-37783][SQL][CORE] Enable tail-recursive wherever possible

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35064: URL: https://github.com/apache/spark/pull/35064#issuecomment-1002849152 Seems not .. https://contributors.scala-lang.org/t/warning-for-recursive-functions-without-tailrec-annotation/4507/4. There look a couple of external plugins but I wouldn't

[GitHub] [spark] cloud-fan commented on pull request #35064: [SPARK-37783][SQL][CORE] Enable tail-recursive wherever possible

2021-12-29 Thread GitBox
cloud-fan commented on pull request #35064: URL: https://github.com/apache/spark/pull/35064#issuecomment-1002847378 is it possible to let the Scala compiler give warnings if tail-recursive methods are not marked as tail-recursive?

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
AngersZhuuuu commented on a change in pull request #35059: URL: https://github.com/apache/spark/pull/35059#discussion_r776548377 ## File path: sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala ## @@ -74,15 +75,19 @@ trait QueryExecutionListener {

[GitHub] [spark] HyukjinKwon opened a new pull request #35064: [SPARK-37783][SQL][CORE] Enable tail-recursive wherever possible

2021-12-29 Thread GitBox
HyukjinKwon opened a new pull request #35064: URL: https://github.com/apache/spark/pull/35064 ### What changes were proposed in this pull request? This PR adds `scala.annotation.tailrec` inspected by IDE (IntelliJ). - If it has one instance in a file, it uses it without importing
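
As background for this PR: Scala's `scala.annotation.tailrec` does not itself switch the optimization on. scalac already rewrites a self-recursive call in tail position into a loop when the method cannot be overridden (for example, a `private` or `final` method, or one defined in an `object`); the annotation makes compilation fail if a marked method cannot be rewritten that way. Python performs no such rewrite, so the Python sketch below only illustrates the transformation by hand: a tail-recursive function next to the loop it is effectively compiled into. The function names are hypothetical and do not come from Spark.

```python
def sum_to_recursive(n: int, acc: int = 0) -> int:
    # Tail-recursive: the recursive call is the last thing evaluated,
    # so no work remains in this frame after it returns.
    if n == 0:
        return acc
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n: int, acc: int = 0) -> int:
    # The same function after tail-call elimination, done by hand:
    # rebind the parameters and jump back to the top instead of calling.
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


print(sum_to_recursive(100), sum_to_loop(100))  # 5050 5050
print(sum_to_loop(100_000))  # 5000050000
```

In CPython, `sum_to_recursive(100_000)` would raise `RecursionError` under the default recursion limit of 1000, which is the kind of stack growth the Scala rewrite avoids. By contrast, `return n + sum_to(n - 1)` is not a tail call, because the addition happens after the call returns; a method written that way and marked `@tailrec` would be rejected by scalac.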

[GitHub] [spark] AngersZhuuuu commented on pull request #35059: [SPARK-37780][SQL] QueryExecutionListener support SQLConf as constructor

2021-12-29 Thread GitBox
AngersZhuuuu commented on pull request #35059: URL: https://github.com/apache/spark/pull/35059#issuecomment-1002843084 @HyukjinKwon The GA failed seems not related to this pr ``` test_with_different_versions_of_python (pyspark.tests.test_worker.WorkerTests) ... OK (0.216s)

[GitHub] [spark] Yikun commented on pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
Yikun commented on pull request #35015: URL: https://github.com/apache/spark/pull/35015#issuecomment-1002842119 ``` starting black test... black checks failed: would reformat python/pyspark/shuffle.py Oh no! 💥 💔 💥 1 file would be reformatted, 358 files would be left unchanged.

[GitHub] [spark] itholic opened a new pull request #35063: [SPARK-37657][FOLLOWUP][PYTHON] Separate the tests for pandas < 1.1.0

2021-12-29 Thread GitBox
itholic opened a new pull request #35063: URL: https://github.com/apache/spark/pull/35063 ### What changes were proposed in this pull request? This follow-ups for SPARK-37657 to separate the test based on pandas version, as the supported minimum pandas version is 1.0.5, b

[GitHub] [spark] HyukjinKwon commented on pull request #35062: [SPARK-37782][SQL][PYTHON] Make DataFrame.transform take the parameters for the function

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35062: URL: https://github.com/apache/spark/pull/35062#issuecomment-1002837149 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #35062: [SPARK-37782][SQL][PYTHON] Make DataFrame.transform take the parameters for the function

2021-12-29 Thread GitBox
HyukjinKwon closed pull request #35062: URL: https://github.com/apache/spark/pull/35062

[GitHub] [spark] ulysses-you commented on pull request #34908: [SPARK-37652][SQL]Support optimize skewed join through union

2021-12-29 Thread GitBox
ulysses-you commented on pull request #34908: URL: https://github.com/apache/spark/pull/34908#issuecomment-1002835758 Although we have supported it, I think it's still good to add some test. @mcdull-zhang can you rebase this PR only for the test?

[GitHub] [spark] HyukjinKwon commented on pull request #35062: [SPARK-37782][SQL][PYTHON] Make DataFrame.transform take the parameters for the function

2021-12-29 Thread GitBox
HyukjinKwon commented on pull request #35062: URL: https://github.com/apache/spark/pull/35062#issuecomment-1002833392 Actually, I didn't add it initially because Scala side doesn't support that but after rethinking I don't mind adding it because it's consistent with other similar APIs in P

[GitHub] [spark] dcoliversun commented on pull request #34983: [SPARK-37713][K8S] assign namespace to executor configmap

2021-12-29 Thread GitBox
dcoliversun commented on pull request #34983: URL: https://github.com/apache/spark/pull/34983#issuecomment-1002831692 > OK,I understand. I am working on it

[GitHub] [spark] imback82 commented on a change in pull request #35056: [SPARK-37777][SQL] Update the SQL syntax of SHOW FUNCTIONS

2021-12-29 Thread GitBox
imback82 commented on a change in pull request #35056: URL: https://github.com/apache/spark/pull/35056#discussion_r776524589 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala ## @@ -2132,29 +2132,34 @@ class DDLParserSuite exte

[GitHub] [spark] huaxingao commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
huaxingao commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776534716 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -49,9 +51,10 @@ * Whether the da

[GitHub] [spark] AmplabJenkins commented on pull request #35047: [SPARK-37175][SQL] Performance improvement to hash joins with many duplicate keys

2021-12-29 Thread GitBox
AmplabJenkins commented on pull request #35047: URL: https://github.com/apache/spark/pull/35047#issuecomment-1002826626 Can one of the admins verify this patch?

[GitHub] [spark] huaxingao commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
huaxingao commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776533664 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -49,9 +51,10 @@ * Whether the da

[GitHub] [spark] huaxingao commented on a change in pull request #35052: [SPARK-37644][SQL][FOLLOWUP] When partition column is same as group by key, pushing down aggregate completely.

2021-12-29 Thread GitBox
huaxingao commented on a change in pull request #35052: URL: https://github.com/apache/spark/pull/35052#discussion_r776532846 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownAggregates.java ## @@ -20,6 +20,8 @@ import org.apache.sp

[GitHub] [spark] Daniel-Davies edited a comment on pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
Daniel-Davies edited a comment on pull request #35032: URL: https://github.com/apache/spark/pull/35032#issuecomment-1002824416 > > Thank you for the review @zero323 - I've amended as per your comments! > > Thanks. `[:]` shouldn't be necessary. `Row` is just fancy tuple and should wor

[GitHub] [spark] Daniel-Davies commented on pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
Daniel-Davies commented on pull request #35032: URL: https://github.com/apache/spark/pull/35032#issuecomment-1002824416 > > Thank you for the review @zero323 - I've amended as per your comments! > > Thanks. `[:]` shouldn't be necessary. `Row` is just fancy tuple and should work just

[GitHub] [spark] dongjoon-hyun commented on pull request #35038: [SPARK-37728][SQL][3.2] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

2021-12-29 Thread GitBox
dongjoon-hyun commented on pull request #35038: URL: https://github.com/apache/spark/pull/35038#issuecomment-1002823712 cc @williamhyun , too

[GitHub] [spark] dongjoon-hyun commented on pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
dongjoon-hyun commented on pull request #35015: URL: https://github.com/apache/spark/pull/35015#issuecomment-1002823537 cc @williamhyun , too

[GitHub] [spark] itholic commented on a change in pull request #34863: [SPARK-37601][PYTHON] sql.DataFrame.transform accept function parameters

2021-12-29 Thread GitBox
itholic commented on a change in pull request #34863: URL: https://github.com/apache/spark/pull/34863#discussion_r776531548 ## File path: python/pyspark/sql/tests/test_dataframe.py ## @@ -1138,6 +1138,39 @@ def test_create_nan_decimal_dataframe(self): [Row(value=No

[GitHub] [spark] Yaohua628 commented on a change in pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2021-12-29 Thread GitBox
Yaohua628 commented on a change in pull request #35055: URL: https://github.com/apache/spark/pull/35055#discussion_r776531617 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala ## @@ -288,6 +288,20 @@ class FileMetada

[GitHub] [spark] itholic commented on a change in pull request #34863: [SPARK-37601][PYTHON] sql.DataFrame.transform accept function parameters

2021-12-29 Thread GitBox
itholic commented on a change in pull request #34863: URL: https://github.com/apache/spark/pull/34863#discussion_r776530895 ## File path: python/pyspark/sql/dataframe.py ## @@ -3067,6 +3067,10 @@ def transform(self, func: Callable[["DataFrame"], "DataFrame"]) -> "DataFrame":

[GitHub] [spark] Yikun commented on a change in pull request #35015: [SPARK-37735][K8S] Add appId interface to KubernetesConf

2021-12-29 Thread GitBox
Yikun commented on a change in pull request #35015: URL: https://github.com/apache/spark/pull/35015#discussion_r776530675 ## File path: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala ## @@ -218,6 +218,14 @@ class Kubernet

[GitHub] [spark] zero323 commented on pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
zero323 commented on pull request #35032: URL: https://github.com/apache/spark/pull/35032#issuecomment-1002812986 > Thank you for the review @zero323 - I've amended as per your comments! Thanks. `[:]` shouldn't be necessary. `Row` is just fancy tuple and should work just fine in `all

[GitHub] [spark] Daniel-Davies commented on a change in pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
Daniel-Davies commented on a change in pull request #35032: URL: https://github.com/apache/spark/pull/35032#discussion_r776524083 ## File path: python/pyspark/sql/tests/test_functions.py ## @@ -286,6 +289,42 @@ def test_dayofweek(self): row = df.select(dayofweek(df.dat

[GitHub] [spark] Daniel-Davies commented on a change in pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
Daniel-Davies commented on a change in pull request #35032: URL: https://github.com/apache/spark/pull/35032#discussion_r776524032 ## File path: python/pyspark/sql/tests/test_functions.py ## @@ -286,6 +289,42 @@ def test_dayofweek(self): row = df.select(dayofweek(df.dat

[GitHub] [spark] Daniel-Davies commented on pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
Daniel-Davies commented on pull request #35032: URL: https://github.com/apache/spark/pull/35032#issuecomment-1002812365 @HyukjinKwon - All makes sense to me; I'd be happy to write this in a ticket and make a PR to correct some of the input parameters over the next few days. Thank you

[GitHub] [spark] AmplabJenkins commented on pull request #35055: [SPARK-37769][SQL][FOLLOWUP] Filtering files if metadata columns are present in the data filter

2021-12-29 Thread GitBox
AmplabJenkins commented on pull request #35055: URL: https://github.com/apache/spark/pull/35055#issuecomment-1002806733 Can one of the admins verify this patch?

[GitHub] [spark] ueshin opened a new pull request #35062: [SPARK-37782][SQL][PYTHON] Make DataFrame.transform take the parameters for the function

2021-12-29 Thread GitBox
ueshin opened a new pull request #35062: URL: https://github.com/apache/spark/pull/35062 ### What changes were proposed in this pull request? Makes `DataFrame.transform` take the parameters for the function. ### Why are the changes needed? Currently when a function which
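The change described above, letting `transform` accept parameters for the function it applies, can be illustrated with a minimal pure-Python sketch. It assumes the extra parameters are forwarded as `*args`/`**kwargs` to the user function. `ToyFrame`, `add_n` and `scale` are hypothetical stand-ins rather than PySpark code, so the example runs without a Spark session.

```python
from typing import Any, Callable, List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a list of integer rows."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = list(rows)

    def transform(self, func: Callable[..., "ToyFrame"], *args: Any, **kwargs: Any) -> "ToyFrame":
        # Assumed design: forward extra arguments to func so callers need no lambda
        result = func(self, *args, **kwargs)
        # Guard that the user function returned the same frame type
        if not isinstance(result, ToyFrame):
            raise TypeError("func should return a ToyFrame, got %s" % type(result).__name__)
        return result


def add_n(frame: ToyFrame, n: int) -> ToyFrame:
    # Hypothetical helper taking a positional parameter
    return ToyFrame([r + n for r in frame.rows])


def scale(frame: ToyFrame, factor: int = 1) -> ToyFrame:
    # Hypothetical helper taking a keyword parameter
    return ToyFrame([r * factor for r in frame.rows])


df = ToyFrame([1, 2, 3])
# Without forwarding, this would need: df.transform(lambda f: add_n(f, 1))
out = df.transform(add_n, 1).transform(scale, factor=10)
print(out.rows)  # [20, 30, 40]
```

Forwarding the arguments keeps chained calls readable, and the type check mirrors the idea that `transform` should return a frame of the same kind.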

[GitHub] [spark] zero323 commented on pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
zero323 commented on pull request #35032: URL: https://github.com/apache/spark/pull/35032#issuecomment-1002795240 Other than these, LGTM.

[GitHub] [spark] zero323 commented on a change in pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
zero323 commented on a change in pull request #35032: URL: https://github.com/apache/spark/pull/35032#discussion_r776509814 ## File path: python/pyspark/sql/tests/test_functions.py ## @@ -286,6 +289,42 @@ def test_dayofweek(self): row = df.select(dayofweek(df.date)).fi

[GitHub] [spark] zero323 commented on a change in pull request #35032: [SPARK-37738][PYTHON] Fix API skew in PySpark date functions

2021-12-29 Thread GitBox
zero323 commented on a change in pull request #35032: URL: https://github.com/apache/spark/pull/35032#discussion_r776509657 ## File path: python/pyspark/sql/tests/test_functions.py ## @@ -286,6 +289,42 @@ def test_dayofweek(self): row = df.select(dayofweek(df.date)).fi
