[GitHub] [spark] beliefer commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-05-19 Thread GitBox
beliefer commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r877803218 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCCatalog.scala: ## @@ -32,11 +35,14 @@ import org.apache.spark.sql.jdbc.{JdbcDialect, J

[GitHub] [spark] wangyum commented on pull request #36588: [SPARK-39217][SQL] Makes DPP support the pruning side has Union

2022-05-19 Thread GitBox
wangyum commented on PR #36588: URL: https://github.com/apache/spark/pull/36588#issuecomment-1132514681 A case from production: ![image](https://user-images.githubusercontent.com/5399861/169463931-65bfd0c0-1759-4f9d-8a0a-66b32463b76a.png) -- This is an automated message from the Ap

[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36608: URL: https://github.com/apache/spark/pull/36608#discussion_r877763059 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala: ## @@ -34,7 +34,7 @@ abstract class Covariance(val left: Expressio
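
For readers unfamiliar with the function this PR adds: under the ANSI definition, `regr_slope(y, x)` is the slope of the least-squares line, `covar_pop(x, y) / var_pop(x)`, computed over the pairs where both values are non-null. The following is a minimal Python sketch of that formula only. The function name and the null handling shown here are illustrative assumptions, not the Spark implementation discussed in the review.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[Optional[float], Optional[float]]  # (y, x), either may be None


def regr_slope(pairs: Sequence[Pair]) -> Optional[float]:
    """Illustrative regr_slope(y, x) = covar_pop(x, y) / var_pop(x)."""
    # Keep only pairs where both sides are present (assumed null handling).
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    if n == 0:
        return None
    mean_y = sum(y for y, _ in clean) / n
    mean_x = sum(x for _, x in clean) / n
    covar_pop = sum((x - mean_x) * (y - mean_y) for y, x in clean) / n
    var_pop = sum((x - mean_x) ** 2 for _, x in clean) / n
    if var_pop == 0:
        # All x values are equal, so the slope is undefined.
        return None
    return covar_pop / var_pop


# Points on the line y = 2x - 1, plus one pair that is skipped.
slope = regr_slope([(1.0, 1.0), (3.0, 2.0), (5.0, 3.0), (None, 4.0)])
```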

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877754283 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] LuciferYang commented on pull request #36616: [WIP][SPARK-39231][SQL] Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-05-19 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1132495258 This PR mainly focuses on `Parquet`. If this is acceptable, I will change ORC in another PR.

[GitHub] [spark] cloud-fan commented on a diff in pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36608: URL: https://github.com/apache/spark/pull/36608#discussion_r877745569 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala: ## @@ -69,7 +69,7 @@ abstract class Covariance(val left: Expressio

[GitHub] [spark] cloud-fan commented on a diff in pull request #36614: [SPARK-39237][DOCS] Update the ANSI SQL mode documentation

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36614: URL: https://github.com/apache/spark/pull/36614#discussion_r877742782 ## docs/sql-ref-ansi-compliance.md: ## @@ -28,10 +28,10 @@ The casting behaviours are defined as store assignment rules in the standard. When `spark.sql.storeAssig

[GitHub] [spark] cloud-fan commented on a diff in pull request #36614: [SPARK-39237][DOCS] Update the ANSI SQL mode documentation

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36614: URL: https://github.com/apache/spark/pull/36614#discussion_r877742454 ## docs/sql-ref-ansi-compliance.md: ## @@ -28,10 +28,10 @@ The casting behaviours are defined as store assignment rules in the standard. When `spark.sql.storeAssig

[GitHub] [spark] LuciferYang commented on pull request #36616: [WIP][SPARK-39231][SQL] Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-05-19 Thread GitBox
LuciferYang commented on PR #36616: URL: https://github.com/apache/spark/pull/36616#issuecomment-1132482921 Will update the PR description later.

[GitHub] [spark] cloud-fan commented on pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

2022-05-19 Thread GitBox
cloud-fan commented on PR #36615: URL: https://github.com/apache/spark/pull/36615#issuecomment-1132481952 Good catch! This is a long-standing issue. The type coercion for decimal types is really messy as it's not bound to `Expression.resolved`. Changing the rule order does fix this s

[GitHub] [spark] LuciferYang opened a new pull request, #36616: [SPARK-39231][SQL] Change to use `ConstantColumnVector` to store partition columns in `VectorizedParquetRecordReader`

2022-05-19 Thread GitBox
LuciferYang opened a new pull request, #36616: URL: https://github.com/apache/spark/pull/36616 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was thi

[GitHub] [spark] HyukjinKwon commented on pull request #36486: [SPARK-39129][PS] Implement GroupBy.ewm

2022-05-19 Thread GitBox
HyukjinKwon commented on PR #36486: URL: https://github.com/apache/spark/pull/36486#issuecomment-1132476982 I haven't taken a close look, but it seems fine from a cursory look. Should be good to go.

[GitHub] [spark] zhengruifeng commented on pull request #36486: [SPARK-39129][PS] Implement GroupBy.ewm

2022-05-19 Thread GitBox
zhengruifeng commented on PR #36486: URL: https://github.com/apache/spark/pull/36486#issuecomment-1132454863 cc @HyukjinKwon @xinrong-databricks @itholic Would you mind taking a look when you have some time? Thanks.

[GitHub] [spark] manuzhang commented on pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

2022-05-19 Thread GitBox
manuzhang commented on PR #36615: URL: https://github.com/apache/spark/pull/36615#issuecomment-1132451324 cc @gengliangwang @cloud-fan @turboFei

[GitHub] [spark] manuzhang opened a new pull request, #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

2022-05-19 Thread GitBox
manuzhang opened a new pull request, #36615: URL: https://github.com/apache/spark/pull/36615 ### What changes were proposed in this pull request? When analyzing, apply WidenSetOperationTypes after other rules. ### Why are the changes needed? The following SQL returns 1.00 whi
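
As background for the "decimal precision loss" in the title: when a set operation such as UNION combines two decimal columns, a common decimal type must be chosen that keeps the larger integer-digit range and the larger scale of the two inputs, capped at a maximum precision of 38. The sketch below illustrates that widening arithmetic in plain Python. The function name is hypothetical, and the example does not reproduce the specific query or the rule-ordering bug this PR addresses.

```python
from typing import Tuple

MAX_PRECISION = 38  # maximum decimal precision assumed in this sketch


def wider_decimal(p1: int, s1: int, p2: int, s2: int) -> Tuple[int, int]:
    """Return (precision, scale) wide enough for both decimal(p1, s1) and decimal(p2, s2)."""
    scale = max(s1, s2)                 # keep the most fractional digits
    int_digits = max(p1 - s1, p2 - s2)  # keep the most integer digits
    # Cap at the maximum precision. When the cap applies, the result can no
    # longer hold every value of both inputs, which is one way digits are lost.
    return min(int_digits + scale, MAX_PRECISION), min(scale, MAX_PRECISION)


# decimal(4, 0) with decimal(4, 2): 4 integer digits + 2 fractional digits.
small = wider_decimal(4, 0, 4, 2)

# decimal(38, 0) with decimal(38, 18): 56 digits would be needed, capped at 38.
capped = wider_decimal(38, 0, 38, 18)
```

In the second call the capped type keeps scale 18, leaving only 20 integer digits for values that originally had up to 38.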

[GitHub] [spark] HyukjinKwon commented on pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
HyukjinKwon commented on PR #36589: URL: https://github.com/apache/spark/pull/36589#issuecomment-1132433602 Merged to master and branch-3.3.

[GitHub] [spark] HyukjinKwon closed pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
HyukjinKwon closed pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully URL: https://github.com/apache/spark/pull/36589

[GitHub] [spark] gengliangwang commented on pull request #36614: [SPARK-39237][DOCS] Update the ANSI SQL mode documentation

2022-05-19 Thread GitBox
gengliangwang commented on PR #36614: URL: https://github.com/apache/spark/pull/36614#issuecomment-1132430487 cc @tanvn as well. Thanks for pointing it out!

[GitHub] [spark] gengliangwang opened a new pull request, #36614: [SPARK-39237][DOCS] Update the ANSI SQL mode documentation

2022-05-19 Thread GitBox
gengliangwang opened a new pull request, #36614: URL: https://github.com/apache/spark/pull/36614 ### What changes were proposed in this pull request? 1. Remove the Experimental notation in ANSI SQL compliance doc 2. Update the description of `spark.sql.ansi.enabled`, since t

[GitHub] [spark] dongjoon-hyun commented on pull request #36358: [SPARK-39023] [K8s] Add Executor Pod inter-pod anti-affinity

2022-05-19 Thread GitBox
dongjoon-hyun commented on PR #36358: URL: https://github.com/apache/spark/pull/36358#issuecomment-1132427215 Thank you so much, @zwangsheng .

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877706031 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -367,24 +377,40 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877705884 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] zwangsheng closed pull request #36358: [SPARK-39023] [K8s] Add Executor Pod inter-pod anti-affinity

2022-05-19 Thread GitBox
zwangsheng closed pull request #36358: [SPARK-39023] [K8s] Add Executor Pod inter-pod anti-affinity URL: https://github.com/apache/spark/pull/36358

[GitHub] [spark] zwangsheng commented on pull request #36358: [SPARK-39023] [K8s] Add Executor Pod inter-pod anti-affinity

2022-05-19 Thread GitBox
zwangsheng commented on PR #36358: URL: https://github.com/apache/spark/pull/36358#issuecomment-1132423502 > Hi, @zwangsheng . Thank you for making a PR. However, Apache Spark community wants to avoid feature duplications like this. The proposed feature is already delivered to many pro

[GitHub] [spark] Ngone51 commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-05-19 Thread GitBox
Ngone51 commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r877700761 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -769,6 +785,25 @@ private[spark] class TaskSetManager( } } + def setTaskRecordsA

[GitHub] [spark] beliefer commented on pull request #36608: [SPARK-39230][SQL] Support ANSI Aggregate Function: regr_slope

2022-05-19 Thread GitBox
beliefer commented on PR #36608: URL: https://github.com/apache/spark/pull/36608#issuecomment-1132419433 ping @cloud-fan

[GitHub] [spark] gengliangwang commented on pull request #27590: [SPARK-30703][SQL][DOCS][FollowUp] Declare the ANSI SQL compliance options as experimental

2022-05-19 Thread GitBox
gengliangwang commented on PR #27590: URL: https://github.com/apache/spark/pull/27590#issuecomment-1132410903 @tanvn nice catch! @cloud-fan Yes I will update the docs on 3.2 and above

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36599: [SPARK-39228][PYTHON][PS] Implement `skipna` of `Series.argmax`

2022-05-19 Thread GitBox
HyukjinKwon commented on code in PR #36599: URL: https://github.com/apache/spark/pull/36599#discussion_r877689895 ## python/pyspark/pandas/series.py: ## @@ -6239,13 +6239,19 @@ def argsort(self) -> "Series": ps.concat([psser, self.loc[self.isnull()].spark.transform(

[GitHub] [spark] cloud-fan commented on pull request #27590: [SPARK-30703][SQL][DOCS][FollowUp] Declare the ANSI SQL compliance options as experimental

2022-05-19 Thread GitBox
cloud-fan commented on PR #27590: URL: https://github.com/apache/spark/pull/27590#issuecomment-1132399814 I think we can remove the experimental mark now. What do you think? @gengliangwang

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877685094 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -367,24 +377,40 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877684804 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -367,24 +377,40 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877684610 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -367,24 +377,40 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] ulysses-you commented on pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
ulysses-you commented on PR #34785: URL: https://github.com/apache/spark/pull/34785#issuecomment-1132397474 Looks correct to me. BTW, after Spark 3.3, RebalancePartitions supports specifying the initialNumPartition, so the demo code can be: ```scala val optNumPartitions = if (numPartiti

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877684350 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -367,24 +377,40 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877684235 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r877682958 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,18 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] Yikun commented on a diff in pull request #36599: [SPARK-39228][PYTHON][PS] Implement `skipna` of `Series.argmax`

2022-05-19 Thread GitBox
Yikun commented on code in PR #36599: URL: https://github.com/apache/spark/pull/36599#discussion_r877673383 ## python/pyspark/pandas/series.py: ## @@ -6239,13 +6239,19 @@ def argsort(self) -> "Series": ps.concat([psser, self.loc[self.isnull()].spark.transform(lambda

[GitHub] [spark] Yikun commented on a diff in pull request #36599: [SPARK-39228][PYTHON][PS] Implement `skipna` of `Series.argmax`

2022-05-19 Thread GitBox
Yikun commented on code in PR #36599: URL: https://github.com/apache/spark/pull/36599#discussion_r877669800 ## python/pyspark/pandas/series.py: ## @@ -6239,13 +6239,19 @@ def argsort(self) -> "Series": ps.concat([psser, self.loc[self.isnull()].spark.transform(lambda

[GitHub] [spark] LuciferYang commented on a diff in pull request #36611: [SPARK-39204][CORE] Change `Utils.createTempDir` and `Utils.createDirectory` call the same logic method in `JavaUtils`

2022-05-19 Thread GitBox
LuciferYang commented on code in PR #36611: URL: https://github.com/apache/spark/pull/36611#discussion_r877674754 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -308,28 +308,7 @@ private[spark] object Utils extends Logging { * newly created, and is not marke

[GitHub] [spark] LuciferYang commented on a diff in pull request #36611: [SPARK-39204][CORE] Change `Utils.createTempDir` and `Utils.createDirectory` call the same logic method in `JavaUtils`

2022-05-19 Thread GitBox
LuciferYang commented on code in PR #36611: URL: https://github.com/apache/spark/pull/36611#discussion_r877674586 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -339,9 +318,7 @@ private[spark] object Utils extends Logging { def createTempDir( root: Str

[GitHub] [spark] yaooqinn commented on pull request #36592: [SPARK-39221][SQL] Make sensitive information be redacted correctly for thrift server job/stage tab

2022-05-19 Thread GitBox
yaooqinn commented on PR #36592: URL: https://github.com/apache/spark/pull/36592#issuecomment-1132373620 thanks, merged to master

[GitHub] [spark] yaooqinn closed pull request #36592: [SPARK-39221][SQL] Make sensitive information be redacted correctly for thrift server job/stage tab

2022-05-19 Thread GitBox
yaooqinn closed pull request #36592: [SPARK-39221][SQL] Make sensitive information be redacted correctly for thrift server job/stage tab URL: https://github.com/apache/spark/pull/36592

[GitHub] [spark] LuciferYang commented on pull request #36611: [SPARK-39204][CORE] Change `Utils.createTempDir` and `Utils.createDirectory` call the same logic method in `JavaUtils`

2022-05-19 Thread GitBox
LuciferYang commented on PR #36611: URL: https://github.com/apache/spark/pull/36611#issuecomment-1132373369 > Yeah, I think we should better fix `Utils.createTempDir`. Yeah, this PR now changes only one file and achieves the goal.

[GitHub] [spark] huaxingao commented on pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
huaxingao commented on PR #34785: URL: https://github.com/apache/spark/pull/34785#issuecomment-1132364552 cc @cloud-fan

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36599: [SPARK-39228][PYTHON][PS] Implement `skipna` of `Series.argmax`

2022-05-19 Thread GitBox
HyukjinKwon commented on code in PR #36599: URL: https://github.com/apache/spark/pull/36599#discussion_r877664483 ## python/pyspark/pandas/series.py: ## @@ -6239,13 +6239,19 @@ def argsort(self) -> "Series": ps.concat([psser, self.loc[self.isnull()].spark.transform(

[GitHub] [spark] huaxingao commented on pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
huaxingao commented on PR #34785: URL: https://github.com/apache/spark/pull/34785#issuecomment-1132363307 Thanks @aokolnychyi for the proposal. I agree that we should support both strictly required distribution and best effort distribution. For best effort distribution, if user doesn't requ

[GitHub] [spark] beliefer commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-19 Thread GitBox
beliefer commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r877653817 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -228,4 +244,18 @@ protected String visitSQLFunction(String funcName

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
HyukjinKwon commented on code in PR #36589: URL: https://github.com/apache/spark/pull/36589#discussion_r877653141 ## python/pyspark/sql/tests/test_streaming.py: ## @@ -592,6 +592,18 @@ def collectBatch(df, id): if q: q.stop() +def test_streami

[GitHub] [spark] zsxwing commented on a diff in pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
zsxwing commented on code in PR #36589: URL: https://github.com/apache/spark/pull/36589#discussion_r877648746 ## python/pyspark/sql/tests/test_streaming.py: ## @@ -592,6 +592,18 @@ def collectBatch(df, id): if q: q.stop() +def test_streaming_f

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
HyukjinKwon commented on code in PR #36589: URL: https://github.com/apache/spark/pull/36589#discussion_r877647489 ## python/pyspark/sql/tests/test_streaming.py: ## @@ -592,6 +592,18 @@ def collectBatch(df, id): if q: q.stop() +def test_streami

[GitHub] [spark] zsxwing commented on a diff in pull request #36589: [SPARK-39218][SS][PYTHON] Make foreachBatch streaming query stop gracefully

2022-05-19 Thread GitBox
zsxwing commented on code in PR #36589: URL: https://github.com/apache/spark/pull/36589#discussion_r877642301 ## python/pyspark/sql/tests/test_streaming.py: ## @@ -592,6 +592,18 @@ def collectBatch(df, id): if q: q.stop() +def test_streaming_f

[GitHub] [spark] HyukjinKwon commented on pull request #36611: [SPARK-39204][BUILD][CORE][SQL][DSTREAM][GRAPHX][K8S][ML][MLLIB][SS][YARN][EXAMPLES][SHELL] Replace `Utils.createTempDir` with `JavaUtils

2022-05-19 Thread GitBox
HyukjinKwon commented on PR #36611: URL: https://github.com/apache/spark/pull/36611#issuecomment-1132340311 Yeah, I think we should better fix `Utils.createTempDir`.

[GitHub] [spark] github-actions[bot] closed pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
github-actions[bot] closed pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified URL: https://github.com/apache/spark/pull/34785

[GitHub] [spark] github-actions[bot] closed pull request #35049: [SPARK-37757][BUILD] Enable Spark test scheduled job on ARM runner

2022-05-19 Thread GitBox
github-actions[bot] closed pull request #35049: [SPARK-37757][BUILD] Enable Spark test scheduled job on ARM runner URL: https://github.com/apache/spark/pull/35049

[GitHub] [spark] github-actions[bot] closed pull request #35402: [SPARK-37536][SQL] Allow for API user to disable Shuffle on Local Mode

2022-05-19 Thread GitBox
github-actions[bot] closed pull request #35402: [SPARK-37536][SQL] Allow for API user to disable Shuffle on Local Mode URL: https://github.com/apache/spark/pull/35402

[GitHub] [spark] github-actions[bot] commented on pull request #35424: [WIP][SPARK-38116] Add auto commit option to JDBC PostgreSQL driver and set the option false default

2022-05-19 Thread GitBox
github-actions[bot] commented on PR #35424: URL: https://github.com/apache/spark/pull/35424#issuecomment-1132318749 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] dongjoon-hyun commented on pull request #36004: [SPARK-38681][SQL] Support nested generic case classes

2022-05-19 Thread GitBox
dongjoon-hyun commented on PR #36004: URL: https://github.com/apache/spark/pull/36004#issuecomment-1132318738 Thank you, @eejbyfeldt , @cloud-fan , @srowen ! cc @MaxGekk

[GitHub] [spark] srowen commented on pull request #36004: [SPARK-38681][SQL] Support nested generic case classes

2022-05-19 Thread GitBox
srowen commented on PR #36004: URL: https://github.com/apache/spark/pull/36004#issuecomment-1132316150 Merged to master/3.3

[GitHub] [spark] srowen closed pull request #36004: [SPARK-38681][SQL] Support nested generic case classes

2022-05-19 Thread GitBox
srowen closed pull request #36004: [SPARK-38681][SQL] Support nested generic case classes URL: https://github.com/apache/spark/pull/36004

[GitHub] [spark] dongjoon-hyun commented on pull request #36004: [SPARK-38681][SQL] Support nested generic case classes

2022-05-19 Thread GitBox
dongjoon-hyun commented on PR #36004: URL: https://github.com/apache/spark/pull/36004#issuecomment-1132295581 Thank you, @eejbyfeldt . cc @srowen

[GitHub] [spark] hai-tao-1 commented on pull request #36606: [SPARK-39232][CORE] History Server Main Page App List Filtering

2022-05-19 Thread GitBox
hai-tao-1 commented on PR #36606: URL: https://github.com/apache/spark/pull/36606#issuecomment-1132271280 The PR test fails with ```[error] spark-core: Failed binary compatibility check against org.apache.spark:spark-core_2.12:3.2.0! Found 9 potential problems (filtered 924)```. Anyone coul

[GitHub] [spark] hai-tao-1 commented on pull request #36606: [SPARK-39232][CORE] History Server Main Page App List Filtering

2022-05-19 Thread GitBox
hai-tao-1 commented on PR #36606: URL: https://github.com/apache/spark/pull/36606#issuecomment-1132271279 The PR test fails with ```[error] spark-core: Failed binary compatibility check against org.apache.spark:spark-core_2.12:3.2.0! Found 9 potential problems (filtered 924)```. Anyone coul

[GitHub] [spark] dongjoon-hyun commented on pull request #36597: [SPARK-39225][CORE] Support `spark.history.fs.update.batchSize`

2022-05-19 Thread GitBox
dongjoon-hyun commented on PR #36597: URL: https://github.com/apache/spark/pull/36597#issuecomment-1132159846 Merged to master. I added you to the Apache Spark contributor group and assigned SPARK-39225 to you, @hai-tao-1 . Welcome to the Apache Spark community.

[GitHub] [spark] dongjoon-hyun closed pull request #36597: [SPARK-39225][CORE] Support `spark.history.fs.update.batchSize`

2022-05-19 Thread GitBox
dongjoon-hyun closed pull request #36597: [SPARK-39225][CORE] Support `spark.history.fs.update.batchSize` URL: https://github.com/apache/spark/pull/36597

[GitHub] [spark] amaliujia commented on pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-19 Thread GitBox
amaliujia commented on PR #36586: URL: https://github.com/apache/spark/pull/36586#issuecomment-1132121981 R: @cloud-fan this PR is ready to review.

[GitHub] [spark] huaxingao opened a new pull request, #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
huaxingao opened a new pull request, #34785: URL: https://github.com/apache/spark/pull/34785 ### What changes were proposed in this pull request? Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified ### Why are the changes needed?

[GitHub] [spark] otterc commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-05-19 Thread GitBox
otterc commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877366840 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] aokolnychyi commented on pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-19 Thread GitBox
aokolnychyi commented on PR #34785: URL: https://github.com/apache/spark/pull/34785#issuecomment-1132014116 Thanks for the PR, @huaxingao. I think it is a great feature and it would be awesome to get it done. I spent some time thinking about this and have a few questions/proposals.

[GitHub] [spark] mridulm commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set t

2022-05-19 Thread GitBox
mridulm commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877361726 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] MaxGekk commented on pull request #36603: [SPARK-39163][SQL] Throw an exception w/ error class for an invalid bucket file

2022-05-19 Thread GitBox
MaxGekk commented on PR #36603: URL: https://github.com/apache/spark/pull/36603#issuecomment-1132003245 @panbingkun Could you backport this to branch-3.3, please?

[GitHub] [spark] MaxGekk closed pull request #36603: [SPARK-39163][SQL] Throw an exception w/ error class for an invalid bucket file

2022-05-19 Thread GitBox
MaxGekk closed pull request #36603: [SPARK-39163][SQL] Throw an exception w/ error class for an invalid bucket file URL: https://github.com/apache/spark/pull/36603

[GitHub] [spark] otterc commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-05-19 Thread GitBox
otterc commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877340985 ## core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala: ## @@ -1786,4 +1786,32 @@ class ShuffleBlockFetcherIteratorSuite extends SparkFun

[GitHub] [spark] hai-tao-1 commented on pull request #36597: [SPARK-39225][CORE] Support `spark.history.fs.update.batchSize`

2022-05-19 Thread GitBox
hai-tao-1 commented on PR #36597: URL: https://github.com/apache/spark/pull/36597#issuecomment-1131988355 > Thank you for updates, @hai-tao-1 . Yes, the only remaining comment is the test case. > > > We need a test case for the configuration. Please check the corner cases especially.

[GitHub] [spark] otterc commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-05-19 Thread GitBox
otterc commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877329983 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] LuciferYang commented on pull request #36611: [SPARK-39204][BUILD][CORE][SQL][DSTREAM][GRAPHX][K8S][ML][MLLIB][SS][YARN][EXAMPLES][SHELL] Replace `Utils.createTempDir` with `JavaUtils

2022-05-19 Thread GitBox
LuciferYang commented on PR #36611: URL: https://github.com/apache/spark/pull/36611#issuecomment-1131979146 It seems that this change is big. Another way to keep a single `createTempDir` is to let `Utils.createTempDir` call `JavaUtils.createTempDir`. Is this acceptable?

[GitHub] [spark] akpatnam25 commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is se

2022-05-19 Thread GitBox
akpatnam25 commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877316473 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] akpatnam25 commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is se

2022-05-19 Thread GitBox
akpatnam25 commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877316296 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] nkronenfeld opened a new pull request, #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

2022-05-19 Thread GitBox
nkronenfeld opened a new pull request, #36613: URL: https://github.com/apache/spark/pull/36613 ### What changes were proposed in this pull request? This PR simply adds typed select methods to Dataset up to the max Tuple size of 22. This has been bugging me for years, so I final

[GitHub] [spark] vli-databricks commented on pull request #36584: [SPARK-39213][SQL] Create ANY_VALUE aggregate function

2022-05-19 Thread GitBox
vli-databricks commented on PR #36584: URL: https://github.com/apache/spark/pull/36584#issuecomment-1131948634 Yes, the purpose is ease of migration. I removed the change to `functions.scala` to limit the scope to Spark SQL only.

[GitHub] [spark] MaxGekk commented on pull request #36584: [SPARK-39213][SQL] Create ANY_VALUE aggregate function

2022-05-19 Thread GitBox
MaxGekk commented on PR #36584: URL: https://github.com/apache/spark/pull/36584#issuecomment-1131939361 How about to add the function to other APIs like first() in - PySpark: https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/sql/core/src/main/scala/org/apache/s

[GitHub] [spark] mridulm commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set t

2022-05-19 Thread GitBox
mridulm commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877275914 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] mridulm commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set t

2022-05-19 Thread GitBox
mridulm commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877273610 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,14 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] MaxGekk commented on a diff in pull request #36580: [SPARK-39167][SQL] Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-19 Thread GitBox
MaxGekk commented on code in PR #36580: URL: https://github.com/apache/spark/pull/36580#discussion_r877269334 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -1971,4 +1971,10 @@ object QueryExecutionErrors extends QueryErrorsBase {

[GitHub] [spark] MaxGekk closed pull request #36612: [SPARK-39234][SQL] Code clean up in SparkThrowableHelper.getMessage

2022-05-19 Thread GitBox
MaxGekk closed pull request #36612: [SPARK-39234][SQL] Code clean up in SparkThrowableHelper.getMessage URL: https://github.com/apache/spark/pull/36612

[GitHub] [spark] MaxGekk commented on pull request #36612: [SPARK-39234][SQL] Code clean up in SparkThrowableHelper.getMessage

2022-05-19 Thread GitBox
MaxGekk commented on PR #36612: URL: https://github.com/apache/spark/pull/36612#issuecomment-1131917484 +1, LGTM. Merging to master. Thank you, @gengliangwang and @cloud-fan for review.

[GitHub] [spark] vli-databricks commented on pull request #36584: [SPARK-39213] Create ANY_VALUE aggregate function

2022-05-19 Thread GitBox
vli-databricks commented on PR #36584: URL: https://github.com/apache/spark/pull/36584#issuecomment-1131915487 @MaxGekk please review and help me merge this.

[GitHub] [spark] mridulm commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set t

2022-05-19 Thread GitBox
mridulm commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r877251054 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4342,6 +4342,56 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkCo

[GitHub] [spark] tanvn commented on pull request #27590: [SPARK-30703][SQL][DOCS][FollowUp] Declare the ANSI SQL compliance options as experimental

2022-05-19 Thread GitBox
tanvn commented on PR #27590: URL: https://github.com/apache/spark/pull/27590#issuecomment-1131855914 @gengliangwang @dongjoon-hyun Hi, I have a question. In Spark 3.2.1, are `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` still considered experimental options? I

[GitHub] [spark] dongjoon-hyun commented on pull request #36377: [SPARK-39043][SQL] Spark SQL Hive client should not gather statistic by default.

2022-05-19 Thread GitBox
dongjoon-hyun commented on PR #36377: URL: https://github.com/apache/spark/pull/36377#issuecomment-1131851884 Thank you for the reverting decision, @cloud-fan and @AngersZh .

[GitHub] [spark] cloud-fan closed pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators

2022-05-19 Thread GitBox
cloud-fan closed pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators URL: https://github.com/apache/spark/pull/35850

[GitHub] [spark] cloud-fan commented on pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators

2022-05-19 Thread GitBox
cloud-fan commented on PR #35850: URL: https://github.com/apache/spark/pull/35850#issuecomment-1131827928 thanks, merging to master/3.3!

[GitHub] [spark] cloud-fan commented on a diff in pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #35850: URL: https://github.com/apache/spark/pull/35850#discussion_r877169574 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala: ## @@ -321,6 +321,38 @@ object GeneratorNestedColumnAliasing {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36295: URL: https://github.com/apache/spark/pull/36295#discussion_r877141680 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -203,6 +204,245 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with Expl

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r877123725 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCCatalog.scala: ## @@ -32,11 +35,14 @@ import org.apache.spark.sql.jdbc.{JdbcDialect,

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r877113355 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -201,6 +203,14 @@ class V2ExpressionBuilder( None }

[GitHub] [spark] cloud-fan commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-05-19 Thread GitBox
cloud-fan commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r877115137 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala: ## @@ -744,6 +744,14 @@ object DataSourceStrategy PushableColu
