[GitHub] [spark] wineternity commented on pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-06 Thread GitBox
wineternity commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1340463701 > The change looks good to me. +CC @Ngone51 > > Btw, do you also want to remove the `if (event.taskInfo == null) {` check in beginning of `onTaskEnd` ? > > Make it a prec

[GitHub] [spark] wankunde commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041811602 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041817673 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -119,21 +117,24 @@ case class Size(child: Expression, lega

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041818122 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041824037 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827496 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Express

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Express

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340493981 also cc @cloud-fan @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on pull request #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38958: URL: https://github.com/apache/spark/pull/38958#issuecomment-1340494395 cc @grundprinzip @hvanhovell

[GitHub] [spark] LuciferYang opened a new pull request, #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS` when args is not null

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38960: URL: https://github.com/apache/spark/pull/38960 ### What changes were proposed in this pull request? `curdate()` throw `QueryCompilationErrors.invalidFunctionArgumentNumberError` with `Seq.empty` input when `expressions` is not empty, then
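The fix described above is about reporting arity errors against the arguments actually supplied rather than an empty sequence. A hypothetical, minimal sketch (plain Python, not Spark's actual resolution code; the names `resolve_curdate` and `WrongNumArgsError` are made up for illustration) of that idea:

```python
# Hypothetical sketch of the arity check: a niladic function such as
# curdate() should raise a wrong-number-of-arguments error that reflects
# the arguments it actually received, not an empty list.

class WrongNumArgsError(Exception):
    """Stand-in for Spark's WRONG_NUM_ARGS error class."""


def resolve_curdate(args):
    """Resolve curdate(); reject any arguments with a precise error."""
    if args:
        # Report the real argument count, not Seq.empty.
        raise WrongNumArgsError(
            f"curdate expects 0 arguments but got {len(args)}: {args}")
    return "current_date()"  # placeholder for the resolved expression


resolve_curdate([])          # fine: no arguments
# resolve_curdate(["1"])     # would raise: "... but got 1: ['1']"
```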

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840529 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] zhengruifeng opened a new pull request, #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38961: URL: https://github.com/apache/spark/pull/38961 ### What changes were proposed in this pull request? Implement `collection` functions alphabetically, this PR contains `A` ~ `C` except: - aggregate, array_sort - need the support of L

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840770 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840929 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841092 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Statistics.java: ## @@ -31,4 +35,7 @@ public interface Statistics { OptionalLong sizeInBytes();

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041844991 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -63,6 +63,24 @@ class SparkConnectFunctionTests(SparkConnectFuncTestCase): """These test c

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041845222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1340511363 > Also curious how this is to be used by Spark The newly added `ColumnStatistics` is converted to logical `ColumnStat` in this [method](https://github.com/apache/spark/blob/0

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841165 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -294,7 +313,30 @@ abstract class InMemoryBaseTable( val ob

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041847513 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +431,144 @@ def test_aggregation_functions(self): sdf.groupBy("a").agg(SF.p

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848355 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848809 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856552 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHE

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1340524315 > Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go. @HyukjinKwon @zhengruifeng The Scala part is good to me, please further review, thanks ~

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856979 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatchS

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041860498 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatc

[GitHub] [spark] jerrypeng commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-06 Thread GitBox
jerrypeng commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1340530153 @wecharyu can you run one batch and then delete all the partitions?

[GitHub] [spark] MaxGekk commented on a diff in pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38937: URL: https://github.com/apache/spark/pull/38937#discussion_r1041863502 ## sql/core/src/test/resources/sql-tests/results/except-all.sql.out: ## @@ -230,10 +230,9 @@ org.apache.spark.sql.AnalysisException { "errorClass" : "NUM_COLUMNS_MI

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041864100 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatc

[GitHub] [spark] MaxGekk commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866109 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED" :

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866576 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHE

[GitHub] [spark] jerrypeng commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041868304 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" -> checkpointTime

[GitHub] [spark] zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions URL: https://github.com/apache/spark/pull/38914

[GitHub] [spark] cloud-fan commented on a diff in pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38942: URL: https://github.com/apache/spark/pull/38942#discussion_r1041868772 ## sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala: ## @@ -132,17 +132,21 @@ class V1WriteFallbackSuite extends QueryTest with Shar

[GitHub] [spark] cloud-fan commented on pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on PR #38942: URL: https://github.com/apache/spark/pull/38942#issuecomment-1340537896 cc @viirya @gengliangwang

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340537857 merged into master

[GitHub] [spark] grundprinzip commented on pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2022-12-06 Thread GitBox
grundprinzip commented on PR #38879: URL: https://github.com/apache/spark/pull/38879#issuecomment-1340539415 @HyukjinKwon @zhengruifeng @amaliujia more opinions?

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041871281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" -> checkpointTi

[GitHub] [spark] beliefer opened a new pull request, #38962: [SPARK-40852][CONNECT][PYTHON] Add document for `DataFrame.summary`

2022-12-06 Thread GitBox
beliefer opened a new pull request, #38962: URL: https://github.com/apache/spark/pull/38962 ### What changes were proposed in this pull request? This PR adds document for `DataFrame.summary`. ### Why are the changes needed? This PR adds document for `DataFrame.summary`.

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340543632 After `LikeSimplification`, the combination of multiple like expressions with `OR` can be pushed down to the parquet reader, while `like any` can not. So close this PR.
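The comment above hinges on `LikeSimplification` rewriting simple `LIKE` patterns into prefix/suffix/substring predicates that a columnar reader can push down, while a single `LIKE ANY` expression stays opaque. A hypothetical, minimal sketch in plain Python (not Spark's actual rule; `simplify_like` and its tuple encoding are made up for illustration) of that rewrite:

```python
# Hypothetical sketch of LikeSimplification-style rewriting: map a simple
# LIKE pattern to a pushable predicate (startswith/endswith/contains/equals),
# or None when a wildcard in the middle forces a generic LIKE.

def simplify_like(pattern):
    """Return a (predicate, literal) pair if the pattern is pushable."""
    # A '%' or '_' in the interior cannot be simplified.
    if any(c in pattern.strip('%') for c in '%_'):
        return None
    if pattern.endswith('%') and not pattern.startswith('%'):
        return ('startswith', pattern[:-1])   # 'abc%'  -> StartsWith
    if pattern.startswith('%') and not pattern.endswith('%'):
        return ('endswith', pattern[1:])      # '%abc'  -> EndsWith
    if pattern.startswith('%') and pattern.endswith('%'):
        return ('contains', pattern[1:-1])    # '%abc%' -> Contains
    return ('equals', pattern)                # 'abc'   -> EqualTo

# col LIKE 'a%' OR col LIKE 'b%': each branch becomes a pushable filter.
ors = [simplify_like(p) for p in ('a%', 'b%')]
# col LIKE ANY ('a%', 'b%'): one combined expression, nothing to push.
```

This is why the OR form reached the parquet reader as filters while the `like any` form did not, motivating the decision to close the PR.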

[GitHub] [spark] wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions URL: https://github.com/apache/spark/pull/38672
