[GitHub] [spark] wineternity commented on pull request #38702: [SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-12-06 Thread GitBox
wineternity commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1340463701 > The change looks good to me. +CC @Ngone51 > > Btw, do you also want to remove the `if (event.taskInfo == null) {` check in beginning of `onTaskEnd` ? > > Make it a prec

[GitHub] [spark] wankunde commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041811602 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] navinvishy commented on a diff in pull request #38947: [SPARK-41231][SQL] Adds an array_prepend function to catalyst

2022-12-06 Thread GitBox
navinvishy commented on code in PR #38947: URL: https://github.com/apache/spark/pull/38947#discussion_r1041817673 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -119,21 +117,24 @@ case class Size(child: Expression, lega

[GitHub] [spark] beliefer commented on a diff in pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
beliefer commented on code in PR #38672: URL: https://github.com/apache/spark/pull/38672#discussion_r1041818122 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] infoankitp commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
infoankitp commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041824037 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827496 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Express

[GitHub] [spark] LuciferYang commented on a diff in pull request #38865: [SPARK-41232][SQL][PYTHON] Adding array_append function

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38865: URL: https://github.com/apache/spark/pull/38865#discussion_r1041827973 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,133 @@ case class ArrayExcept(left: Express

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340493981 also cc @cloud-fan @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on pull request #38958: [SPARK-41433][CONNECT] Make Max Arrow BatchSize configurable

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38958: URL: https://github.com/apache/spark/pull/38958#issuecomment-1340494395 cc @grundprinzip @hvanhovell

[GitHub] [spark] LuciferYang opened a new pull request, #38960: [SPARK-41435][SQL] Make `curdate()` throw `WRONG_NUM_ARGS` when args is not null

2022-12-06 Thread GitBox
LuciferYang opened a new pull request, #38960: URL: https://github.com/apache/spark/pull/38960 ### What changes were proposed in this pull request? `curdate()` throw `QueryCompilationErrors.invalidFunctionArgumentNumberError` with `Seq.empty` input when `expressions` is not empty, then
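The fix described above is about reporting arity errors against the arguments actually supplied rather than an empty sequence. A hypothetical, minimal sketch (plain Python, not Spark's actual resolution code; the names `resolve_curdate` and `WrongNumArgsError` are made up for illustration) of that idea:

```python
# Hypothetical sketch of the arity check: a niladic function such as
# curdate() should raise a wrong-number-of-arguments error that reflects
# the arguments it actually received, not an empty list.

class WrongNumArgsError(Exception):
    """Stand-in for Spark's WRONG_NUM_ARGS error class."""


def resolve_curdate(args):
    """Resolve curdate(); reject any arguments with a precise error."""
    if args:
        # Report the real argument count, not Seq.empty.
        raise WrongNumArgsError(
            f"curdate expects 0 arguments but got {len(args)}: {args}")
    return "current_date()"  # placeholder for the resolved expression


resolve_curdate([])          # fine: no arguments
# resolve_curdate(["1"])     # would raise: "... but got 1: ['1']"
```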

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840529 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] zhengruifeng opened a new pull request, #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng opened a new pull request, #38961: URL: https://github.com/apache/spark/pull/38961 ### What changes were proposed in this pull request? Implement `collection` functions alphabetically, this PR contains `A` ~ `C` except: - aggregate, array_sort - need the support of L

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840770 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041840929 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/colstats/ColumnStatistics.java: ## @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841092 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Statistics.java: ## @@ -31,4 +35,7 @@ public interface Statistics { OptionalLong sizeInBytes();

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041844991 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -63,6 +63,24 @@ class SparkConnectFunctionTests(SparkConnectFuncTestCase): """These test c

[GitHub] [spark] LuciferYang commented on a diff in pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on code in PR #38874: URL: https://github.com/apache/spark/pull/38874#discussion_r1041845222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4600,3 +4600,51 @@ case class ArrayExcept(left: Expressi

[GitHub] [spark] huaxingao commented on pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on PR #38904: URL: https://github.com/apache/spark/pull/38904#issuecomment-1340511363 > Also curious how this is to be used by Spark The newly added `ColumnStatistics` is converted to logical `ColumnStat` in this [method](https://github.com/apache/spark/blob/0

[GitHub] [spark] huaxingao commented on a diff in pull request #38904: [SPARK-41378][SQL] Support Column Stats in DS v2

2022-12-06 Thread GitBox
huaxingao commented on code in PR #38904: URL: https://github.com/apache/spark/pull/38904#discussion_r1041841165 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala: ## @@ -294,7 +313,30 @@ abstract class InMemoryBaseTable( val ob

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38961: [SPARK-41436][CONNECT][PYTHON] Implement `collection` functions: A~C

2022-12-06 Thread GitBox
zhengruifeng commented on code in PR #38961: URL: https://github.com/apache/spark/pull/38961#discussion_r1041847513 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -413,6 +431,144 @@ def test_aggregation_functions(self): sdf.groupBy("a").agg(SF.p

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848355 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041848809 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED"

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856552 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHE

[GitHub] [spark] LuciferYang commented on pull request #38874: [SPARK-41235][SQL][PYTHON] High-order function: array_compact implementation

2022-12-06 Thread GitBox
LuciferYang commented on PR #38874: URL: https://github.com/apache/spark/pull/38874#issuecomment-1340524315 > Thanks for reviewing this. @LuciferYang let me know when you think it's ready to go. @HyukjinKwon @zhengruifeng The Scala part is good to me, please further review, thanks ~

[GitHub] [spark] jerrypeng commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041856979 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatchS

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041860498 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatc

[GitHub] [spark] jerrypeng commented on pull request #38898: [SPARK-41375][SS] Avoid empty latest KafkaSourceOffset

2022-12-06 Thread GitBox
jerrypeng commented on PR #38898: URL: https://github.com/apache/spark/pull/38898#issuecomment-1340530153 @wecharyu can you run one batch and then delete all the partitions?

[GitHub] [spark] MaxGekk commented on a diff in pull request #38937: [SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` to make it more generic

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38937: URL: https://github.com/apache/spark/pull/38937#discussion_r1041863502 ## sql/core/src/test/resources/sql-tests/results/except-all.sql.out: ## @@ -230,10 +230,9 @@ org.apache.spark.sql.AnalysisException { "errorClass" : "NUM_COLUMNS_MI

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041864100 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala: ## @@ -316,6 +320,50 @@ private[kafka010] class KafkaMicroBatc

[GitHub] [spark] MaxGekk commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
MaxGekk commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866109 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHED" :

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38911: [SPARK-41387][SS] Assert current end offset from Kafka data source for Trigger.AvailableNow

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38911: URL: https://github.com/apache/spark/pull/38911#discussion_r1041866576 ## connector/kafka-0-10-sql/src/test/resources/error/kafka-error-classes.json: ## @@ -0,0 +1,26 @@ +{ + "TOPIC_PARTITIONS_IN_END_OFFSET_ARE_NOT_SAME_WITH_PREFETCHE

[GitHub] [spark] jerrypeng commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
jerrypeng commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041868304 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" -> checkpointTime

[GitHub] [spark] zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng closed pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions URL: https://github.com/apache/spark/pull/38914

[GitHub] [spark] cloud-fan commented on a diff in pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on code in PR #38942: URL: https://github.com/apache/spark/pull/38942#discussion_r1041868772 ## sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala: ## @@ -132,17 +132,21 @@ class V1WriteFallbackSuite extends QueryTest with Shar

[GitHub] [spark] cloud-fan commented on pull request #38942: [SPARK-41437][SQL] Do not optimize the input query twice for v1 write fallback

2022-12-06 Thread GitBox
cloud-fan commented on PR #38942: URL: https://github.com/apache/spark/pull/38942#issuecomment-1340537896 cc @viirya @gengliangwang

[GitHub] [spark] zhengruifeng commented on pull request #38914: [SPARK-41381][CONNECT][PYTHON] Implement `count_distinct` and `sum_distinct` functions

2022-12-06 Thread GitBox
zhengruifeng commented on PR #38914: URL: https://github.com/apache/spark/pull/38914#issuecomment-1340537857 merged into master

[GitHub] [spark] grundprinzip commented on pull request #38879: [SPARK-41362][CONNECT][PYTHON] Better error messages for invalid argument types.

2022-12-06 Thread GitBox
grundprinzip commented on PR #38879: URL: https://github.com/apache/spark/pull/38879#issuecomment-1340539415 @HyukjinKwon @zhengruifeng @amaliujia more opinions?

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38880: [SPARK-38277][SS] Clear write batch after RocksDB state store's commit

2022-12-06 Thread GitBox
HeartSaVioR commented on code in PR #38880: URL: https://github.com/apache/spark/pull/38880#discussion_r1041871281 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -310,6 +311,9 @@ class RocksDB( "checkpoint" -> checkpointTi

[GitHub] [spark] beliefer opened a new pull request, #38962: [SPARK-40852][CONNECT][PYTHON] Add document for `DataFrame.summary`

2022-12-06 Thread GitBox
beliefer opened a new pull request, #38962: URL: https://github.com/apache/spark/pull/38962 ### What changes were proposed in this pull request? This PR adds document for `DataFrame.summary`. ### Why are the changes needed? This PR adds document for `DataFrame.summary`.

[GitHub] [spark] wankunde commented on pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde commented on PR #38672: URL: https://github.com/apache/spark/pull/38672#issuecomment-1340543632 After `LikeSimplification`, the combination of multiple like expressions with `OR` can be pushed down to the parquet reader, while `like any` can not. So close this PR.
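The comment above hinges on `LikeSimplification` rewriting simple `LIKE` patterns into prefix/suffix/substring predicates that a columnar reader can push down, while a single `LIKE ANY` expression stays opaque. A hypothetical, minimal sketch in plain Python (not Spark's actual rule; `simplify_like` and its tuple encoding are made up for illustration) of that rewrite:

```python
# Hypothetical sketch of LikeSimplification-style rewriting: map a simple
# LIKE pattern to a pushable predicate (startswith/endswith/contains/equals),
# or None when a wildcard in the middle forces a generic LIKE.

def simplify_like(pattern):
    """Return a (predicate, literal) pair if the pattern is pushable."""
    # A '%' or '_' in the interior cannot be simplified.
    if any(c in pattern.strip('%') for c in '%_'):
        return None
    if pattern.endswith('%') and not pattern.startswith('%'):
        return ('startswith', pattern[:-1])   # 'abc%'  -> StartsWith
    if pattern.startswith('%') and not pattern.endswith('%'):
        return ('endswith', pattern[1:])      # '%abc'  -> EndsWith
    if pattern.startswith('%') and pattern.endswith('%'):
        return ('contains', pattern[1:-1])    # '%abc%' -> Contains
    return ('equals', pattern)                # 'abc'   -> EqualTo

# col LIKE 'a%' OR col LIKE 'b%': each branch becomes a pushable filter.
ors = [simplify_like(p) for p in ('a%', 'b%')]
# col LIKE ANY ('a%', 'b%'): one combined expression, nothing to push.
```

This is why the OR form reached the parquet reader as filters while the `like any` form did not, motivating the decision to close the PR.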

[GitHub] [spark] wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions

2022-12-06 Thread GitBox
wankunde closed pull request #38672: [SPARK-41159][SQL] Optimize like any and like all expressions URL: https://github.com/apache/spark/pull/38672
