[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978284666 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] beliefer commented on pull request #37977: [SPARK-37203][SQL][FOLLOWUP] Fix bug the buffer of AggregatingAccumulator will not be created if the input rows is empty

2022-09-22 Thread GitBox
beliefer commented on PR #37977: URL: https://github.com/apache/spark/pull/37977#issuecomment-1255823110 ping @MaxGekk cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978283059 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] beliefer opened a new pull request, #37977: [SPARK-37203][SQL][FOLLOWUP] Fix bug the buffer of AggregatingAccumulator will not be created if the input rows is empty

2022-09-22 Thread GitBox
beliefer opened a new pull request, #37977: URL: https://github.com/apache/spark/pull/37977 ### What changes were proposed in this pull request? When `AggregatingAccumulator` serializes the aggregate buffer, it may throw an NPE. There is one test case that can reproduce this error. ``` val

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978283629 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-22 Thread GitBox
chaoqin-li1123 commented on code in PR #37935: URL: https://github.com/apache/spark/pull/37935#discussion_r978283545 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala: ## @@ -357,6 +357,75 @@ class StateStoreSuite extends

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-22 Thread GitBox
HeartSaVioR commented on code in PR #37935: URL: https://github.com/apache/spark/pull/37935#discussion_r978280322 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala: ## @@ -357,6 +357,75 @@ class StateStoreSuite extends

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-22 Thread GitBox
chaoqin-li1123 commented on code in PR #37935: URL: https://github.com/apache/spark/pull/37935#discussion_r978273702 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala: ## @@ -357,6 +357,75 @@ class StateStoreSuite extends

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-22 Thread GitBox
HeartSaVioR commented on code in PR #37935: URL: https://github.com/apache/spark/pull/37935#discussion_r978271143 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala: ## @@ -357,6 +357,75 @@ class StateStoreSuite extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978270064 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] cloud-fan commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978269721 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] xiaonanyang-db commented on a diff in pull request #37933: [SPARK-40474][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

2022-09-22 Thread GitBox
xiaonanyang-db commented on code in PR #37933: URL: https://github.com/apache/spark/pull/37933#discussion_r978269589 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -2927,17 +2974,17 @@ abstract class CSVSuite }

[GitHub] [spark] cloud-fan commented on a diff in pull request #37933: [SPARK-40474][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

2022-09-22 Thread GitBox
cloud-fan commented on code in PR #37933: URL: https://github.com/apache/spark/pull/37933#discussion_r978268428 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala: ## @@ -2927,17 +2974,17 @@ abstract class CSVSuite } check(

[GitHub] [spark] panbingkun commented on pull request #37941: [SPARK-40501][SQL] Add PushProjectionThroughLimit for Optimizer

2022-09-22 Thread GitBox
panbingkun commented on PR #37941: URL: https://github.com/apache/spark/pull/37941#issuecomment-1255800188 > Ah sorry I misread the code. Let's add this rule then. I think it's beneficial, as it kind of "normalizes" the order of the project and limit operators, so that we can have more chances

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #37924: [SPARK-40481][CORE] Ignore stage fetch failure caused by decommissioned executor

2022-09-22 Thread GitBox
warrenzhu25 commented on code in PR #37924: URL: https://github.com/apache/spark/pull/37924#discussion_r978254789 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2159,6 +2176,24 @@ private[spark] class DAGScheduler( } } + /** + *

[GitHub] [spark] itholic closed pull request #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit`

2022-09-22 Thread GitBox
itholic closed pull request #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit` URL: https://github.com/apache/spark/pull/37966

[GitHub] [spark] itholic commented on pull request #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit`

2022-09-22 Thread GitBox
itholic commented on PR #37966: URL: https://github.com/apache/spark/pull/37966#issuecomment-1255781028 Just noticed that we already support this. Let me just close this one.

[GitHub] [spark] beliefer commented on a diff in pull request #34474: [SPARK-37203][SQL] Fix NotSerializableException when observe with TypedImperativeAggregate

2022-09-22 Thread GitBox
beliefer commented on code in PR #34474: URL: https://github.com/apache/spark/pull/34474#discussion_r978252209 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AggregatingAccumulator.scala: ## @@ -188,6 +197,17 @@ class AggregatingAccumulator private(

[GitHub] [spark] HeartSaVioR commented on pull request #37935: [SPARK-40492][SS] Do maintenance before streaming StateStore unload

2022-09-22 Thread GitBox
HeartSaVioR commented on PR #37935: URL: https://github.com/apache/spark/pull/37935#issuecomment-1255769799 I gave feedback offline, but am duplicating it here for the history. `getStateStoreProvider` also reports the active provider instance and gets “inactive provider instances” which

[GitHub] [spark] itholic commented on a diff in pull request #37966: [SPARK-40462][PYTHON] Support np.ndarray for `functions.lit`.

2022-09-22 Thread GitBox
itholic commented on code in PR #37966: URL: https://github.com/apache/spark/pull/37966#discussion_r978243957 ## python/pyspark/sql/functions.py: ## @@ -164,13 +166,24 @@ def lit(col: Any) -> Column: +--+ | [1, 2, 3]| +--+ + +

[GitHub] [spark] LuciferYang commented on a diff in pull request #37976: [DON'T MERGE][SQL][TESTS] Restore the file appender log level threshold of the hive UTs to info

2022-09-22 Thread GitBox
LuciferYang commented on code in PR #37976: URL: https://github.com/apache/spark/pull/37976#discussion_r978238657 ## sql/hive/src/test/resources/log4j2.properties: ## @@ -36,9 +36,9 @@ appender.file.fileName = target/unit-tests.log appender.file.layout.type = PatternLayout

[GitHub] [spark] ukby1234 commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
ukby1234 commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978243212 ## core/src/test/java/org/apache/spark/storage/ReadPartialFileSystem.java: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] ukby1234 commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
ukby1234 commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978239504 ## core/src/test/java/org/apache/spark/storage/ReadPartialFileSystem.java: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] LuciferYang commented on a diff in pull request #37976: [DON'T MERGE][SQL][TESTS] Restore the file appender log level threshold of the hive UTs to info

2022-09-22 Thread GitBox
LuciferYang commented on code in PR #37976: URL: https://github.com/apache/spark/pull/37976#discussion_r978239034 ## sql/hive/src/test/resources/log4j2.properties: ## @@ -36,9 +36,9 @@ appender.file.fileName = target/unit-tests.log appender.file.layout.type = PatternLayout

[GitHub] [spark] LuciferYang opened a new pull request, #37976: [DON'T MERGE][SQL][TESTS] Restore the file appender log level threshold of the hive UTs to info

2022-09-22 Thread GitBox
LuciferYang opened a new pull request, #37976: URL: https://github.com/apache/spark/pull/37976 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] SandishKumarHN commented on pull request #37972: Protobuf support for Spark - from_proto AND to_proto

2022-09-22 Thread GitBox
SandishKumarHN commented on PR #37972: URL: https://github.com/apache/spark/pull/37972#issuecomment-1255755036 Build issues that are unrelated to PR

[GitHub] [spark] yabola commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-22 Thread GitBox
yabola commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1255754778 @Ngone51 @mridulm Yeah, you are right. Do you have any ideas for solving this problem? My idea is to just exit the Executor, as this PR does.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37923: [SPARK-40334][PS] Implement `GroupBy.prod`

2022-09-22 Thread GitBox
zhengruifeng commented on code in PR #37923: URL: https://github.com/apache/spark/pull/37923#discussion_r978229729 ## python/pyspark/pandas/groupby.py: ## @@ -3237,10 +3337,10 @@ def _validate_agg_columns(self, numeric_only: Optional[bool], function_name: str if

[GitHub] [spark] HyukjinKwon closed pull request #37971: [MINOR][YARN][TESTS] Rename `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties`

2022-09-22 Thread GitBox
HyukjinKwon closed pull request #37971: [MINOR][YARN][TESTS] Rename `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties` URL: https://github.com/apache/spark/pull/37971

[GitHub] [spark] LuciferYang commented on pull request #37971: [MINOR][YARN][TESTS] Rename `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties`

2022-09-22 Thread GitBox
LuciferYang commented on PR #37971: URL: https://github.com/apache/spark/pull/37971#issuecomment-1255743200 thanks @HyukjinKwon @dongjoon-hyun

[GitHub] [spark] HyukjinKwon commented on pull request #37971: [MINOR][YARN][TESTS] Rename `logConfFile` in `BaseYarnClusterSuite` from `log4j.properties` to `log4j2.properties`

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37971: URL: https://github.com/apache/spark/pull/37971#issuecomment-1255742765 Merged to master.

[GitHub] [spark] brkyvz commented on a diff in pull request #37933: [SPARK-40474][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

2022-09-22 Thread GitBox
brkyvz commented on code in PR #37933: URL: https://github.com/apache/spark/pull/37933#discussion_r978224414 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -59,6 +59,13 @@ class CSVInferSchema(val options: CSVOptions) extends

[GitHub] [spark] zhengruifeng opened a new pull request, #37975: [SPARK-40543][PS][SQL] Make `ddof` in `DataFrame.var` and `Series.var` accept arbitary integers

2022-09-22 Thread GitBox
zhengruifeng opened a new pull request, #37975: URL: https://github.com/apache/spark/pull/37975 ### What changes were proposed in this pull request? Add a new `var` expression to support arbitrary integral `ddof` ### Why are the changes needed? For API coverage ###

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37933: [SPARK-40474][SQL] Correct CSV schema inference and data parsing behavior on columns with mixed dates and timestamps

2022-09-22 Thread GitBox
HyukjinKwon commented on code in PR #37933: URL: https://github.com/apache/spark/pull/37933#discussion_r978223404 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -59,6 +59,13 @@ class CSVInferSchema(val options: CSVOptions) extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978218342 ## core/src/test/java/org/apache/spark/storage/ReadPartialFileSystem.java: ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitary integers

2022-09-22 Thread GitBox
zhengruifeng closed pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitary integers URL: https://github.com/apache/spark/pull/37974

[GitHub] [spark] zhengruifeng commented on pull request #37974: [SPARK-40542][PS][SQL] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitary integers

2022-09-22 Thread GitBox
zhengruifeng commented on PR #37974: URL: https://github.com/apache/spark/pull/37974#issuecomment-1255731663 Merged into master, thanks @HyukjinKwon for review

[GitHub] [spark] roczei commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-22 Thread GitBox
roczei commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r978187565 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: ## @@ -148,13 +148,18 @@ private[sql] class SharedState( val externalCatalog =

[GitHub] [spark] sadikovi commented on pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
sadikovi commented on PR #37965: URL: https://github.com/apache/spark/pull/37965#issuecomment-1255713486 Thanks!

[GitHub] [spark] mridulm commented on pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-22 Thread GitBox
mridulm commented on PR #37533: URL: https://github.com/apache/spark/pull/37533#issuecomment-1255711624 Merged to master. Thanks for working on this @wankunde ! Thanks for the review @otterc :-)

[GitHub] [spark] mridulm closed pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-22 Thread GitBox
mridulm closed pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow URL: https://github.com/apache/spark/pull/37533

[GitHub] [spark] mridulm commented on pull request #37899: [SPARK-40455][CORE]Abort result stage directly when it failed caused by FetchFailedException

2022-09-22 Thread GitBox
mridulm commented on PR #37899: URL: https://github.com/apache/spark/pull/37899#issuecomment-1255710180 If a result stage does not have pending partitions, it does not need to be aborted - since there are no partitions to be computed. If a result stage has pending partitions with an

[GitHub] [spark] HyukjinKwon closed pull request #37970: [SPARK-40531][BUILD] Upgrade zstd-jni from 1.5.2-3 to 1.5.2-4

2022-09-22 Thread GitBox
HyukjinKwon closed pull request #37970: [SPARK-40531][BUILD] Upgrade zstd-jni from 1.5.2-3 to 1.5.2-4 URL: https://github.com/apache/spark/pull/37970

[GitHub] [spark] HyukjinKwon commented on pull request #37970: [SPARK-40531][BUILD] Upgrade zstd-jni from 1.5.2-3 to 1.5.2-4

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37970: URL: https://github.com/apache/spark/pull/37970#issuecomment-1255702675 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
HyukjinKwon closed pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct URL: https://github.com/apache/spark/pull/37965

[GitHub] [spark] HyukjinKwon commented on pull request #37965: [SPARK-40527][SQL] Keep struct field names or map keys in CreateStruct

2022-09-22 Thread GitBox
HyukjinKwon commented on PR #37965: URL: https://github.com/apache/spark/pull/37965#issuecomment-1255702378 Merged to master.

[GitHub] [spark] roczei commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-22 Thread GitBox
roczei commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r978186734 ## sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: ## @@ -148,13 +148,18 @@ private[sql] class SharedState( val externalCatalog =

[GitHub] [spark] roczei commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-22 Thread GitBox
roczei commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r978186527 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -43,7 +44,7 @@ class V2SessionCatalog(catalog: SessionCatalog)

[GitHub] [spark] roczei commented on a diff in pull request #37679: [SPARK-35242][SQL] Support changing session catalog's default database

2022-09-22 Thread GitBox
roczei commented on code in PR #37679: URL: https://github.com/apache/spark/pull/37679#discussion_r978186198 ## sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala: ## @@ -36,7 +36,7 @@ import org.apache.spark.SparkFunSuite import org.apache.spark.sql._

[GitHub] [spark] panbingkun commented on pull request #37970: [SPARK-40531][BUILD] Upgrade zstd-jni from 1.5.2-3 to 1.5.2-4

2022-09-22 Thread GitBox
panbingkun commented on PR #37970: URL: https://github.com/apache/spark/pull/37970#issuecomment-1255686053 > Can you retrigger the test? It seems unrelated. OK, done.

[GitHub] [spark] github-actions[bot] commented on pull request #36234: [SPARK-38409][CORE] Do not export gauges with null values in prometheus metric snapshots

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36234: URL: https://github.com/apache/spark/pull/36234#issuecomment-1255683877 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36279: [WIP][SPARK-38965][SHUFFLE]Optimize RemoteBlockPushResolver with a memory pool

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36279: URL: https://github.com/apache/spark/pull/36279#issuecomment-1255683861 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36301: [SPARK-21697][SQL] NPE & ExceptionInInitializerError trying to load UDF from HDFS

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36301: URL: https://github.com/apache/spark/pull/36301#issuecomment-1255683848 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36305: [SPARK-38987][shuffle] Handle fallback when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to true

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36305: URL: https://github.com/apache/spark/pull/36305#issuecomment-1255683815 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1255683830 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36378: [SPARK-39022][SQL] Fix combination of HAVING and SORT not being resolved correctly

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36378: URL: https://github.com/apache/spark/pull/36378#issuecomment-1255683800 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36453: SPARK-39103: SparkContext.addFiles trigger backend exception if it tr…

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36453: SPARK-39103: SparkContext.addFiles trigger backend exception if it tr… URL: https://github.com/apache/spark/pull/36453

[GitHub] [spark] github-actions[bot] commented on pull request #36438: [SPARK-39092][SQL] Propagate Empty Partitions

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36438: URL: https://github.com/apache/spark/pull/36438#issuecomment-1255683790 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #36483: [SPARK-39126][SQL] After eliminating join to one side, that side should take advantage of LocalShuffleRead optimization

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36483: [SPARK-39126][SQL] After eliminating join to one side, that side should take advantage of LocalShuffleRead optimization URL: https://github.com/apache/spark/pull/36483

[GitHub] [spark] github-actions[bot] closed pull request #36485: [SPARK-39128][SQL][HIVE] Log cost time for getting FileStatus in HadoopTableReader

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36485: [SPARK-39128][SQL][HIVE] Log cost time for getting FileStatus in HadoopTableReader URL: https://github.com/apache/spark/pull/36485

[GitHub] [spark] github-actions[bot] closed pull request #36495: [SPARK-39136][SQL] JDBCTable support table properties

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36495: [SPARK-39136][SQL] JDBCTable support table properties URL: https://github.com/apache/spark/pull/36495

[GitHub] [spark] github-actions[bot] closed pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36613: [WIP][SPARK-30983] Support typed select in Datasets up to the max tuple size URL: https://github.com/apache/spark/pull/36613

[GitHub] [spark] github-actions[bot] closed pull request #36665: [SPARK-39287][CORE] TaskSchedulerImpl should quickly ignore task finished event if its task was finished state.

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36665: [SPARK-39287][CORE] TaskSchedulerImpl should quickly ignore task finished event if its task was finished state. URL: https://github.com/apache/spark/pull/36665

[GitHub] [spark] github-actions[bot] closed pull request #36751: [WIP][SPARK-39366][CORE] Do not release write locks on task end.

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36751: [WIP][SPARK-39366][CORE] Do not release write locks on task end. URL: https://github.com/apache/spark/pull/36751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] github-actions[bot] closed pull request #36668: [SPARK-39291][CORE] Fetch blocks and open stream should not respond a closed channel

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36668: [SPARK-39291][CORE] Fetch blocks and open stream should not respond a closed channel URL: https://github.com/apache/spark/pull/36668 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] github-actions[bot] closed pull request #36678: [SPARK-39297][CORE][UI] bugfix: spark.ui.proxyBase contains proxy or history

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36678: [SPARK-39297][CORE][UI] bugfix: spark.ui.proxyBase contains proxy or history URL: https://github.com/apache/spark/pull/36678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] github-actions[bot] closed pull request #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan

2022-09-22 Thread GitBox
github-actions[bot] closed pull request #36798: [SPARK-39408][SQL] Update the buildKeys for DynamicPruningSubquery.withNewPlan URL: https://github.com/apache/spark/pull/36798 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] github-actions[bot] commented on pull request #36823: [SPARK-39429][SQL] Convert Inner Join With Aggregation to Semi Join

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36823: URL: https://github.com/apache/spark/pull/36823#issuecomment-1255683703 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #36856: [SPARK-39455][SQL] Improve expression non-codegen code path performance by cache data type matching

2022-09-22 Thread GitBox
github-actions[bot] commented on PR #36856: URL: https://github.com/apache/spark/pull/36856#issuecomment-1255683688 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] zhengruifeng opened a new pull request, #37974: [SPARK-40542][PS] Make `ddof` in `DataFrame.std` and `Series.std` accept arbitary integers

2022-09-22 Thread GitBox
zhengruifeng opened a new pull request, #37974: URL: https://github.com/apache/spark/pull/37974 ### What changes were proposed in this pull request? Add a new `std` expression to support arbitrary integral `ddof` ### Why are the changes needed? For API coverage ###

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978173963 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] zhengruifeng commented on pull request #37918: [SPARK-40476][ML][SQL] Reduce the shuffle size of ALS

2022-09-22 Thread GitBox
zhengruifeng commented on PR #37918: URL: https://github.com/apache/spark/pull/37918#issuecomment-1255672550 Thanks for the reviews! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978174565 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala: ## @@ -151,10 +151,15 @@ case class GroupingSets( override def

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

2022-09-22 Thread GitBox
dongjoon-hyun commented on code in PR #37655: URL: https://github.com/apache/spark/pull/37655#discussion_r978174266 ## sql/core/src/test/resources/sql-tests/inputs/grouping_set.sql: ## @@ -57,3 +57,6 @@ SELECT k1, k2, avg(v) FROM (VALUES (1,1,1),(2,2,2)) AS t(k1,k2,v) GROUP BY

[GitHub] [spark] mridulm commented on pull request #37779: [wip][SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-22 Thread GitBox
mridulm commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1255648587 A few points: * Plugins are initialized as part of `Executor` construction, but after uncaught exception handler is set. * If anything other than `NonFatal` is thrown in

[GitHub] [spark] srowen commented on pull request #37970: [SPARK-40531][BUILD] Upgrade zstd-jni from 1.5.2-3 to 1.5.2-4

2022-09-22 Thread GitBox
srowen commented on PR #37970: URL: https://github.com/apache/spark/pull/37970#issuecomment-1255646568 Can you retrigger the test? It seems unrelated -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] ukby1234 commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
ukby1234 commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978122465 ## core/src/test/scala/org/apache/spark/storage/FallbackStorageSuite.scala: ## @@ -107,6 +106,51 @@ class FallbackStorageSuite extends SparkFunSuite with

[GitHub] [spark] ukby1234 commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
ukby1234 commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978122320 ## core/src/test/scala/org/apache/spark/storage/FallbackStorageSuite.scala: ## @@ -107,6 +106,51 @@ class FallbackStorageSuite extends SparkFunSuite with

[GitHub] [spark] ukby1234 commented on a diff in pull request #37960: [SPARK-39200][CORE] Make Fallback Storage readFully on content

2022-09-22 Thread GitBox
ukby1234 commented on code in PR #37960: URL: https://github.com/apache/spark/pull/37960#discussion_r978121955 ## core/src/test/scala/org/apache/spark/storage/FallbackStorageSuite.scala: ## @@ -18,14 +18,11 @@ package org.apache.spark.storage import

[GitHub] [spark] viirya commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
viirya commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r978094495 ## core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala: ## @@ -137,22 +139,22 @@ class SparkThrowableSuite extends SparkFunSuite {

[GitHub] [spark] viirya commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
viirya commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r978093752 ## core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] viirya commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
viirya commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r978065616 ## core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] viirya commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
viirya commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r978064295 ## core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] MaxGekk opened a new pull request, #37973: [WIP][SPARK-40540][SQL] Migrate compilation errors onto error classes

2022-09-22 Thread GitBox
MaxGekk opened a new pull request, #37973: URL: https://github.com/apache/spark/pull/37973 ### What changes were proposed in this pull request? In the PR, I propose to migrate all compilation errors onto temporary error classes with the prefix `_LEGACY_ERROR_TEMP_`. The error message

[GitHub] [spark] peter-toth commented on a diff in pull request #36027: [SPARK-38717][SQL] Handle Hive's bucket spec case preserving behaviour

2022-09-22 Thread GitBox
peter-toth commented on code in PR #36027: URL: https://github.com/apache/spark/pull/36027#discussion_r977992658 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -435,6 +439,14 @@ private[hive] class HiveClientImpl(

[GitHub] [spark] bluesmoon commented on a diff in pull request #16497: [SPARK-19118] [SQL] Percentile support for frequency distribution table

2022-09-22 Thread GitBox
bluesmoon commented on code in PR #16497: URL: https://github.com/apache/spark/pull/16497#discussion_r977961271 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala: ## @@ -44,22 +45,30 @@ import

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2022-09-22 Thread GitBox
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1255386540 @gaborgsomogyi @dongjoon-hyun @HyukjinKwon Can anyone please review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] SandishKumarHN opened a new pull request, #37972: Protobuf support for Spark - from_proto AND to_proto

2022-09-22 Thread GitBox
SandishKumarHN opened a new pull request, #37972: URL: https://github.com/apache/spark/pull/37972 From SandishKumarHN(sanysand...@gmail.com) and Mohan Parthasarathy(mposde...@gmail.com) # Introduction Protocol buffers are Google's language-neutral, platform-neutral, extensible

[GitHub] [spark] BryanCutler commented on pull request #35391: [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion

2022-09-22 Thread GitBox
BryanCutler commented on PR #35391: URL: https://github.com/apache/spark/pull/35391#issuecomment-1255314395 Tests passing now, merged to master. Thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] BryanCutler closed pull request #35391: [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion

2022-09-22 Thread GitBox
BryanCutler closed pull request #35391: [SPARK-38098][PYTHON] Add support for ArrayType of nested StructType to arrow-based conversion URL: https://github.com/apache/spark/pull/35391 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk commented on a diff in pull request #37969: [SPARK-40530][SQL] Add error-related developer APIs

2022-09-22 Thread GitBox
MaxGekk commented on code in PR #37969: URL: https://github.com/apache/spark/pull/37969#discussion_r977867254 ## core/src/main/scala/org/apache/spark/SparkThrowableHelper.scala: ## @@ -178,9 +90,7 @@ private[spark] object SparkThrowableHelper { val errorSubClass =
