[GitHub] [spark] sadikovi opened a new pull request, #37653: [SPARK-40215][SQL] Add SQL configs to control CSV/JSON date and timestamp parsing behaviour

2022-08-24 Thread GitBox
sadikovi opened a new pull request, #37653: URL: https://github.com/apache/spark/pull/37653 ### What changes were proposed in this pull request? This is a follow-up for [SPARK-39731](https://issues.apache.org/jira/browse/SPARK-39731) and PR

[GitHub] [spark] HeartSaVioR closed pull request #37474: [SPARK-40039][SS] Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-24 Thread GitBox
HeartSaVioR closed pull request #37474: [SPARK-40039][SS] Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface URL: https://github.com/apache/spark/pull/37474

[GitHub] [spark] HeartSaVioR commented on pull request #37474: [SPARK-40039][SS] Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-24 Thread GitBox
HeartSaVioR commented on PR #37474: URL: https://github.com/apache/spark/pull/37474#issuecomment-1226802090 Looks like Steve's comment has been addressed. Thanks! Merging to master.

[GitHub] [spark] yangwwei closed pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs

2022-08-24 Thread GitBox
yangwwei closed pull request #37622: [SPARK-40187][DOCS] Add `Apache YuniKorn` scheduler docs URL: https://github.com/apache/spark/pull/37622

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37652: [SPARK-40214][PYTHON][SQL] add 'get' to functions

2022-08-24 Thread GitBox
HyukjinKwon commented on code in PR #37652: URL: https://github.com/apache/spark/pull/37652#discussion_r954490084 ## python/pyspark/sql/functions.py: ## @@ -4845,6 +4849,73 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column: return

[GitHub] [spark] LuciferYang commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
LuciferYang commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954489073 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String
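
For readers skimming this thread: the PR title describes shutting down the merged-shuffle cleaner before closing the DB it writes to. A minimal sketch of that general ordering pattern, assuming nothing about the actual `RemoteBlockPushResolver` code (names and timeout are illustrative):

```scala
import java.util.concurrent.{Executors, TimeUnit}

object CleanerShutdownSketch {
  // Stand-in for the cleaner thread pool.
  private val cleaner = Executors.newSingleThreadExecutor()

  // Close the DB only after the cleaner can no longer submit writes to it.
  def close(db: AutoCloseable): Unit = {
    cleaner.shutdown()                                     // stop accepting new cleanup tasks
    if (!cleaner.awaitTermination(10, TimeUnit.SECONDS)) { // give in-flight tasks a chance to finish
      cleaner.shutdownNow()                                // force-stop stragglers after the timeout
    }
    db.close()                                             // safe: no cleanup task can still touch the DB
  }
}
```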

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37652: [SPARK-40214][PYTHON][SQL] add 'get' to functions

2022-08-24 Thread GitBox
HyukjinKwon commented on code in PR #37652: URL: https://github.com/apache/spark/pull/37652#discussion_r954489472 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -3958,6 +3958,26 @@ object functions { ElementAt(column.expr, lit(value).expr) } +

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36056: [SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir

2022-08-24 Thread GitBox
dongjoon-hyun commented on code in PR #36056: URL: https://github.com/apache/spark/pull/36056#discussion_r954488764 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SQLOverwriteHadoopMapReduceCommitProtocolSuite.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed

[GitHub] [spark] dongjoon-hyun commented on pull request #36056: [SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir

2022-08-24 Thread GitBox
dongjoon-hyun commented on PR #36056: URL: https://github.com/apache/spark/pull/36056#issuecomment-1226758074 Thank you, @AngersZh .

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36961: [SPARK-39562][SQL][TESTS] Make `hive-thriftserver` module unit tests pass in IPv6 env

2022-08-24 Thread GitBox
dongjoon-hyun commented on code in PR #36961: URL: https://github.com/apache/spark/pull/36961#discussion_r954488030 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala: ## @@ -1189,6 +1189,7 @@ abstract class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37650: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-08-24 Thread GitBox
HyukjinKwon commented on code in PR #37650: URL: https://github.com/apache/spark/pull/37650#discussion_r954487181 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1733,6 +1733,19 @@ object functions { */ def atan2(yValue: Double, xName: String):

[GitHub] [spark] otterc commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-08-24 Thread GitBox
otterc commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r954486898 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -282,13 +286,19 @@ private[spark] class DAGScheduler( None } - // Use

[GitHub] [spark] HyukjinKwon commented on pull request #37651: [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

2022-08-24 Thread GitBox
HyukjinKwon commented on PR #37651: URL: https://github.com/apache/spark/pull/37651#issuecomment-1226755439 Merged to master, branch-3.3, branch-3.2 and branch-3.1.

[GitHub] [spark] HyukjinKwon closed pull request #37651: [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

2022-08-24 Thread GitBox
HyukjinKwon closed pull request #37651: [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters URL: https://github.com/apache/spark/pull/37651

[GitHub] [spark] zhengruifeng opened a new pull request, #37652: [SPARK-40214][PYTHON][SQL] add 'get' to functions

2022-08-24 Thread GitBox
zhengruifeng opened a new pull request, #37652: URL: https://github.com/apache/spark/pull/37652 ### What changes were proposed in this pull request? expose `get` to dataframe functions ### Why are the changes needed? for function parity ### Does this PR introduce

[GitHub] [spark] MaxGekk commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

2022-08-24 Thread GitBox
MaxGekk commented on PR #37649: URL: https://github.com/apache/spark/pull/37649#issuecomment-1226744505 > This change seems to break the design. The bad design is to leave a Decimal value in an inconsistent/wrong state between invocations of Decimal methods. Seems like the current

[GitHub] [spark] wangyum commented on a diff in pull request #37565: [SPARK-40137][SQL] Combines limits after projection

2022-08-24 Thread GitBox
wangyum commented on code in PR #37565: URL: https://github.com/apache/spark/pull/37565#discussion_r954475934 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -769,6 +769,16 @@ object LimitPushDown extends Rule[LogicalPlan] { //

[GitHub] [spark] ulysses-you commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
ulysses-you commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954473375 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala: ## @@ -28,16 +28,31 @@ object AQEUtils { def getRequiredDistribution(p:

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-08-24 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r954467609 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2259,37 +2259,51 @@ private[spark] class DAGScheduler( }

[GitHub] [spark] wankunde commented on pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-08-24 Thread GitBox
wankunde commented on PR #37533: URL: https://github.com/apache/spark/pull/37533#issuecomment-1226729326 Hi, @otterc @mridulm , I updated the code, could you help to review the new code?

[GitHub] [spark] LuciferYang commented on a diff in pull request #37610: [SPARK-38888][BUILD][CORE][YARN][DOCS] Add `RocksDB` support for shuffle state store

2022-08-24 Thread GitBox
LuciferYang commented on code in PR #37610: URL: https://github.com/apache/spark/pull/37610#discussion_r954458914 ## docs/configuration.md: ## @@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful 3.3.0 + +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954458141 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala: ## @@ -28,16 +28,31 @@ object AQEUtils { def getRequiredDistribution(p:

[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954457377 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala: ## @@ -28,16 +28,31 @@ object AQEUtils { def getRequiredDistribution(p:

[GitHub] [spark] mridulm commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-08-24 Thread GitBox
mridulm commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r954454349 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2259,37 +2259,51 @@ private[spark] class DAGScheduler( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954456484 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala: ## @@ -51,6 +51,11 @@ case class InsertAdaptiveSparkPlan( case

[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954456371 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala: ## @@ -51,6 +51,11 @@ case class InsertAdaptiveSparkPlan( case

[GitHub] [spark] HyukjinKwon closed pull request #37642: [SPARK-40202][PYTHON][SQL] Allow a dictionary in SparkSession.config in PySpark

2022-08-24 Thread GitBox
HyukjinKwon closed pull request #37642: [SPARK-40202][PYTHON][SQL] Allow a dictionary in SparkSession.config in PySpark URL: https://github.com/apache/spark/pull/37642

[GitHub] [spark] cloud-fan commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954455653 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2487,59 +2538,117 @@ case class Encode(value: Expression,

[GitHub] [spark] HyukjinKwon commented on pull request #37642: [SPARK-40202][PYTHON][SQL] Allow a dictionary in SparkSession.config in PySpark

2022-08-24 Thread GitBox
HyukjinKwon commented on PR #37642: URL: https://github.com/apache/spark/pull/37642#issuecomment-1226711074 Merged to master.

[GitHub] [spark] LuciferYang commented on a diff in pull request #37610: [SPARK-38888][BUILD][CORE][YARN][DOCS] Add `RocksDB` support for shuffle state store

2022-08-24 Thread GitBox
LuciferYang commented on code in PR #37610: URL: https://github.com/apache/spark/pull/37610#discussion_r954455185 ## docs/configuration.md: ## @@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful 3.3.0 + +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954454856 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2487,59 +2538,117 @@ case class Encode(value: Expression,

[GitHub] [spark] ulysses-you commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
ulysses-you commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954454246 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala: ## @@ -28,16 +28,28 @@ object AQEUtils { def getRequiredDistribution(p:

[GitHub] [spark] LuciferYang commented on a diff in pull request #37610: [SPARK-38888][BUILD][CORE][YARN][DOCS] Add `RocksDB` support for shuffle state store

2022-08-24 Thread GitBox
LuciferYang commented on code in PR #37610: URL: https://github.com/apache/spark/pull/37610#discussion_r954453611 ## docs/configuration.md: ## @@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful 3.3.0 + +

[GitHub] [spark] cloud-fan commented on pull request #37631: [SPARK-40194][SQL] SPLIT function on empty regex should truncate trailing empty string.

2022-08-24 Thread GitBox
cloud-fan commented on PR #37631: URL: https://github.com/apache/spark/pull/37631#issuecomment-1226707879 This seems a bit inconsistent. According to the function doc, `-1` means no limit, and it's confusing why no limit is different from a large enough limit (which means no limit as
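
For context on the `limit` semantics under discussion, this is how the underlying JVM regex split treats a negative versus a sufficiently large positive limit; plain Scala over `java.lang.String.split`, not Spark's `split` function, whose behaviour is exactly what the PR debates:

```scala
// limit 0 drops trailing empty strings; a negative limit keeps them,
// and a "large enough" positive limit behaves like the negative case.
",a,,b,".split(",")      // Array("", "a", "", "b")     -- limit 0: trailing "" removed
",a,,b,".split(",", -1)  // Array("", "a", "", "b", "") -- no limit: trailing "" kept
",a,,b,".split(",", 100) // Array("", "a", "", "b", "") -- same as -1 once the limit is never reached
```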

[GitHub] [spark] cloud-fan closed pull request #37488: [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog

2022-08-24 Thread GitBox
cloud-fan closed pull request #37488: [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog URL: https://github.com/apache/spark/pull/37488

[GitHub] [spark] cloud-fan commented on pull request #37488: [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog

2022-08-24 Thread GitBox
cloud-fan commented on PR #37488: URL: https://github.com/apache/spark/pull/37488#issuecomment-1226699453 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r95592 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEUtils.scala: ## @@ -28,16 +28,28 @@ object AQEUtils { def getRequiredDistribution(p:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36961: [SPARK-39562][SQL][TESTS] Make `hive-thriftserver` module unit tests pass in IPv6 env

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #36961: URL: https://github.com/apache/spark/pull/36961#discussion_r954434975 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala: ## @@ -1189,6 +1189,7 @@ abstract class

[GitHub] [spark] cloud-fan commented on pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-24 Thread GitBox
cloud-fan commented on PR #36995: URL: https://github.com/apache/spark/pull/36995#issuecomment-1226676187 > This means it now relies on Spark's hash function for bucketing though, which could be different from other engines. Let's think about it this way: The v2 data source only

[GitHub] [spark] ulysses-you commented on pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
ulysses-you commented on PR #37612: URL: https://github.com/apache/spark/pull/37612#issuecomment-1226672854 It should be clear now. This PR only did two things: 1. Only apply AQE for the children of DeserializeToObjectExec 2. Check all partitioning for requiredDistribution so we

[GitHub] [spark] cloud-fan commented on a diff in pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37649: URL: https://github.com/apache/spark/pull/37649#discussion_r954423097 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -394,48 +394,49 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] cloud-fan commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

2022-08-24 Thread GitBox
cloud-fan commented on PR #37649: URL: https://github.com/apache/spark/pull/37649#issuecomment-1226667234 > This change seems to break the design. I don't think so. Having side effects for better performance is OK, but it doesn't mean we can leave the decimal value in a "crashed"

[GitHub] [spark] ulysses-you commented on a diff in pull request #37612: [SPARK-39915][SQL] Ensure the output partitioning is user-specified in AQE

2022-08-24 Thread GitBox
ulysses-you commented on code in PR #37612: URL: https://github.com/apache/spark/pull/37612#discussion_r954422378 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ValidateSparkPlan.scala: ## @@ -39,30 +45,41 @@ object ValidateSparkPlan extends Rule[SparkPlan]

[GitHub] [spark] cloud-fan commented on a diff in pull request #37544: [SPARK-40110][SQL][TESTS] Add JDBCWithAQESuite

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37544: URL: https://github.com/apache/spark/pull/37544#discussion_r954419078 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -44,7 +45,8 @@ import org.apache.spark.sql.test.SharedSparkSession import

[GitHub] [spark] linhongliu-db commented on pull request #37651: [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

2022-08-24 Thread GitBox
linhongliu-db commented on PR #37651: URL: https://github.com/apache/spark/pull/37651#issuecomment-1226660283 cc @cloud-fan

[GitHub] [spark] linhongliu-db opened a new pull request, #37651: [SPARK-40213][SQL] Support ASCII value conversion for Latin-1 characters

2022-08-24 Thread GitBox
linhongliu-db opened a new pull request, #37651: URL: https://github.com/apache/spark/pull/37651 ### What changes were proposed in this pull request? This PR proposes to support ASCII value conversion for Latin-1 Supplement characters. ### Why are the changes needed? `ascii()`
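
As a quick reference for what "Latin-1 Supplement" means in this title: those characters occupy code points 128-255, just above the 7-bit ASCII range (plain Scala, unrelated to the PR's implementation):

```scala
// 7-bit ASCII ends at 127; the Latin-1 Supplement block covers 128-255.
'A'.toInt // 65  -- plain ASCII
'é'.toInt // 233 -- U+00E9, inside Latin-1 Supplement
'ÿ'.toInt // 255 -- U+00FF, the last Latin-1 code point
```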

[GitHub] [spark] kazuyukitanimura commented on a diff in pull request #37544: [SPARK-40110][SQL][TESTS] Add JDBCWithAQESuite

2022-08-24 Thread GitBox
kazuyukitanimura commented on code in PR #37544: URL: https://github.com/apache/spark/pull/37544#discussion_r954413572 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala: ## @@ -44,7 +45,8 @@ import org.apache.spark.sql.test.SharedSparkSession import

[GitHub] [spark] github-actions[bot] commented on pull request #36563: [SPARK-39194][SQL] Add a pre resolution builder for spark session extensions

2022-08-24 Thread GitBox
github-actions[bot] commented on PR #36563: URL: https://github.com/apache/spark/pull/36563#issuecomment-1226627592 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] amaliujia commented on pull request #37488: [SPARK-40055][SQL] listCatalogs should also return spark_catalog even when spark_catalog implementation is defaultSessionCatalog

2022-08-24 Thread GitBox
amaliujia commented on PR #37488: URL: https://github.com/apache/spark/pull/37488#issuecomment-1226591268 @cloud-fan I should have fixed DSv2 test failures.

[GitHub] [spark] tgravescs commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
tgravescs commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954312802 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] tgravescs commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
tgravescs commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954309533 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] gengliangwang commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

2022-08-24 Thread GitBox
gengliangwang commented on PR #37649: URL: https://github.com/apache/spark/pull/37649#issuecomment-1226225115 Spark's decimal has two methods by design: * changePrecision: update in place for better performance * toPrecision: make a copy and check if changing precision works
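
A toy illustration of the two-method design described in this comment, i.e. an in-place `changePrecision` plus a copy-based `toPrecision` built on top of it. This is a simplified stand-in, not Spark's `Decimal` (it only widens the scale and assumes the result fits in a `Long`):

```scala
final class SimpleDecimal(private var unscaled: Long, private var scale: Int) {

  // In-place for speed: returns false on overflow. The thread is about what state
  // the value should be left in when this returns false.
  def changePrecision(precision: Int, newScale: Int): Boolean = {
    require(newScale >= scale, "this sketch only widens the scale")
    val shifted = BigInt(unscaled) * BigInt(10).pow(newScale - scale)
    if (shifted.abs >= BigInt(10).pow(precision)) {
      false                     // does not fit: report failure
    } else {
      unscaled = shifted.toLong // mutate in place, avoiding an allocation on the hot path
      scale = newScale
      true
    }
  }

  // Copy-based: never mutates `this`; returns None when the precision change fails.
  def toPrecision(precision: Int, newScale: Int): Option[SimpleDecimal] = {
    val copy = new SimpleDecimal(unscaled, scale)
    if (copy.changePrecision(precision, newScale)) Some(copy) else None
  }

  override def toString: String = s"$unscaled (scale=$scale)"
}

// val d = new SimpleDecimal(12345L, 2) // represents 123.45
// d.toPrecision(4, 2)                  // None: 12345 has 5 digits, precision 4 is too small
// d.changePrecision(7, 4)              // true: d is now 1234500 at scale 4
```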

[GitHub] [spark] mridulm commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
mridulm commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954250369 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] khalidmammadov opened a new pull request, #37650: [SPARK-40210][PYTHON][CORE] Fix math atan2, hypot, pow and pmod float argument call

2022-08-24 Thread GitBox
khalidmammadov opened a new pull request, #37650: URL: https://github.com/apache/spark/pull/37650 ### What changes were proposed in this pull request? PySpark atan2, hypot, pow and pmod functions are marked as accepting a float type as argument but produce an error when called together. For

[GitHub] [spark] MaxGekk commented on a diff in pull request #37620: [SPARK-40183][SQL] Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion

2022-08-24 Thread GitBox
MaxGekk commented on code in PR #37620: URL: https://github.com/apache/spark/pull/37620#discussion_r954236013 ## sql/core/src/test/resources/sql-tests/results/cast.sql.out: ## @@ -866,10 +866,10 @@ struct<> -- !query output org.apache.spark.SparkArithmeticException { -

[GitHub] [spark] otterc commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
otterc commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954220998 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] MaxGekk opened a new pull request, #37649: [WIP][SQL] Modify the interval value of decimal in `changePrecision` on errors only

2022-08-24 Thread GitBox
MaxGekk opened a new pull request, #37649: URL: https://github.com/apache/spark/pull/37649 ### What changes were proposed in this pull request? ### Why are the changes needed? To avoid confusing users with error messages. That improves the user experience with Spark SQL.

[GitHub] [spark] sunchao commented on pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-24 Thread GitBox
sunchao commented on PR #36995: URL: https://github.com/apache/spark/pull/36995#issuecomment-1226134733 In general, let's say a V2 transform function maps a key to a value; given a set of keys, the value space should always be <= the key space. Therefore, it seems better for Spark to
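
A small illustration of the "value space <= key space" point above: clustering rows by a bucket-style transform can only merge keys, never split them. Plain Scala stand-in; `bucketId` is a hypothetical example, not a real v2 function implementation:

```scala
// Hypothetical stand-in for a v2 bucket transform: maps a key to one of 16 bucket ids.
def bucketId(key: Long, numBuckets: Int = 16): Int =
  Math.floorMod(key, numBuckets.toLong).toInt

val keys = Seq(1L, 17L, 33L, 2L, 18L, 40L)

// Grouping by the transform's output merges several keys into one bucket, so the
// number of distinct bucket ids is at most the number of distinct keys.
val clustered = keys.groupBy(k => bucketId(k))
// e.g. Map(1 -> List(1, 17, 33), 2 -> List(2, 18), 8 -> List(40))
```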

[GitHub] [spark] sunchao commented on pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-24 Thread GitBox
sunchao commented on PR #36995: URL: https://github.com/apache/spark/pull/36995#issuecomment-1226108103 > How can we use this feature to implement bucket writing? We can use the expression (a v2 function) that calculates the bucket ID as the clustering expressions. Then Spark will make

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37632: [SPARK-40197][SQL] Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR

2022-08-24 Thread GitBox
vitaliili-db commented on code in PR #37632: URL: https://github.com/apache/spark/pull/37632#discussion_r954176021 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2071,9 +2071,14 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] vitaliili-db commented on pull request #37631: [SPARK-40194][SQL] SPLIT function on empty regex should truncate trailing empty string.

2022-08-24 Thread GitBox
vitaliili-db commented on PR #37631: URL: https://github.com/apache/spark/pull/37631#issuecomment-1226095179 @cloud-fan other databases don't have a `limit` parameter in the `split` function. In addition, the second parameter is a string in other systems, as opposed to a regex in Spark. Closest method

[GitHub] [spark] sunchao commented on a diff in pull request #36995: [SPARK-39607][SQL][DSV2] Distribution and ordering support V2 function in writing

2022-08-24 Thread GitBox
sunchao commented on code in PR #36995: URL: https://github.com/apache/spark/pull/36995#discussion_r954156009 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DistributionAndOrderingUtils.scala: ## @@ -17,22 +17,33 @@ package

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
vitaliili-db commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954153222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2487,59 +2538,117 @@ case class Encode(value: Expression,

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
vitaliili-db commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954150734 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -287,14 +287,18 @@ private[spark] class SparkNoSuchMethodException( private[spark] class

[GitHub] [spark] huaxingao commented on pull request #37643: [SQL][SPARK-39528][FOLLOWUP] Make DynamicPartitionPruningV2FilterSuite extend DynamicPartitionPruningV2Suite

2022-08-24 Thread GitBox
huaxingao commented on PR #37643: URL: https://github.com/apache/spark/pull/37643#issuecomment-1226082727 Merged to master. Thanks @cloud-fan for your review!

[GitHub] [spark] huaxingao closed pull request #37643: [SQL][SPARK-39528][FOLLOWUP] Make DynamicPartitionPruningV2FilterSuite extend DynamicPartitionPruningV2Suite

2022-08-24 Thread GitBox
huaxingao closed pull request #37643: [SQL][SPARK-39528][FOLLOWUP] Make DynamicPartitionPruningV2FilterSuite extend DynamicPartitionPruningV2Suite URL: https://github.com/apache/spark/pull/37643

[GitHub] [spark] ueshin commented on pull request #37642: [SPARK-40202][PYTHON][SQL] Allow a dictionary in SparkSession.config in PySpark

2022-08-24 Thread GitBox
ueshin commented on PR #37642: URL: https://github.com/apache/spark/pull/37642#issuecomment-1226081447 We also need to add the overload definition: ```py @overload def config(self, *, map: Optional[Dict[str, "OptionalPrimitiveType"]]) -> "SparkSession.Builder":

[GitHub] [spark] vitaliili-db commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
vitaliili-db commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954146929 ## core/src/main/resources/error/error-classes.json: ## @@ -70,6 +70,11 @@ "Another instance of this query was just started by a concurrent session."

[GitHub] [spark] kevin85421 commented on a diff in pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-08-24 Thread GitBox
kevin85421 commented on code in PR #37411: URL: https://github.com/apache/spark/pull/37411#discussion_r954138822 ## core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala: ## @@ -77,17 +77,61 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)

[GitHub] [spark] ueshin commented on a diff in pull request #37642: [SPARK-40202][PYTHON][SQL] Allow a dictionary in SparkSession.config in PySpark

2022-08-24 Thread GitBox
ueshin commented on code in PR #37642: URL: https://github.com/apache/spark/pull/37642#discussion_r954135287 ## python/pyspark/sql/session.py: ## @@ -202,6 +202,7 @@ def config( key: Optional[str] = None, value: Optional[Any] = None, conf:

[GitHub] [spark] gengliangwang commented on a diff in pull request #37632: [SPARK-40197][SQL] Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR

2022-08-24 Thread GitBox
gengliangwang commented on code in PR #37632: URL: https://github.com/apache/spark/pull/37632#discussion_r954138248 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2071,9 +2071,14 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #37632: [SPARK-40197][SQL] Replace query plan with context for MULTI_VALUE_SUBQUERY_ERROR

2022-08-24 Thread GitBox
gengliangwang commented on code in PR #37632: URL: https://github.com/apache/spark/pull/37632#discussion_r954128592 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -29,7 +29,8 @@ class SparkException( cause: Throwable, errorClass: Option[String],

[GitHub] [spark] mridulm commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
mridulm commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954127910 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37635: [WIP][SPARK-40131][PYTHON] Support NumPy ndarray in built-in functions

2022-08-24 Thread GitBox
xinrong-meng commented on code in PR #37635: URL: https://github.com/apache/spark/pull/37635#discussion_r954114300 ## python/pyspark/sql/types.py: ## @@ -2256,11 +2260,47 @@ def convert(self, obj: datetime.timedelta, gateway_client: GatewayClient) -> Jav ) +class

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37635: [WIP][SPARK-40131][PYTHON] Support NumPy ndarray in built-in functions

2022-08-24 Thread GitBox
xinrong-meng commented on code in PR #37635: URL: https://github.com/apache/spark/pull/37635#discussion_r954113915 ## python/pyspark/sql/types.py: ## @@ -2256,11 +2260,47 @@ def convert(self, obj: datetime.timedelta, gateway_client: GatewayClient) -> Jav ) +class

[GitHub] [spark] xinrong-meng commented on a diff in pull request #37635: [WIP][SPARK-40131][PYTHON] Support NumPy ndarray in built-in functions

2022-08-24 Thread GitBox
xinrong-meng commented on code in PR #37635: URL: https://github.com/apache/spark/pull/37635#discussion_r954103989 ## python/pyspark/sql/types.py: ## @@ -2256,11 +2260,47 @@ def convert(self, obj: datetime.timedelta, gateway_client: GatewayClient) -> Jav ) +class

[GitHub] [spark] otterc commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
otterc commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954102886 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] AmplabJenkins commented on pull request #37629: [SPARK-40160][PYTHON][DOCS] Make pyspark.broadcast examples self-contained

2022-08-24 Thread GitBox
AmplabJenkins commented on PR #37629: URL: https://github.com/apache/spark/pull/37629#issuecomment-1226041156 Can one of the admins verify this patch?

[GitHub] [spark] kevin85421 commented on a diff in pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-08-24 Thread GitBox
kevin85421 commented on code in PR #37411: URL: https://github.com/apache/spark/pull/37411#discussion_r954098890 ## core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala: ## @@ -77,17 +77,61 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)

[GitHub] [spark] gengliangwang commented on a diff in pull request #37620: [SPARK-40183][SQL] Use error class NUMERIC_VALUE_OUT_OF_RANGE for overflow in decimal conversion

2022-08-24 Thread GitBox
gengliangwang commented on code in PR #37620: URL: https://github.com/apache/spark/pull/37620#discussion_r954096295 ## sql/core/src/test/resources/sql-tests/results/cast.sql.out: ## @@ -866,10 +866,10 @@ struct<> -- !query output org.apache.spark.SparkArithmeticException { -

[GitHub] [spark] otterc commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-08-24 Thread GitBox
otterc commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r954011984 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2259,37 +2259,51 @@ private[spark] class DAGScheduler( }

[GitHub] [spark] mridulm commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
mridulm commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r95402 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] kevin85421 commented on a diff in pull request #37411: [SPARK-39984][CORE] Check workerLastHeartbeat with master before HeartbeatReceiver expires an executor

2022-08-24 Thread GitBox
kevin85421 commented on code in PR #37411: URL: https://github.com/apache/spark/pull/37411#discussion_r954074380 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2398,4 +2398,20 @@ package object config { .version("3.3.0") .intConf

[GitHub] [spark] mridulm commented on a diff in pull request #37610: [SPARK-38888][BUILD][CORE][YARN][DOCS] Add `RocksDB` support for shuffle state store

2022-08-24 Thread GitBox
mridulm commented on code in PR #37610: URL: https://github.com/apache/spark/pull/37610#discussion_r954071160 ## docs/configuration.md: ## @@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful 3.3.0 + +

[GitHub] [spark] mridulm commented on pull request #37528: [SPARK-40094][CORE] Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException

2022-08-24 Thread GitBox
mridulm commented on PR #37528: URL: https://github.com/apache/spark/pull/37528#issuecomment-1226003959 Merged to master. Thanks for fixing this @wangshengjie123 ! Thanks for the review @Ngone51 :-)

[GitHub] [spark] mridulm closed pull request #37528: [SPARK-40094][CORE] Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException

2022-08-24 Thread GitBox
mridulm closed pull request #37528: [SPARK-40094][CORE] Send TaskEnd event when task failed with NotSerializableException or TaskOutputFileAlreadyExistException URL: https://github.com/apache/spark/pull/37528

[GitHub] [spark] otterc commented on a diff in pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-08-24 Thread GitBox
otterc commented on code in PR #37624: URL: https://github.com/apache/spark/pull/37624#discussion_r954026698 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -795,13 +796,34 @@ public void registerExecutor(String

[GitHub] [spark] cloud-fan commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954014236 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2487,59 +2538,117 @@ case class Encode(value: Expression,

[GitHub] [spark] cloud-fan commented on a diff in pull request #37483: [SPARK-40112][SQL] Improve the TO_BINARY() function

2022-08-24 Thread GitBox
cloud-fan commented on code in PR #37483: URL: https://github.com/apache/spark/pull/37483#discussion_r954004766 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -287,14 +287,18 @@ private[spark] class SparkNoSuchMethodException( private[spark] class
