[PR] [WIP][INFRA] Make dropdown for workflows/benchmark input parameters [spark]

2024-07-21 Thread via GitHub
panbingkun opened a new pull request, #47438: URL: https://github.com/apache/spark/pull/47438 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-21 Thread via GitHub
panbingkun commented on PR #47432: URL: https://github.com/apache/spark/pull/47432#issuecomment-2242170184 > ZStandardBenchmark Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] [SPARK-48957][SS] Return sub-classified error class on state store load for hdfs and rocksdb provider [spark]

2024-07-21 Thread via GitHub
anishshri-db commented on PR #47431: URL: https://github.com/apache/spark/pull/47431#issuecomment-2242146542 @HeartSaVioR - PTAL, thx !

Re: [PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-21 Thread via GitHub
panbingkun commented on PR #47432: URL: https://github.com/apache/spark/pull/47432#issuecomment-2242140609 > ZStandardBenchmark org.apache.spark.io.ZStandardBenchmark JDK 17: JDK 21:

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
itholic commented on code in PR #47436: URL: https://github.com/apache/spark/pull/47436#discussion_r1685984713 ## python/docs/source/development/logger.rst: ## @@ -52,7 +52,7 @@ Example log entry: "file": "/path/to/file.py", "line_no": "17", Review Comment:

Re: [PR] [SPARK-48752][FOLLOWUP][PYTHON][DOCS] Use explicit name for line number in log [spark]

2024-07-21 Thread via GitHub
itholic commented on PR #47437: URL: https://github.com/apache/spark/pull/47437#issuecomment-2242120624 cc @HyukjinKwon

[PR] [SPARK-48752][FOLLOWUP][PYTHON][DOCS] Use explicit name for line number in log [spark]

2024-07-21 Thread via GitHub
itholic opened a new pull request, #47437: URL: https://github.com/apache/spark/pull/47437 ### What changes were proposed in this pull request? This PR is a follow-up to https://github.com/apache/spark/pull/47145 that renames the log field. ### Why are the changes needed?

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
itholic commented on code in PR #47436: URL: https://github.com/apache/spark/pull/47436#discussion_r1685981444 ## python/docs/source/development/logger.rst: ## @@ -52,7 +52,7 @@ Example log entry: "file": "/path/to/file.py", "line_no": "17", Review Comment:

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on code in PR #47436: URL: https://github.com/apache/spark/pull/47436#discussion_r1685979935 ## python/docs/source/development/logger.rst: ## @@ -52,7 +52,7 @@ Example log entry: "file": "/path/to/file.py", "line_no": "17", Review Comment

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
itholic commented on code in PR #47436: URL: https://github.com/apache/spark/pull/47436#discussion_r1685975346 ## python/docs/source/development/logger.rst: ## @@ -52,7 +52,7 @@ Example log entry: "file": "/path/to/file.py", "line_no": "17", Review Comment:

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on code in PR #47436: URL: https://github.com/apache/spark/pull/47436#discussion_r1685967841 ## python/docs/source/development/logger.rst: ## @@ -52,7 +52,7 @@ Example log entry: "file": "/path/to/file.py", "line_no": "17", Review Comment

Re: [PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
itholic commented on PR #47436: URL: https://github.com/apache/spark/pull/47436#issuecomment-2242087457 cc @HyukjinKwon

[PR] [SPARK-48961][PYTHON] Make the parameter naming of `PySparkException` consistent with JVM [spark]

2024-07-21 Thread via GitHub
itholic opened a new pull request, #47436: URL: https://github.com/apache/spark/pull/47436 ### What changes were proposed in this pull request? This PR proposes to make the parameter naming of `PySparkException` consistent with JVM ### Why are the changes needed? The

[PR] [MINOR][TESTS][DOCS] Fix some typos in `LZFBenchmark` [spark]

2024-07-21 Thread via GitHub
wayneguow opened a new pull request, #47435: URL: https://github.com/apache/spark/pull/47435 ### What changes were proposed in this pull request? This PR aims to fix some typos in `LZFBenchmark`. ### Why are the changes needed? Fix typos and avoid confusion.

Re: [PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-21 Thread via GitHub
LuciferYang commented on PR #47432: URL: https://github.com/apache/spark/pull/47432#issuecomment-2241980182 update benchmark result of ZStandardBenchmark?

Re: [PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-21 Thread via GitHub
panbingkun commented on PR #47432: URL: https://github.com/apache/spark/pull/47432#issuecomment-2241970450 cc @dongjoon-hyun @LuciferYang

Re: [PR] [WIP][SPARK-48948][SQL] Introduce `SHOW VARIABLES LIKE ... ` SQL syntax to get variables [spark]

2024-07-21 Thread via GitHub
panbingkun commented on PR #47422: URL: https://github.com/apache/spark/pull/47422#issuecomment-2241969878 cc @srielau @cloud-fan

Re: [PR] [SPARK-47307][SQL][FOLLOWUP][3.5] Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one [spark]

2024-07-21 Thread via GitHub
yaooqinn closed pull request #47416: [SPARK-47307][SQL][FOLLOWUP][3.5] Promote spark.sql.legacy.chunkBase64String.enabled from a legacy/internal config to a regular/public one URL: https://github.com/apache/spark/pull/47416

Re: [PR] [SPARK-48936][CONNECT] Makes spark-submit works with Spark connect [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on code in PR #47434: URL: https://github.com/apache/spark/pull/47434#discussion_r1685869245 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -890,7 +891,57 @@ object SparkSession extends Logging { // the re

[PR] [SPARK-48936][CONNECT] Makes spark-submit works with Spark connect [spark]

2024-07-21 Thread via GitHub
HyukjinKwon opened a new pull request, #47434: URL: https://github.com/apache/spark/pull/47434 ### What changes were proposed in this pull request? This PR proposes to add support for `--remote` to `bin/spark-submit` so that it can use Spark Connect easily. This PR includes: - Make `b

Re: [PR] [SPARK-48959][SQL] Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` to restore the exception handling [spark]

2024-07-21 Thread via GitHub
zhengruifeng commented on code in PR #47433: URL: https://github.com/apache/spark/pull/47433#discussion_r1685856407 ## sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala: ## @@ -409,7 +409,6 @@ private[sql] object CatalogV2Util { } catch

Re: [PR] [SPARK-48851][SQL] Change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace` [spark]

2024-07-21 Thread via GitHub
zhengruifeng commented on PR #47276: URL: https://github.com/apache/spark/pull/47276#issuecomment-2241860283 This is not only a user-facing change; it also breaks the exception handling in external catalogs, for example: ``` try { } catch { case e: NoSuchDatabaseExce

[PR] [SPARK-48959][SQL] Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` to restore the exception handling [spark]

2024-07-21 Thread via GitHub
zhengruifeng opened a new pull request, #47433: URL: https://github.com/apache/spark/pull/47433 ### What changes were proposed in this pull request? Make `NoSuchNamespaceException` extend `NoSuchDatabaseException` ### Why are the changes needed? 1, https://github.com/apache/

[PR] [SPARK-48958][BUILD] Upgrade `zstd-jni` to 1.5.6-4 [spark]

2024-07-21 Thread via GitHub
panbingkun opened a new pull request, #47432: URL: https://github.com/apache/spark/pull/47432 ### What changes were proposed in this pull request? This PR aims to upgrade `zstd-jni` from `1.5.6-3` to `1.5.6-4`. ### Why are the changes needed? 1. v1.5.6-3 vs. v1.5.6-4 https://gith

Re: [PR] [WIP] Refactor of collation aware string functions [spark]

2024-07-21 Thread via GitHub
github-actions[bot] commented on PR #46031: URL: https://github.com/apache/spark/pull/46031#issuecomment-2241836529 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47639] Support codegen for json_tuple. [spark]

2024-07-21 Thread via GitHub
github-actions[bot] closed pull request #45765: [SPARK-47639] Support codegen for json_tuple. URL: https://github.com/apache/spark/pull/45765

Re: [PR] [SPARK-48955][SQL] `ArrayCompact`'s datatype should be `containsNull = false` [spark]

2024-07-21 Thread via GitHub
HyukjinKwon closed pull request #47430: [SPARK-48955][SQL] `ArrayCompact`'s datatype should be `containsNull = false` URL: https://github.com/apache/spark/pull/47430

Re: [PR] Insert Into Statement improvement [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on PR #47428: URL: https://github.com/apache/spark/pull/47428#issuecomment-2241820903 Mind filing a JIRA please? See also https://spark.apache.org/contributing.html

Re: [PR] [MINOR][PYTHON] Fix type hint for `from_utc_timestamp` and `to_utc_timestamp` [spark]

2024-07-21 Thread via GitHub
HyukjinKwon closed pull request #47429: [MINOR][PYTHON] Fix type hint for `from_utc_timestamp` and `to_utc_timestamp` URL: https://github.com/apache/spark/pull/47429

Re: [PR] [MINOR][PYTHON] Fix type hint for `from_utc_timestamp` and `to_utc_timestamp` [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on PR #47429: URL: https://github.com/apache/spark/pull/47429#issuecomment-2241820706 Merged to master.

Re: [PR] [SPARK-48955][SQL] `ArrayCompact`'s datatype should be `containsNull = false` [spark]

2024-07-21 Thread via GitHub
HyukjinKwon commented on PR #47430: URL: https://github.com/apache/spark/pull/47430#issuecomment-2241820569 Merged to master.

Re: [PR] [SPARK-48891][SS] Refactor StateSchemaCompatibilityChecker to unify all state schema formats [spark]

2024-07-21 Thread via GitHub
HeartSaVioR closed pull request #47359: [SPARK-48891][SS] Refactor StateSchemaCompatibilityChecker to unify all state schema formats URL: https://github.com/apache/spark/pull/47359

Re: [PR] [SPARK-48891][SS] Refactor StateSchemaCompatibilityChecker to unify all state schema formats [spark]

2024-07-21 Thread via GitHub
HeartSaVioR commented on PR #47359: URL: https://github.com/apache/spark/pull/47359#issuecomment-2241811013 Thanks! Merging to master.

[PR] [SPARK-48957][SS] Return sub-classified error class on state store load for hdfs and rocksdb provider [spark]

2024-07-21 Thread via GitHub
anishshri-db opened a new pull request, #47431: URL: https://github.com/apache/spark/pull/47431 ### What changes were proposed in this pull request? Return sub-classified error class on state store load for hdfs and rocksdb provider ### Why are the changes needed? Without th

Re: [PR] [SPARK-48755] State V2 base implementation and ValueState support [spark]

2024-07-21 Thread via GitHub
anishshri-db commented on code in PR #47133: URL: https://github.com/apache/spark/pull/47133#discussion_r1685785543 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -76,7 +76,8 @@ class IncrementalExecution( StreamingRe

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685756288 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1789,44 +1798,90 @@ class CollationSQLExpressionsSuite s"named_str

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685755850 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -106,11 +162,11 @@ case class Mode( val collationAwareBuffer =

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685752681 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -106,11 +162,11 @@ case class Mode( val collationAwareBuffer =

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685753351 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -86,6 +71,77 @@ case class Mode( buffer } + private def

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
uros-db commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685753146 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -86,6 +71,77 @@ case class Mode( buffer } + private def

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-21 Thread via GitHub
GideonPotok commented on code in PR #47154: URL: https://github.com/apache/spark/pull/47154#discussion_r1685751172 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -106,11 +155,13 @@ case class Mode( val collationAwareBuff

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-07-21 Thread via GitHub
cashmand commented on code in PR #46831: URL: https://github.com/apache/spark/pull/46831#discussion_r1685750798 ## common/variant/shredding.md: ## @@ -0,0 +1,244 @@ +# Shredding Overview + +The Spark Variant type is designed to store and process semi-structured data efficiently

Re: [PR] [SPARK-48954] try_mod() replaces try_remainder() [spark]

2024-07-21 Thread via GitHub
zhengruifeng commented on PR #47427: URL: https://github.com/apache/spark/pull/47427#issuecomment-2241531762 merged to master

Re: [PR] [SPARK-48954] try_mod() replaces try_remainder() [spark]

2024-07-21 Thread via GitHub
zhengruifeng closed pull request #47427: [SPARK-48954] try_mod() replaces try_remainder() URL: https://github.com/apache/spark/pull/47427