Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
uros-db commented on PR #46700: URL: https://github.com/apache/spark/pull/46700#issuecomment-2151524400 In conclusion: we'll need to modify the logic for illegal UTF8 byte sequence replacement to stay consistent with ICU4C. However, we'll do this in a follow-up PR. -- This is an aut

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628861090 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -26,6 +27,156 @@ // checkstyle.off: AvoidEscapedUnicodeCharacters publi

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46062: [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String URL: https://github.com/apache/spark/pull/46062 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-48540][CORE] Avoid ivy output loading settings to stdout [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46882: URL: https://github.com/apache/spark/pull/46882#issuecomment-2151515489 Merged to master. Thank you @cxzl25

Re: [PR] [SPARK-48540][CORE] Avoid ivy output loading settings to stdout [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46882: [SPARK-48540][CORE] Avoid ivy output loading settings to stdout URL: https://github.com/apache/spark/pull/46882

Re: [PR] [SPARK-48540][CORE] Avoid ivy output loading settings to stdout [spark]

2024-06-05 Thread via GitHub
cxzl25 commented on PR #46882: URL: https://github.com/apache/spark/pull/46882#issuecomment-2151501765 @LuciferYang @yaooqinn Please help review, thanks in advance!

Re: [PR] [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46881: URL: https://github.com/apache/spark/pull/46881#issuecomment-2151492372 Thank you, merged to master

Re: [PR] [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46881: [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 URL: https://github.com/apache/spark/pull/46881

Re: [PR] [SPARK-48548][BUILD] Add LICENSE/NOTICE for spark-core with shaded dependencies [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46891: URL: https://github.com/apache/spark/pull/46891#issuecomment-2151487745 cc @cloud-fan

[PR] [SPARK-48548][BUILD]Add LICENSE/NOTICE for spark-core with shaded dependencies [spark]

2024-06-05 Thread via GitHub
yaooqinn opened a new pull request, #46891: URL: https://github.com/apache/spark/pull/46891 ### What changes were proposed in this pull request? The core module is shipped with some bundled dependencies, so it's better to add LICENSE/NOTICE to conform to the ASF policies.

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628823909 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -296,6 +344,45 @@ public static String toLowerCase(final String t

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628822473 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -296,6 +344,45 @@ public static String toLowerCase(final String

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628821647 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -412,9 +412,9 @@ protected Collation buildCollation() { "UT

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628820660 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -296,6 +344,45 @@ public static String toLowerCase(final String t

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628813249 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -26,6 +27,156 @@ // checkstyle.off: AvoidEscapedUnicodeCharacters publ

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1628799978 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -296,6 +344,45 @@ public static String toLowerCase(final String

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-06-05 Thread via GitHub
mkaravel commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2151433489 > As noted in one of the resolved comments - there's no loss of coverage. However, some tests have been (temporarily) removed because `StringTrim` no longer supports UNICODE collation gi

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-06-05 Thread via GitHub
mkaravel commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1628787249 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollationExpressionSuite.scala: ## @@ -174,10 +174,10 @@ class CollationExpressionSuite extend

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-06-05 Thread via GitHub
Samrose-Ahmed commented on PR #46831: URL: https://github.com/apache/spark/pull/46831#issuecomment-2151413685 I suggest moving the implementation of Variant to a separate repo outside of the Spark project. I see the usage of "Open Variant" instead of "Spark Variant" in [recent announcement

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-05 Thread via GitHub
srielau commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1628771638 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -3488,6 +3488,35 @@ class DataSourceV2SQLSuiteV1Filter } } + test

[PR] [WIP] multi-line CSV schema inference should also throw FAILED_READ_FILE [spark]

2024-06-05 Thread via GitHub
cloud-fan opened a new pull request, #46890: URL: https://github.com/apache/spark/pull/46890 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1628746131 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -3488,6 +3488,35 @@ class DataSourceV2SQLSuiteV1Filter } } + te

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-05 Thread via GitHub
dtenedor commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1628744481 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -3488,6 +3488,35 @@ class DataSourceV2SQLSuiteV1Filter } } + tes

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-05 Thread via GitHub
cloud-fan closed pull request #46874: [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric URL: https://github.com/apache/spark/pull/46874

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2151358605 Let me merge this follow-up first. We can continue the discussion in the original PR.

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1628736440 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -823,13 +823,17 @@ identifierComment relationPrimary : identifierRefer

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1628732028 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala: ## @@ -3488,6 +3488,35 @@ class DataSourceV2SQLSuiteV1Filter } } + te

Re: [PR] [SPARK-48547][DEPLOY] Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits [spark]

2024-06-05 Thread via GitHub
pan3793 commented on PR #46889: URL: https://github.com/apache/spark/pull/46889#issuecomment-2151326847 This patch covers the SPARK-34674 and SPARK-42698 cases. And maybe the code introduced in SPARK-34674 can be removed. SPARK-34674 is used to tackle the Spark on K8s exit issues, it

Re: [PR] [SPARK-48541][CORE] Add new exit code for executors killed by TaskReaper [spark]

2024-06-05 Thread via GitHub
bozhang2820 commented on PR #46883: URL: https://github.com/apache/spark/pull/46883#issuecomment-2151299075 @JoshRosen, do you mind taking a look at this?

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46879: URL: https://github.com/apache/spark/pull/46879#issuecomment-2151298639 Merged to master. Thank you @dongjoon-hyun

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46879: [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp URL: https://github.com/apache/spark/pull/46879

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
LuciferYang commented on code in PR #46879: URL: https://github.com/apache/spark/pull/46879#discussion_r1628695658 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1340,6 +1340,15 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on code in PR #46879: URL: https://github.com/apache/spark/pull/46879#discussion_r1627847602 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1340,6 +1340,15 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

2024-06-05 Thread via GitHub
adrian-wang commented on PR #44725: URL: https://github.com/apache/spark/pull/44725#issuecomment-2151289718 @cloud-fan Can you help review this pull request?

Re: [PR] [WIP][SPARK-47103][ML] Make the default storage level of intermediate datasets for MLlib configurable [spark]

2024-06-05 Thread via GitHub
github-actions[bot] commented on PR #45182: URL: https://github.com/apache/spark/pull/45182#issuecomment-2151159015 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46639][SQL] Add WindowExec SQLMetrics [spark]

2024-06-05 Thread via GitHub
github-actions[bot] commented on PR #44646: URL: https://github.com/apache/spark/pull/44646#issuecomment-2151159031 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-05 Thread via GitHub
viirya commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2151145005 For the purpose of this follow up, it looks good. For the question, if `spark.hadoop.fs.s3a.connection.establish.timeout` is set and `s3a.connection.establish.timeout` is also set, w

Re: [PR] [SPARK-48543][SS] Track state row validation failures using explicit error class [spark]

2024-06-05 Thread via GitHub
anishshri-db commented on PR #46885: URL: https://github.com/apache/spark/pull/46885#issuecomment-2151141161 Tests are all passing. Not sure why the page is not updated.

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2151140877 also cc @viirya

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
urosstan-db commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628566226 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala: ## @@ -110,7 +110,19 @@ class InMemoryTableSessionCatalog

Re: [PR] [SPARK-48546][SQL] Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression [spark]

2024-06-05 Thread via GitHub
dtenedor commented on PR #46888: URL: https://github.com/apache/spark/pull/46888#issuecomment-2151118095 cc @HyukjinKwon, I forgot to update the `ExpressionEncoder` in https://github.com/apache/spark/pull/46793 :)

[PR] [SPARK-48547][DEPLOY] Add opt-in flag to have SparkSubmit automatically call System.exit after user code main method exits [spark]

2024-06-05 Thread via GitHub
JoshRosen opened a new pull request, #46889: URL: https://github.com/apache/spark/pull/46889 ### What changes were proposed in this pull request? This PR adds a new SparkConf flag option, `spark.submit.callSystemExitOnMainExit` (default false), which when true will cause SparkSubmit

[PR] [SPARK-48546][SQL] Fix ExpressionEncoder after replacing NullPointerExceptions with proper error classes in AssertNotNull expression [spark]

2024-06-05 Thread via GitHub
dtenedor opened a new pull request, #46888: URL: https://github.com/apache/spark/pull/46888 ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/46793, we replaced NullPointerExceptions with proper error classes in AssertNotNull expression. How

[PR] [SPARK-46124][FOLLOWUP][3.5][CONNECT][SS] Send missing fields in StreamingQueryProgress to client [spark]

2024-06-05 Thread via GitHub
WweiL opened a new pull request, #46887: URL: https://github.com/apache/spark/pull/46887 ### What changes were proposed in this pull request? Currently in PySpark Client, calling `query.lastProgress` won't return you three fields: https://github.com/apache/spark/blob/0bc2

Re: [PR] [SPARK-46124][FOLLOWUP][CONNECT][SS] Send missing fields in StreamingQueryProgress to client [spark]

2024-06-05 Thread via GitHub
WweiL commented on code in PR #46886: URL: https://github.com/apache/spark/pull/46886#discussion_r1628549021 ## python/pyspark/sql/tests/streaming/test_streaming.py: ## @@ -28,7 +28,7 @@ class StreamingTestsMixin: def test_streaming_query_functions_basic(self): -

Re: [PR] [SPARK-46124][FOLLOWUP][CONNECT][SS] Send all fields in StreamingQueryProgress [spark]

2024-06-05 Thread via GitHub
WweiL commented on PR #46886: URL: https://github.com/apache/spark/pull/46886#issuecomment-2151082389 cc @HeartSaVioR @HyukjinKwon @LuciferYang, can you guys take a look? Thank you!

[PR] [SPARK-46124][FOLLOWUP][CONNECT][SS] Send all fields in StreamingQueryProgress [spark]

2024-06-05 Thread via GitHub
WweiL opened a new pull request, #46886: URL: https://github.com/apache/spark/pull/46886 ### What changes were proposed in this pull request? Currently in PySpark Client, calling `query.lastProgress` won't return you three fields: https://github.com/apache/spark/blob/0bc2

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
urosstan-db commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628526962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -298,6 +299,11 @@ object ResolveDefaultColumns extends Quer

Re: [PR] [SPARK-42944][FOLLOWUP][SS][CONNECT] Reenable ApplyInPandasWithState tests [spark]

2024-06-05 Thread via GitHub
WweiL commented on PR #46853: URL: https://github.com/apache/spark/pull/46853#issuecomment-2151038491 also cc @zhengruifeng

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628490433 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSessionCatalogSuite.scala: ## @@ -110,7 +110,19 @@ class InMemoryTableSessionCatalog ex

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628489680 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -298,6 +299,11 @@ object ResolveDefaultColumns extends QueryE

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628485302 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -1387,6 +1387,12 @@ ], "sqlState" : "42623" }, + "COLUMN_DEFAULT_VALUE_IS_NOT_FOLD

Re: [PR] [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations [spark]

2024-06-05 Thread via GitHub
cloud-fan closed pull request #46869: [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations URL: https://github.com/apache/spark/pull/46869

Re: [PR] [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on PR #46869: URL: https://github.com/apache/spark/pull/46869#issuecomment-2151004775 thanks for the review, merging to master!

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
urosstan-db commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628406085 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -284,6 +284,11 @@ object ResolveDefaultColumns extends Quer

Re: [PR] [SPARK-48543] Track state row validation failures using explicit error class [spark]

2024-06-05 Thread via GitHub
anishshri-db commented on PR #46885: URL: https://github.com/apache/spark/pull/46885#issuecomment-2150898744 cc - @HeartSaVioR - PTAL, thx !

[PR] [SPARK-48543] Track state row validation failures using explicit error class [spark]

2024-06-05 Thread via GitHub
anishshri-db opened a new pull request, #46885: URL: https://github.com/apache/spark/pull/46885 ### What changes were proposed in this pull request? Track state row validation failures using explicit error class ### Why are the changes needed? We want to track these exception

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-05 Thread via GitHub
cloud-fan closed pull request #46832: [SPARK-48498][SQL] Always do char padding in predicates URL: https://github.com/apache/spark/pull/46832

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on PR #46832: URL: https://github.com/apache/spark/pull/46832#issuecomment-2150855333 thanks for review, merging to master!

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
dtenedor commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628294419 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -284,6 +284,11 @@ object ResolveDefaultColumns extends QueryEr

Re: [PR] [SPARK-48286] Fix analysis of column with exists default expression - Add user facing error [spark]

2024-06-05 Thread via GitHub
urosstan-db commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628263999 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala: ## @@ -142,9 +143,16 @@ case class QualifiedColType( def getV2Defaul

Re: [PR] Metadata hdfs log [spark]

2024-06-05 Thread via GitHub
ericm-db closed pull request #46884: Metadata hdfs log URL: https://github.com/apache/spark/pull/46884

[PR] Metadata hdfs log [spark]

2024-06-05 Thread via GitHub
ericm-db opened a new pull request, #46884: URL: https://github.com/apache/spark/pull/46884 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring [spark]

2024-06-05 Thread via GitHub
anishshri-db commented on code in PR #46856: URL: https://github.com/apache/spark/pull/46856#discussion_r1628220489 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala: ## @@ -44,37 +42,37 @@ class StateSchemaCompatibil

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-06-05 Thread via GitHub
urosstan-db commented on code in PR #46594: URL: https://github.com/apache/spark/pull/46594#discussion_r1628183192 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala: ## @@ -142,9 +143,16 @@ case class QualifiedColType( def getV2Defaul

Re: [PR] [SPARK-47336][SQL][CONNECT] Provide to PySpark a functionality to get estimated size of DataFrame in bytes [spark]

2024-06-05 Thread via GitHub
SemyonSinchenko commented on PR #46368: URL: https://github.com/apache/spark/pull/46368#issuecomment-2150609665 @HyukjinKwon I'm sorry for tagging you again, but maybe you can take a look? Thanks in advance!

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-05 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1627976630 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-05 Thread via GitHub
cloud-fan commented on code in PR #46832: URL: https://github.com/apache/spark/pull/46832#discussion_r1628125549 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4603,6 +4603,14 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-05 Thread via GitHub
mridulm commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1628057143 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-05 Thread via GitHub
mridulm commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1628025776 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-05 Thread via GitHub
mridulm commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1628056762 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

[PR] [SPARK-48541][CORE] Add new exit code for executors killed by TaskReaper [spark]

2024-06-05 Thread via GitHub
bozhang2820 opened a new pull request, #46883: URL: https://github.com/apache/spark/pull/46883 ### What changes were proposed in this pull request? This change adds a new exit code, 53, for executors killed by TaskReaper. ### Why are the changes needed? This is to better monitor

[PR] [SPARK-48540][CORE] Avoid ivy output loading settings to stdout [spark]

2024-06-05 Thread via GitHub
cxzl25 opened a new pull request, #46882: URL: https://github.com/apache/spark/pull/46882 ### What changes were proposed in this pull request? This PR aims to avoid ivy output loading settings to stdout. ### Why are the changes needed? Now `org.apache.spark.util.MavenUtils#getMod

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
LuciferYang commented on code in PR #46879: URL: https://github.com/apache/spark/pull/46879#discussion_r1627715299 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -1340,6 +1340,15 @@ private[hive] object HiveClientImpl extends Logging {

Re: [PR] [SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow [spark]

2024-06-05 Thread via GitHub
zhengruifeng commented on PR #46877: URL: https://github.com/apache/spark/pull/46877#issuecomment-2149754591 merged to master

Re: [PR] [SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow [spark]

2024-06-05 Thread via GitHub
zhengruifeng closed pull request #46877: [SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow URL: https://github.com/apache/spark/pull/46877

Re: [PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46879: URL: https://github.com/apache/spark/pull/46879#issuecomment-2149715776 cc @dongjoon-hyun @cloud-fan @LuciferYang thanks

[PR] [SPARK-48539][BUILD][TESTS] Upgrade docker-java to 3.3.6 [spark]

2024-06-05 Thread via GitHub
wayneguow opened a new pull request, #46881: URL: https://github.com/apache/spark/pull/46881 ### What changes were proposed in this pull request? Upgrades docker-java to 3.3.6 ### Why are the changes needed? Bug Fixes and Enhancements: https://github.com/docker

[PR] [Only Test] Sentences improve [spark]

2024-06-05 Thread via GitHub
panbingkun opened a new pull request, #46880: URL: https://github.com/apache/spark/pull/46880 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48537][SQL] Support Hive UDAF inherited from `GenericUDAFResolver` or `GenericUDAFResolver2` [spark]

2024-06-05 Thread via GitHub
panbingkun commented on code in PR #46878: URL: https://github.com/apache/spark/pull/46878#discussion_r1627568220 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala: ## @@ -178,23 +178,26 @@ object HiveUDFExpressionBuilder extends SparkUDFExpress

Re: [PR] [SPARK-48532][BUILD] Upgrade maven plugin to latest version [spark]

2024-06-05 Thread via GitHub
panbingkun commented on PR #46870: URL: https://github.com/apache/spark/pull/46870#issuecomment-2149581508 cc @LuciferYang @dongjoon-hyun @HyukjinKwon

Re: [PR] [SPARK-48537][SQL] Support Hive UDAF inherited from `GenericUDAFResolver` or `GenericUDAFResolver2` [spark]

2024-06-05 Thread via GitHub
panbingkun commented on PR #46878: URL: https://github.com/apache/spark/pull/46878#issuecomment-2149576716 @wangyum @yaooqinn @cloud-fan @viirya

Re: [PR] [SPARK-48537][SQL] Support Hive UDAF inherited from `GenericUDAFResolver` or `GenericUDAFResolver2` [spark]

2024-06-05 Thread via GitHub
panbingkun commented on code in PR #46878: URL: https://github.com/apache/spark/pull/46878#discussion_r1627556949 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala: ## @@ -178,23 +178,26 @@ object HiveUDFExpressionBuilder extends SparkUDFExpress

[PR] [SPARK-48538][SQL] Avoid HMS memory leak casued by bonecp [spark]

2024-06-05 Thread via GitHub
yaooqinn opened a new pull request, #46879: URL: https://github.com/apache/spark/pull/46879 ### What changes were proposed in this pull request? As described in HIVE-15551, HMS will memory leak when directsql is enabled for MySQL metastore DB. Although HIVE-15551 has be

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-05 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1625792580 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,28 @@ options { tokenVocab = SqlBaseLexer; } public boolean doubl

Re: [PR] [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46876: URL: https://github.com/apache/spark/pull/46876#issuecomment-2149351338 Thank you @cloud-fan @HyukjinKwon Merged to master

Re: [PR] [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46876: [SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement URL: https://github.com/apache/spark/pull/46876

Re: [PR] [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on PR #46875: URL: https://github.com/apache/spark/pull/46875#issuecomment-2149213663 Merged to master and 3.5. Thank you @anishshri-db @HeartSaVioR

Re: [PR] [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled [spark]

2024-06-05 Thread via GitHub
yaooqinn closed pull request #46875: [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled URL: https://github.com/apache/spark/pull/46875

[PR] [SPARK-48537][SQL] Support Hive UDAF inherit from `GenericUDAFResolver` and `GenericUDAFResolver2` [spark]

2024-06-05 Thread via GitHub
panbingkun opened a new pull request, #46878: URL: https://github.com/apache/spark/pull/46878 ### What changes were proposed in this pull request? The pr aims to support `Hive UDAF` inherit from `GenericUDAFResolver` and `GenericUDAFResolver2`. ### Why are the changes needed?

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-05 Thread via GitHub
yaooqinn commented on code in PR #46832: URL: https://github.com/apache/spark/pull/46832#discussion_r1627244735 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4603,6 +4603,14 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations [spark]

2024-06-05 Thread via GitHub
uros-db commented on code in PR #46732: URL: https://github.com/apache/spark/pull/46732#discussion_r1627203339 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -270,25 +272,22 @@ public static UTF8String exec(final UTF8String v, fina

[PR] [SPARK-48536][PYTHON][CONNECT] Cache user specified schema in applyInPandas and applyInArrow [spark]

2024-06-05 Thread via GitHub
zhengruifeng opened a new pull request, #46877: URL: https://github.com/apache/spark/pull/46877 ### What changes were proposed in this pull request? Cache user specified schema in applyInPandas and applyInArrow ### Why are the changes needed? to avoid extra RPCs ##

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-06-05 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1627182142 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -412,9 +412,9 @@ protected Collation buildCollation() { "UTF

Re: [PR] [SPARK-47258][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_127[0-5] [spark]

2024-06-05 Thread via GitHub
LuciferYang commented on PR #46770: URL: https://github.com/apache/spark/pull/46770#issuecomment-2149099791 Is this one ready to go ? @panbingkun
