Re: [PR] [SPARK-48567][DO-NOT-REVIEW] last progress [spark]

2024-06-07 Thread via GitHub
WweiL commented on code in PR #46921: URL: https://github.com/apache/spark/pull/46921#discussion_r1631855885 ## python/pyspark/sql/streaming/listener.py: ## @@ -497,6 +499,19 @@ def fromJson(cls, j: Dict[str, Any]) -> "StreamingQueryProgress": else {}, )

Re: [PR] [SPARK-48567][DO-NOT-REVIEW] last progress [spark]

2024-06-07 Thread via GitHub
WweiL commented on code in PR #46921: URL: https://github.com/apache/spark/pull/46921#discussion_r1631855722 ## python/pyspark/sql/streaming/listener.py: ## @@ -497,6 +499,19 @@ def fromJson(cls, j: Dict[str, Any]) -> "StreamingQueryProgress": else {}, )

Re: [PR] [SPARK-46124][FOLLOWUP][CONNECT][SS] Send missing fields in StreamingQueryProgress to client [spark]

2024-06-07 Thread via GitHub
WweiL closed pull request #46886: [SPARK-46124][FOLLOWUP][CONNECT][SS] Send missing fields in StreamingQueryProgress to client URL: https://github.com/apache/spark/pull/46886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-46124][FOLLOWUP][CONNECT][SS] Send missing fields in StreamingQueryProgress to client [spark]

2024-06-07 Thread via GitHub
WweiL commented on PR #46886: URL: https://github.com/apache/spark/pull/46886#issuecomment-2155791038 Created another PR: https://github.com/apache/spark/pull/46921, it is not ready for review yet because I found another bug when working on it. After

Re: [PR] [SPARK-48567][DO-NOT-REVIEW] last progress [spark]

2024-06-07 Thread via GitHub
WweiL commented on PR #46921: URL: https://github.com/apache/spark/pull/46921#issuecomment-2155790851 Pending merging of https://github.com/apache/spark/pull/46886

[PR] [SPARK-48567][DO-NOT-REVIEW] last progress [spark]

2024-06-07 Thread via GitHub
WweiL opened a new pull request, #46921: URL: https://github.com/apache/spark/pull/46921 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-48569][SS][CONNECT] Handle edge cases in query.name [spark]

2024-06-07 Thread via GitHub
WweiL opened a new pull request, #46920: URL: https://github.com/apache/spark/pull/46920 ### What changes were proposed in this pull request? 1. In connect, when a streaming query name is not specified, its query.name should return None. Currently it returns an empty string

Re: [PR] [SPARK-48012][SQL] SPJ: Support Transfrom Expressions for One Side Shuffle [spark]

2024-06-07 Thread via GitHub
sunchao commented on code in PR #46255: URL: https://github.com/apache/spark/pull/46255#discussion_r1631825274 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -870,12 +870,30 @@ case class KeyGroupedShuffleSpec( if

Re: [PR] [SPARK-48012][SQL] SPJ: Support Transfrom Expressions for One Side Shuffle [spark]

2024-06-07 Thread via GitHub
szehon-ho commented on code in PR #46255: URL: https://github.com/apache/spark/pull/46255#discussion_r1631819220 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -870,12 +870,30 @@ case class KeyGroupedShuffleSpec( if

Re: [PR] [SPARK-48012][SQL] SPJ: Support Transfrom Expressions for One Side Shuffle [spark]

2024-06-07 Thread via GitHub
szehon-ho commented on code in PR #46255: URL: https://github.com/apache/spark/pull/46255#discussion_r1631819120 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TransformExpression.scala: ## @@ -113,4 +116,23 @@ case class TransformExpression(

Re: [PR] [SPARK-42944][FOLLOWUP][3.5][SS][CONNECT] Reenable ApplyInPandasWithState tests [spark]

2024-06-07 Thread via GitHub
WweiL closed pull request #46855: [SPARK-42944][FOLLOWUP][3.5][SS][CONNECT] Reenable ApplyInPandasWithState tests URL: https://github.com/apache/spark/pull/46855

Re: [PR] [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns [spark]

2024-06-07 Thread via GitHub
dtenedor commented on code in PR #46918: URL: https://github.com/apache/spark/pull/46918#discussion_r1631794722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -165,20 +165,18 @@ case class

Re: [PR] [SPARK-45880][SQL] Make the like `pattern` semantics used in all commands consistent [spark]

2024-06-07 Thread via GitHub
github-actions[bot] commented on PR #43751: URL: https://github.com/apache/spark/pull/43751#issuecomment-2155719359 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47153][CORE] Guard serialize/deserialize in JavaSerializer with try-with-resource block [spark]

2024-06-07 Thread via GitHub
github-actions[bot] commented on PR #45238: URL: https://github.com/apache/spark/pull/45238#issuecomment-2155719350 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [MINOR][PYTHON][TESTS] Move a test out of parity tests [spark]

2024-06-07 Thread via GitHub
zhengruifeng commented on PR #46914: URL: https://github.com/apache/spark/pull/46914#issuecomment-2155702592 thanks, merged to master

Re: [PR] [MINOR][PYTHON][TESTS] Move a test out of parity tests [spark]

2024-06-07 Thread via GitHub
zhengruifeng closed pull request #46914: [MINOR][PYTHON][TESTS] Move a test out of parity tests URL: https://github.com/apache/spark/pull/46914

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-07 Thread via GitHub
szehon-ho commented on code in PR #46707: URL: https://github.com/apache/spark/pull/46707#discussion_r1631776812 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -823,13 +823,17 @@ identifierComment relationPrimary :

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631777034 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,356 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1631776712 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1631776102 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1631775974 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -117,6 +118,57 @@ class AstBuilder extends DataTypeAstBuilder with

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1631774980 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala: ## @@ -91,6 +91,19 @@ abstract class AbstractSqlParser extends

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631774639 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1631774604 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,28 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631748387 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631746777 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631746413 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47172][CORE] Add support for AES-GCM for RPC encryption [spark]

2024-06-07 Thread via GitHub
sweisdb commented on code in PR #46515: URL: https://github.com/apache/spark/pull/46515#discussion_r1631746164 ## common/network-common/src/main/java/org/apache/spark/network/crypto/GcmTransportCipher.java: ## @@ -0,0 +1,434 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48511][SS] Remove TimeMode None from TransformWithState. [spark]

2024-06-07 Thread via GitHub
HeartSaVioR commented on PR #46825: URL: https://github.com/apache/spark/pull/46825#issuecomment-2155563771 Also let's revisit the UX. If they use neither timeout nor TTL, they could just do None regardless of whether they have an event time column or not. Given we remove None, what is the expectation

Re: [PR] [SPARK-48511][SS] Remove TimeMode None from TransformWithState. [spark]

2024-06-07 Thread via GitHub
HeartSaVioR commented on code in PR #46825: URL: https://github.com/apache/spark/pull/46825#discussion_r1631685631 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateInitialStateSuite.scala: ## Review Comment: Same. ##

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
cloud-fan closed pull request #46912: [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT URL: https://github.com/apache/spark/pull/46912

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1631678592 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -84,6 +83,17 @@ private[v2] trait V2JDBCTest extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on PR #46912: URL: https://github.com/apache/spark/pull/46912#issuecomment-2155531100 thanks, merging to master!

Re: [PR] [SPARK-48511][SS] Remove TimeMode None from TransformWithState. [spark]

2024-06-07 Thread via GitHub
HeartSaVioR commented on code in PR #46825: URL: https://github.com/apache/spark/pull/46825#discussion_r1631676571 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -502,6 +502,6 @@ class IncrementalExecution( case p:

Re: [PR] [SPARK-48511][SS] Remove TimeMode None from TransformWithState. [spark]

2024-06-07 Thread via GitHub
HeartSaVioR commented on code in PR #46825: URL: https://github.com/apache/spark/pull/46825#discussion_r1631675291 ## sql/api/src/main/java/org/apache/spark/sql/streaming/TimeMode.java: ## @@ -31,11 +30,6 @@ @Evolving public class TimeMode { -/** - * Neither timers

Re: [PR] [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns [spark]

2024-06-07 Thread via GitHub
ueshin commented on code in PR #46918: URL: https://github.com/apache/spark/pull/46918#discussion_r1631670942 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -165,20 +165,18 @@ case class

Re: [PR] [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns [spark]

2024-06-07 Thread via GitHub
ueshin commented on code in PR #46918: URL: https://github.com/apache/spark/pull/46918#discussion_r1631655084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala: ## @@ -165,20 +165,18 @@ case class

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-06-07 Thread via GitHub
cashmand commented on PR #46831: URL: https://github.com/apache/spark/pull/46831#issuecomment-2155454782 Hi @Samrose-Ahmed, our intent is for this to be an open format that other engines can adopt. We're aiming to put common code in a Java library under common/variant, so that other

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-06-07 Thread via GitHub
cashmand commented on PR #46831: URL: https://github.com/apache/spark/pull/46831#issuecomment-2155445121 Hi @shaeqahmed, thanks for your detailed response. Your suggestions add a lot of flexibility to the shredding scheme! At the same time, we are wary of adding complexity that could be a

[PR] [SPARK-48566][PYTHON] Fix bug where partition indices are incorrect when UDTF analyze() uses both select and partitionColumns [spark]

2024-06-07 Thread via GitHub
dtenedor opened a new pull request, #46918: URL: https://github.com/apache/spark/pull/46918 ### What changes were proposed in this pull request? This PR fixes a bug that resulted in an internal error with some combination of the Python UDTF "select" and "partitionBy" options of the

Re: [PR] [SPARK-48556][SQL] Fix incorrect error message pointing to UNSUPPORTED_GROUPING_EXPRESSION [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46900: URL: https://github.com/apache/spark/pull/46900#discussion_r1631566694 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -267,6 +267,7 @@ trait CheckAnalysis extends PredicateHelper with

Re: [PR] [MINOR][PYTHON][TESTS] Move a test out of parity tests [spark]

2024-06-07 Thread via GitHub
xinrong-meng commented on PR #46914: URL: https://github.com/apache/spark/pull/46914#issuecomment-2155320690 LGTM, thank you!

Re: [PR] [SPARK-48557][SQL] Support scalar subquery with group-by on column equal to constant [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46902: URL: https://github.com/apache/spark/pull/46902#discussion_r1631531290 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -274,7 +278,9 @@ object SubExprUtils extends PredicateHelper {

Re: [PR] Cxollationmode [spark]

2024-06-07 Thread via GitHub
GideonPotok commented on PR #46917: URL: https://github.com/apache/spark/pull/46917#issuecomment-2155141783 @dbatomic so do you think we should proceed with this approach?

[PR] Cxollationmode [spark]

2024-06-07 Thread via GitHub
GideonPotok opened a new pull request, #46917: URL: https://github.com/apache/spark/pull/46917 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-07 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1627976630 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-07 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1629664004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-48560][SS][PYTHON] Make StreamingQueryListener.spark settable [spark]

2024-06-07 Thread via GitHub
zsxwing commented on code in PR #46909: URL: https://github.com/apache/spark/pull/46909#discussion_r1631388870 ## python/pyspark/sql/streaming/listener.py: ## @@ -75,6 +75,11 @@ def spark(self) -> Optional["SparkSession"]: # type: ignore[name-defined] # noq else:

Re: [PR] [WIP][SQL] UTF8 string validation [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46845: URL: https://github.com/apache/spark/pull/46845#discussion_r1631168863 ## sql/core/src/test/resources/sql-tests/inputs/string-functions.sql: ## @@ -276,3 +276,16 @@ select luhn_check(6017); select

Re: [PR] [WIP][SQL] UTF8 string validation [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46845: URL: https://github.com/apache/spark/pull/46845#discussion_r1631166863 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -686,6 +686,205 @@ case class EndsWith(left: Expression, right:

Re: [PR] [WIP][SQL] UTF8 string validation [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46845: URL: https://github.com/apache/spark/pull/46845#discussion_r1631166079 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -686,6 +686,205 @@ case class EndsWith(left: Expression, right:

Re: [PR] [WIP][SQL] UTF8 string validation [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46845: URL: https://github.com/apache/spark/pull/46845#discussion_r1631165016 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -686,6 +686,205 @@ case class EndsWith(left: Expression, right:

Re: [PR] [WIP][SQL] UTF8 string validation [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46845: URL: https://github.com/apache/spark/pull/46845#discussion_r1631152669 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala: ## @@ -2050,4 +2051,93 @@ class StringExpressionsSuite extends

Re: [PR] [WIP][SQL] Invalid UTF-8 byte sequence replacement [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46899: URL: https://github.com/apache/spark/pull/46899#discussion_r1631141047 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -270,6 +279,123 @@ public byte[] getBytes() { } } + /** + * Utility

Re: [PR] [WIP][SQL] Invalid UTF-8 byte sequence replacement [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46899: URL: https://github.com/apache/spark/pull/46899#discussion_r1631136369 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -270,6 +279,123 @@ public byte[] getBytes() { } } + /** + * Utility

Re: [PR] [SPARK-48565][UI] Fix thread dump display in UI [spark]

2024-06-07 Thread via GitHub
pan3793 commented on PR #46916: URL: https://github.com/apache/spark/pull/46916#issuecomment-2154699417 cc @yaooqinn @LuciferYang

[PR] [SPARK-48565][UI] Fix thread dump display in UI [spark]

2024-06-07 Thread via GitHub
pan3793 opened a new pull request, #46916: URL: https://github.com/apache/spark/pull/46916 ### What changes were proposed in this pull request? The thread dump display in the UI is not as pretty as before; this is a side effect introduced by SPARK-44863 ### Why are the changes

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
mihailom-db commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1631034190 ## sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
nikolamand-db commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1631023339 ## sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
dbatomic commented on PR #46801: URL: https://github.com/apache/spark/pull/46801#issuecomment-2154562301 Can you update PR title to reflect the changes?

Re: [PR] [WIP][SQL] Invalid UTF-8 byte sequence replacement [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46899: URL: https://github.com/apache/spark/pull/46899#discussion_r1631020568 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -270,6 +279,123 @@ public byte[] getBytes() { } } + /** + * Utility

Re: [PR] [WIP][SQL] Invalid UTF-8 byte sequence replacement [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46899: URL: https://github.com/apache/spark/pull/46899#discussion_r1631021388 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -270,6 +279,123 @@ public byte[] getBytes() { } } + /** + * Utility

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
mihailom-db commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1631019965 ## sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [WIP][SQL] Invalid UTF-8 byte sequence replacement [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46899: URL: https://github.com/apache/spark/pull/46899#discussion_r1631015318 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -270,6 +279,123 @@ public byte[] getBytes() { } } + /** + * Utility

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
mihailom-db commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1631016047 ## sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-07 Thread via GitHub
dbatomic commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1631008964 ## sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala: ## @@ -0,0 +1,294 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE & ICU collations [spark]

2024-06-07 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1630998971 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -494,10 +494,10 @@ public void testUpper() throws SparkException {

[PR] [SPARK-48564][PYTHON][CONNECT] Propagate cached schema in set operations [spark]

2024-06-07 Thread via GitHub
zhengruifeng opened a new pull request, #46915: URL: https://github.com/apache/spark/pull/46915 ### What changes were proposed in this pull request? Propagate cached schema in set operations ### Why are the changes needed? to avoid extra RPC to get the schema of result data

[PR] [MINOR][PYTHON][TESTS] Move a test out of parity tests [spark]

2024-06-07 Thread via GitHub
zhengruifeng opened a new pull request, #46914: URL: https://github.com/apache/spark/pull/46914 ### What changes were proposed in this pull request? Move a test out of parity tests ### Why are the changes needed? it is not tested in Spark Classic, not a parity test

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-06-07 Thread via GitHub
xupefei commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1630979481 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -48,6 +48,7 @@ message Expression { CommonInlineUserDefinedFunction

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-06-07 Thread via GitHub
xupefei commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1630945022 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala: ## @@ -52,7 +56,7 @@ import org.apache.spark.sql.{Encoder,

[PR] [SPARK-48563][BUILD] Upgrade `pickle` to 1.5 [spark]

2024-06-07 Thread via GitHub
LuciferYang opened a new pull request, #46913: URL: https://github.com/apache/spark/pull/46913 ### What changes were proposed in this pull request? This PR aims to upgrade `pickle` from 1.3 to 1.5. ### Why are the changes needed? The new version includes a new fix related to [empty

Re: [PR] [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin [spark]

2024-06-07 Thread via GitHub
JoshRosen closed pull request #46908: [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin URL: https://github.com/apache/spark/pull/46908

Re: [PR] [WIP][SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin [spark]

2024-06-07 Thread via GitHub
JoshRosen commented on PR #46908: URL: https://github.com/apache/spark/pull/46908#issuecomment-2154434517 I'm switching this back to WIP pending some further performance profiling over a wider array of workloads. It turns out that the Scala compiler generates specialized

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630887487 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -84,6 +83,17 @@ private[v2] trait V2JDBCTest extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630882567 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -84,6 +83,17 @@ private[v2] trait V2JDBCTest extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630876576 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -84,6 +83,17 @@ private[v2] trait V2JDBCTest extends

Re: [PR] [SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry [spark]

2024-06-07 Thread via GitHub
yaooqinn commented on PR #44976: URL: https://github.com/apache/spark/pull/44976#issuecomment-2154395124 +1 for https://github.com/apache/spark/pull/44976#issuecomment-2153612894

Re: [PR] [SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager [spark]

2024-06-07 Thread via GitHub
cloud-fan closed pull request #46907: [SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager URL: https://github.com/apache/spark/pull/46907

Re: [PR] [SPARK-48559][SQL] Fetch globalTempDatabase name directly without invoking initialization of GlobalaTempViewManager [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on PR #46907: URL: https://github.com/apache/spark/pull/46907#issuecomment-2154392683 the docker test failure is unrelated and the offending commit is already reverted. I'm merging it to master, thanks! -- This is an automated message from the Apache Git Service. To

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630870933 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -131,13 +131,16 @@ class JDBCTableCatalog extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630867852 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala: ## @@ -84,6 +83,17 @@ private[v2] trait V2JDBCTest extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
cloud-fan commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630866841 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -131,13 +131,16 @@ class JDBCTableCatalog extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable and Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on code in PR #46912: URL: https://github.com/apache/spark/pull/46912#discussion_r1630859747 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala: ## @@ -131,13 +131,16 @@ class JDBCTableCatalog extends

Re: [PR] [SPARK-46393][SQL][TESTS] Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on PR #46912: URL: https://github.com/apache/spark/pull/46912#issuecomment-2154369655 > Could you cherry-pick [82b4ad2](https://github.com/apache/spark/commit/82b4ad2af64845503604da70ff02748c3969c991) here to test the CI? Done. -- This is an automated message

Re: [PR] [SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsupported plotting functions [spark]

2024-06-07 Thread via GitHub
zhengruifeng commented on PR #46911: URL: https://github.com/apache/spark/pull/46911#issuecomment-2154366326 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsupported plotting functions [spark]

2024-06-07 Thread via GitHub
zhengruifeng closed pull request #46911: [SPARK-48561][PS][CONNECT] Throw `PandasNotImplementedError` for unsupported plotting functions URL: https://github.com/apache/spark/pull/46911 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-46393][SQL][TESTS] Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on PR #46912: URL: https://github.com/apache/spark/pull/46912#issuecomment-2154362713 > Could you cherry-pick [82b4ad2](https://github.com/apache/spark/commit/82b4ad2af64845503604da70ff02748c3969c991) here to test the CI? Okay. -- This is an automated message

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable [spark]

2024-06-07 Thread via GitHub
panbingkun commented on PR #46905: URL: https://github.com/apache/spark/pull/46905#issuecomment-2154361929 > Oops, My bad, can you cherry-pick [82b4ad2](https://github.com/apache/spark/commit/82b4ad2af64845503604da70ff02748c3969c991) into your new PR? I just reverted this to recover the CI

Re: [PR] [SPARK-46393][SQL][TESTS] Fix UT [spark]

2024-06-07 Thread via GitHub
yaooqinn commented on PR #46912: URL: https://github.com/apache/spark/pull/46912#issuecomment-2154361534 Could you cherry-pick 82b4ad2af64845503604da70ff02748c3969c991 here to test the CI? -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable [spark]

2024-06-07 Thread via GitHub
yaooqinn commented on PR #46905: URL: https://github.com/apache/spark/pull/46905#issuecomment-2154358935 Oops, My bad, can you cherry-pick 82b4ad2af64845503604da70ff02748c3969c991 into your new PR? I just reverted this to recover the CI -- This is an automated message from the Apache Git

Re: [PR] [SPARK-46393][SQL][TESTS] Fix UT [spark]

2024-06-07 Thread via GitHub
panbingkun commented on PR #46912: URL: https://github.com/apache/spark/pull/46912#issuecomment-2154344944 cc @cloud-fan @yaooqinn @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-48551][SQL] Perf improvement for escapePathName [spark]

2024-06-07 Thread via GitHub
yaooqinn commented on code in PR #46894: URL: https://github.com/apache/spark/pull/46894#discussion_r1630824536 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala: ## @@ -63,22 +63,26 @@ object ExternalCatalogUtils { bitSet }

Re: [PR] [SPARK-37448][SQL] Multiple performance optimizations related to CurrentOrigin.withOrigin [spark]

2024-06-07 Thread via GitHub
JoshRosen commented on code in PR #46908: URL: https://github.com/apache/spark/pull/46908#discussion_r1630802165 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -315,32 +317,34 @@ trait ColumnResolutionHelper extends

Re: [PR] [SPARK-46393][SQL][FOLLOWUP] Classify exceptions in JDBCTableCatalog.loadTable [spark]

2024-06-07 Thread via GitHub
panbingkun commented on PR #46905: URL: https://github.com/apache/spark/pull/46905#issuecomment-2154326493 @cloud-fan @yaooqinn It seems that some `UTs` have failed, as follows: https://github.com/panbingkun/spark/actions/runs/9411155017/job/25926739251
