[PR] [SPARK-45387][SQL]Optimize hive patition filter when the comparision dataType not match [spark]

2024-04-15 Thread via GitHub
lastbus opened a new pull request, #46073: URL: https://github.com/apache/spark/pull/46073 ### What changes were proposed in this pull request? During the PruneFileSourcePartitions process, we can optimize by casting the dataType of the constant to match the dataType of the corresponding

Re: [PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #46071: URL: https://github.com/apache/spark/pull/46071#discussion_r1566814075 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -766,6 +769,17 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-15 Thread via GitHub
harshmotw-db commented on PR #46017: URL: https://github.com/apache/spark/pull/46017#issuecomment-2058353524 @cloud-fan Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-15 Thread via GitHub
GideonPotok commented on PR #46040: URL: https://github.com/apache/spark/pull/46040#issuecomment-2058335965 @uros-db please re-review!

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-15 Thread via GitHub
harshmotw-db commented on PR #46017: URL: https://github.com/apache/spark/pull/46017#issuecomment-2058330992 @cloud-fan It seems that they have completely revamped the `error-classes.json` file in [this PR](https://github.com/apache/spark/commit/c5b8e60e0d5956d9f648f77ae13a1558c99adf6b). I

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-15 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1566782558 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -163,6 +163,155 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #46017: URL: https://github.com/apache/spark/pull/46017#issuecomment-2058321342 @harshmotw-db can you fix the code conflicts?

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-15 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1566772548 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -163,6 +163,155 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
CTCC1 commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566772319 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -2476,8 +2476,26 @@ def repeat(col: "ColumnOrName", n: Union["ColumnOrName", int]) -> Column: repeat.__doc__ =

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-15 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1566773753 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -163,6 +163,155 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47413][SQL] - add support to substr/left/right for collations [Post-Refactor] [spark]

2024-04-15 Thread via GitHub
GideonPotok commented on code in PR #46040: URL: https://github.com/apache/spark/pull/46040#discussion_r1566771920 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -163,6 +163,155 @@ class CollationStringExpressionsSuite }) }

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
LuciferYang commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566755414 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -4197,6 +4197,20 @@ object functions { */ def split(str: Column, p

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-15 Thread via GitHub
harshmotw-db commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2058281936 @cloud-fan Resolved

Re: [PR] [SPARK-47420][SQL] Fix test output [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #46058: [SPARK-47420][SQL] Fix test output URL: https://github.com/apache/spark/pull/46058

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566741860 ## python/pyspark/sql/functions/builtin.py: ## @@ -10985,7 +10994,9 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: >>> df.select

Re: [PR] [SPARK-47420][SQL] Fix test output [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #46058: URL: https://github.com/apache/spark/pull/46058#issuecomment-2058279111 the docker test failure is unrelated, merging to master, thanks!

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2058278099 there are code conflicts again...

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
wForget commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058275927 > @wForget can you help to create a 3.5 backport PR? thanks! Sure, I will create it as soon as possible, and thanks for your review.

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #45934: [SPARK-47769][SQL] Add schema_of_variant_agg expression. URL: https://github.com/apache/spark/pull/45934

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45934: URL: https://github.com/apache/spark/pull/45934#issuecomment-2058274239 thanks, merging to master!

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1566736409 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058271672 @wForget can you help to create a 3.5 backport PR? thanks!

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #45589: [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean URL: https://github.com/apache/spark/pull/45589

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058270805 thanks, merging to master!

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566734238 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment: St

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1565663205 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/origin.scala: ## @@ -76,6 +76,12 @@ object CurrentOrigin { value.get.copy(line = Some(line), startP

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-04-15 Thread via GitHub
rangadi commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1566712758 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -219,10 +222,35 @@ object StreamingSymmetricHashJoinHe

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566715537 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template = main_message_templat

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566715097 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1319,6 +1289,9 @@ def test_dataframe_error_context(self): class DataFrameTests(DataFrameTestsMixin, ReusedSQL

Re: [PR] [WIP][SPARK-47818][CONNECT] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46012: URL: https://github.com/apache/spark/pull/46012#discussion_r1566710260 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -381,6 +405,53 @@ case class SessionHolder(userId: Strin

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566707884 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template = main_message_te

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566706967 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1319,6 +1289,9 @@ def test_dataframe_error_context(self): class DataFrameTests(DataFrameTestsMixin, Reus

Re: [PR] [SPARK-47745] Add License to Spark Operator repository [spark-kubernetes-operator]

2024-04-15 Thread via GitHub
viirya commented on PR #3: URL: https://github.com/apache/spark-kubernetes-operator/pull/3#issuecomment-2058226232 The vote was passed today. I created `kubernetes-operator-0.1.0` in Spark JIRA. All Spark k8s operator related JIRA tickets can use this version now.

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566705061 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template = main_message_te

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1566701178 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,22 @@ class DataSourceV2Suite extends QueryTest with SharedSparkSe

[PR] [DO-NOT-REVIEW] Speed up test_parity_listener [spark]

2024-04-15 Thread via GitHub
WweiL opened a new pull request, #46072: URL: https://github.com/apache/spark/pull/46072 This PR makes test_parity_listener run faster. The test was slow because `TestListenerSparkV1` and `TestListenerSparkV2` make server calls and have long wait times, and the test runs on both lis

Re: [PR] [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46068: [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files URL: https://github.com/apache/spark/pull/46068

Re: [PR] [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46068: URL: https://github.com/apache/spark/pull/46068#issuecomment-2058191383 Merged to master.

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566678165 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment: Stri

Re: [PR] [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r158212 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen R

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r157372 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with SQLConfH

Re: [PR] [SPARK-47081][CONNECT][FOLLOW] Unflake Progress Execution [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on PR #46060: URL: https://github.com/apache/spark/pull/46060#issuecomment-2058171185 > LGTM - pending tests. In my manual testing, the issue arises simply when running our existing tests and the tests will fail in the python client code in `reattach.py` that t

Re: [PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46069: [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 URL: https://github.com/apache/spark/pull/46069

Re: [PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46069: URL: https://github.com/apache/spark/pull/46069#issuecomment-2058170454 Thank you, @yaooqinn ! 😄 Merged to branch-3.5 for Apache Spark 3.5.2+

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46067: [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 URL: https://github.com/apache/spark/pull/46067

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46067: URL: https://github.com/apache/spark/pull/46067#issuecomment-2058169845 Thank you, @yaooqinn ! Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r154745 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen R

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR closed pull request #46035: [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes URL: https://github.com/apache/spark/pull/46035

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #46035: URL: https://github.com/apache/spark/pull/46035#issuecomment-2058168299 Thanks! Merging to master/3.5.

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r150692 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with SQLConfHelper

Re: [PR] [SPARK-47862][PYTHON][CONNECT]Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r150150 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen R

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
panbingkun commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r150002 ## connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala: ## @@ -325,7 +327,8 @@ private[spark] class DirectKafkaInpu

Re: [PR] [SPARK-47856][SQL] Document Mapping Spark SQL Data Types from Oracle and add tests [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on PR #46059: URL: https://github.com/apache/spark/pull/46059#issuecomment-2058160869 Thank you @dongjoon-hyun

Re: [PR] [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46037: [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener URL: https://github.com/apache/spark/pull/46037

Re: [PR] [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46037: URL: https://github.com/apache/spark/pull/46037#issuecomment-2058156554 Merged to master.

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #46035: URL: https://github.com/apache/spark/pull/46035#issuecomment-2058156425 @cloud-fan Could you please have a quick look at the change? I reviewed the test suite.

Re: [PR] [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #45487: [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA URL: https://github.com/apache/spark/pull/45487

Re: [PR] [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45487: URL: https://github.com/apache/spark/pull/45487#issuecomment-2058155599 Merged to master.

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r1566653317 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with SQLConfHelper

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058151957 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46070: [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` URL: https://github.com/apache/spark/pull/46070

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058150429 Merged to master.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058148979 Because I called `PySparkCurrentOrigin` directly on the `DataFrameQueryContext` without utilizing `withOrigin` in the initial implementation. I realized it from recent review from the ref

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058148483 Could you review this PR, @HyukjinKwon ? This is a best-effort approach to mitigate Apple Silicon CI flakiness issue.

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46067: URL: https://github.com/apache/spark/pull/46067#issuecomment-2058142527 Could you review this `slf4j` dependency PR when you have some time, @viirya ?

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
CTCC1 commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566639687 ## python/pyspark/sql/functions/builtin.py: ## @@ -10985,7 +10994,9 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: >>> df.select(split(

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stackstrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-15 Thread via GitHub
pan3793 commented on PR #46028: URL: https://github.com/apache/spark/pull/46028#issuecomment-2058132732 cc @gengliangwang @LuciferYang @mridulm WDYT of this approach for stacktrace enhancement? Or do you have other suggestions?

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #44971: URL: https://github.com/apache/spark/pull/44971#issuecomment-2058128551 let's fix conflicts and move forward, thanks for the work!

Re: [PR] [SPARK-47767][SQL] Show offset value in TakeOrderedAndProjectExec [spark]

2024-04-15 Thread via GitHub
guixiaowen commented on code in PR #45931: URL: https://github.com/apache/spark/pull/45931#discussion_r1566636044 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -358,7 +358,9 @@ case class TakeOrderedAndProjectExec( val orderByString = truncate

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #44902: [SPARK-46810][DOCS] Align error class terminology with SQL standard URL: https://github.com/apache/spark/pull/44902

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #44902: URL: https://github.com/apache/spark/pull/44902#issuecomment-2058126794 thanks, merging to master!

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566629222 ## python/pyspark/sql/functions/builtin.py: ## @@ -10972,6 +10976,11 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: .. versi

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2058120153 also cc @HyukjinKwon and @LuciferYang

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
chenhao-db commented on PR #45934: URL: https://github.com/apache/spark/pull/45934#issuecomment-2058116518 @cloud-fan could you help merge it? Thanks!

[PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-15 Thread via GitHub
chenhao-db opened a new pull request, #46071: URL: https://github.com/apache/spark/pull/46071 ### What changes were proposed in this pull request? This PR adds support for the variant type in the JSON scan. As part of this PR we introduce one new JSON option: `spark.read.format

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058107235 > perfectly sync the data between two separately operating TheadLocal, CurrentOrigin and PySparkCurrentOrigin. Why is that?

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
ueshin commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566619646 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment: Strin

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-15 Thread via GitHub
HeartSaVioR closed pull request #45932: [SPARK-47673][SS] Implementing TTL for ListState URL: https://github.com/apache/spark/pull/45932

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #45932: URL: https://github.com/apache/spark/pull/45932#issuecomment-2058102492 Thanks! Merging to master.

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
panbingkun commented on PR #46022: URL: https://github.com/apache/spark/pull/46022#issuecomment-2058093392 > @panbingkun Thanks for the works. LGTM except for some minor comments. Updated. Thank you for your review!

Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]

2024-04-15 Thread via GitHub
pan3793 commented on code in PR #46047: URL: https://github.com/apache/spark/pull/46047#discussion_r1566609420 ## core/src/main/scala/org/apache/spark/SparkConf.scala: ## @@ -640,7 +640,8 @@ private[spark] object SparkConf extends Logging { DeprecatedConfig("spark.blackli

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang closed pull request #45990: [SPARK-47804] Add Dataframe cache debug log URL: https://github.com/apache/spark/pull/45990

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on PR #45990: URL: https://github.com/apache/spark/pull/45990#issuecomment-2058080259 Thanks, merging to master

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #45990: URL: https://github.com/apache/spark/pull/45990#discussion_r1566605060 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1609,6 +1609,19 @@ object SQLConf { .checkValues(StorageLevelMapper.values

[PR] [SPARK-47866][SQL][TESTS] Deflaky `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun opened a new pull request, #46070: URL: https://github.com/apache/spark/pull/46070 ### What changes were proposed in this pull request? This PR aims to reduce the flakiness of `PythonForeachWriterSuite` in CIs by invoking `System.gc` explicitly before each test. #

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
CTCC1 commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2058059890 Actually my first PR here :) @zhengruifeng based on git blame you did something very similar before. Do you want to take a look? Thanks in advance!

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r1566577876 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen Re

Re: [PR] [SPARK-43394][BUILD] Upgrade maven to 3.8.8 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #41073: URL: https://github.com/apache/spark/pull/41073#issuecomment-2058030002 During Apache Spark 3.4.2 RC1 release, I found that `build/mvn versions:set` could be flaky. Let me backport this to branch-3.4 because this is the last and stable bug fix version o

[PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun opened a new pull request, #46069: URL: https://github.com/apache/spark/pull/46069 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache Maven` to 3.9.6. ### Why are the changes needed? ### Does this PR introduce _any_

Re: [PR] [SPARK-47739][SQL] Register logical avro type [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #45895: URL: https://github.com/apache/spark/pull/45895#issuecomment-2058009387 To @milastdbx , why did you request a review to me when you didn't address my comment? ![Screenshot 2024-04-15 at 17 03 41](https://github.com/apache/spark/assets/9700541/518

Re: [PR] [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46050: URL: https://github.com/apache/spark/pull/46050#issuecomment-2058007822 thank you @dongjoon-hyun and @HyukjinKwon

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46056: URL: https://github.com/apache/spark/pull/46056#issuecomment-2058007355 thank you @HyukjinKwon and @dongjoon-hyun

Re: [PR] [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #46053: URL: https://github.com/apache/spark/pull/46053#issuecomment-2058007102 This can be covered by https://github.com/apache/spark/pull/46063

Re: [PR] [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46050: [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan URL: https://github.com/apache/spark/pull/46050

Re: [PR] [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations [spark]

2024-04-15 Thread via GitHub
itholic closed pull request #46053: [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations URL: https://github.com/apache/spark/pull/46053

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46056: URL: https://github.com/apache/spark/pull/46056#issuecomment-2057989862 Merged to master. Thank you, @zhengruifeng and @HyukjinKwon .

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46056: [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list URL: https://github.com/apache/spark/pull/46056

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
sahnib commented on code in PR #46035: URL: https://github.com/apache/spark/pull/46035#discussion_r1566536030 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -978,7 +985,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {

Re: [PR] [SPARK-47678][CORE] Check `spark.shuffle.readHostLocalDisk` when reading shuffle blocks [spark]

2024-04-15 Thread via GitHub
viirya commented on PR #45803: URL: https://github.com/apache/spark/pull/45803#issuecomment-2057938423 Thank you @hiboyang !

Re: [PR] [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46066: [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 URL: https://github.com/apache/spark/pull/46066
