Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-15 Thread via GitHub
harshmotw-db commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2058281936 @cloud-fan Resolved -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47420][SQL] Fix test output [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #46058: [SPARK-47420][SQL] Fix test output URL: https://github.com/apache/spark/pull/46058

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566741860 ## python/pyspark/sql/functions/builtin.py: ## @@ -10985,7 +10994,9 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: >>>

Re: [PR] [SPARK-47420][SQL] Fix test output [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #46058: URL: https://github.com/apache/spark/pull/46058#issuecomment-2058279111 the docker test failure is unrelated, merging to master, thanks!

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #46011: URL: https://github.com/apache/spark/pull/46011#issuecomment-2058278099 there are code conflicts again...

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
wForget commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058275927 > @wForget can you help to create a 3.5 backport PR? thanks! Sure, I will create it as soon as possible, and thanks for your review.

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #45934: [SPARK-47769][SQL] Add schema_of_variant_agg expression. URL: https://github.com/apache/spark/pull/45934

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45934: URL: https://github.com/apache/spark/pull/45934#issuecomment-2058274239 thanks, merging to master!

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1566736409 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software
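For context on this optimization: `<=>` is Spark SQL's null-safe equality operator, and the rule above, judging by its name and the PR title, rewrites a join condition of the form `a = b OR (a IS NULL AND b IS NULL)` into the single comparison `a <=> b`. A minimal Python sketch of the operator's semantics (an illustration, not Spark code; `None` stands in for SQL NULL):

```python
def null_safe_eq(a, b):
    """Semantics of Spark SQL's <=> operator, with None standing in for NULL."""
    if a is None and b is None:
        return True   # NULL <=> NULL is true (unlike NULL = NULL, which is NULL)
    if a is None or b is None:
        return False
    return a == b

def equivalent_join_condition(a, b):
    # The verbose form the optimizer can collapse into one <=> comparison.
    eq = None if (a is None or b is None) else (a == b)
    both_null = (a is None) and (b is None)
    return bool(eq) or both_null

# The two forms agree on every combination of null and non-null inputs.
assert all(
    null_safe_eq(a, b) == equivalent_join_condition(a, b)
    for a in (None, 1, 2)
    for b in (None, 1, 2)
)
```

Besides being shorter, the `<=>` form can be pushed down and matched by join planning more easily than the disjunction.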

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058271672 @wForget can you help to create a 3.5 backport PR? thanks!

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #45589: [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean URL: https://github.com/apache/spark/pull/45589

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #45589: URL: https://github.com/apache/spark/pull/45589#issuecomment-2058270805 thanks, merging to master!

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566734238 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1565663205 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/origin.scala: ## @@ -76,6 +76,12 @@ object CurrentOrigin { value.get.copy(line = Some(line),

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-04-15 Thread via GitHub
rangadi commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1566712758 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -219,10 +222,35 @@ object

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566715537 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566715097 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1319,6 +1289,9 @@ def test_dataframe_error_context(self): class DataFrameTests(DataFrameTestsMixin,

Re: [PR] [WIP][SPARK-47818][CONNECT] Introduce plan cache in SparkConnectPlanner to improve performance of Analyze requests [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46012: URL: https://github.com/apache/spark/pull/46012#discussion_r1566710260 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -381,6 +405,53 @@ case class SessionHolder(userId:

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566707884 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566706967 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1319,6 +1289,9 @@ def test_dataframe_error_context(self): class DataFrameTests(DataFrameTestsMixin,

Re: [PR] [SPARK-47745] Add License to Spark Operator repository [spark-kubernetes-operator]

2024-04-15 Thread via GitHub
viirya commented on PR #3: URL: https://github.com/apache/spark-kubernetes-operator/pull/3#issuecomment-2058226232 The vote was passed today. I created `kubernetes-operator-0.1.0` in Spark JIRA. All Spark k8s operator related JIRA tickets can use this version now.

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566705061 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,59 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-15 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1566701178 ## sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2Suite.scala: ## @@ -966,6 +966,22 @@ class DataSourceV2Suite extends QueryTest with

[PR] [DO-NOT-REVIEW] Speed up test_parity_listener [spark]

2024-04-15 Thread via GitHub
WweiL opened a new pull request, #46072: URL: https://github.com/apache/spark/pull/46072 This PR makes test_parity_listener run faster. The test was slow because `TestListenerSparkV1` and `TestListenerSparkV2` make server calls and have long wait times, and the test runs on both

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46068: [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files URL: https://github.com/apache/spark/pull/46068

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46068: URL: https://github.com/apache/spark/pull/46068#issuecomment-2058191383 Merged to master.

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566678165 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r158212 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r157372 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with

Re: [PR] [SPARK-47081][CONNECT][FOLLOW] Unflake Progress Execution [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on PR #46060: URL: https://github.com/apache/spark/pull/46060#issuecomment-2058171185 > LGTM - pending tests. In my manual testing, the issue arises simply when running our existing tests and the tests will fail in the python client code in `reattach.py` that

Re: [PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46069: [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 URL: https://github.com/apache/spark/pull/46069

Re: [PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46069: URL: https://github.com/apache/spark/pull/46069#issuecomment-2058170454 Thank you, @yaooqinn ! Merged to branch-3.5 for Apache Spark 3.5.2+

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46067: [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 URL: https://github.com/apache/spark/pull/46067

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46067: URL: https://github.com/apache/spark/pull/46067#issuecomment-2058169845 Thank you, @yaooqinn ! Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r154745 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR closed pull request #46035: [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes URL: https://github.com/apache/spark/pull/46035

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #46035: URL: https://github.com/apache/spark/pull/46035#issuecomment-2058168299 Thanks! Merging to master/3.5.

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r150692 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
grundprinzip commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r150150 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
panbingkun commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r150002 ## connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala: ## @@ -325,7 +327,8 @@ private[spark] class

Re: [PR] [SPARK-47856][SQL] Document Mapping Spark SQL Data Types from Oracle and add tests [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on PR #46059: URL: https://github.com/apache/spark/pull/46059#issuecomment-2058160869 Thank you @dongjoon-hyun

Re: [PR] [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46037: [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener URL: https://github.com/apache/spark/pull/46037

Re: [PR] [SPARK-47233][CONNECT][SS][2/2] Client & Server logic for Client side streaming query listener [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46037: URL: https://github.com/apache/spark/pull/46037#issuecomment-2058156554 Merged to master.

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #46035: URL: https://github.com/apache/spark/pull/46035#issuecomment-2058156425 @cloud-fan Could you please have a quick look at the change? I reviewed the test suite.

Re: [PR] [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #45487: [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA URL: https://github.com/apache/spark/pull/45487

Re: [PR] [SPARK-47371] [SQL] XML: Ignore row tags found in CDATA [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45487: URL: https://github.com/apache/spark/pull/45487#issuecomment-2058155599 Merged to master.

Re: [PR] [SPARK-47857][SQL] Utilize `java.sql.RowId.getBytes` API directly for UTF8String [spark]

2024-04-15 Thread via GitHub
yaooqinn commented on code in PR #46062: URL: https://github.com/apache/spark/pull/46062#discussion_r1566653317 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -467,12 +467,8 @@ object JdbcUtils extends Logging with

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058151957 Thank you, @HyukjinKwon !

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
HyukjinKwon closed pull request #46070: [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` URL: https://github.com/apache/spark/pull/46070

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058150429 Merged to master.

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058148979 Because I called `PySparkCurrentOrigin` directly on the `DataFrameQueryContext` without utilizing `withOrigin` in the initial implementation. I realized it from recent review from the

Re: [PR] [SPARK-47866][SQL][TESTS] Use explicit GC in `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46070: URL: https://github.com/apache/spark/pull/46070#issuecomment-2058148483 Could you review this PR, @HyukjinKwon ? This is a best-effort approach to mitigate the Apple Silicon CI flakiness issue.

Re: [PR] [SPARK-47861][BUILD] Upgrade `slf4j` to 2.0.13 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46067: URL: https://github.com/apache/spark/pull/46067#issuecomment-2058142527 Could you review this `slf4j` dependency PR when you have some time, @viirya ?

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
CTCC1 commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566639687 ## python/pyspark/sql/functions/builtin.py: ## @@ -10985,7 +10994,9 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: >>>

Re: [PR] [SPARK-47833][SQL][CORE] Supply caller stacktrace for checkAndGlobPathIfNecessary AnalysisException [spark]

2024-04-15 Thread via GitHub
pan3793 commented on PR #46028: URL: https://github.com/apache/spark/pull/46028#issuecomment-2058132732 cc @gengliangwang @LuciferYang @mridulm WDYT of this approach for stacktrace enhancement? Or do you have other suggestions?

Re: [PR] [SPARK-46935][DOCS] Consolidate error documentation [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #44971: URL: https://github.com/apache/spark/pull/44971#issuecomment-2058128551 let's fix conflicts and move forward, thanks for the work!

Re: [PR] [SPARK-47767][SQL] Show offset value in TakeOrderedAndProjectExec [spark]

2024-04-15 Thread via GitHub
guixiaowen commented on code in PR #45931: URL: https://github.com/apache/spark/pull/45931#discussion_r1566636044 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -358,7 +358,9 @@ case class TakeOrderedAndProjectExec( val orderByString =

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-15 Thread via GitHub
cloud-fan closed pull request #44902: [SPARK-46810][DOCS] Align error class terminology with SQL standard URL: https://github.com/apache/spark/pull/44902

Re: [PR] [SPARK-46810][DOCS] Align error class terminology with SQL standard [spark]

2024-04-15 Thread via GitHub
cloud-fan commented on PR #44902: URL: https://github.com/apache/spark/pull/44902#issuecomment-2058126794 thanks, merging to master!

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on code in PR #46045: URL: https://github.com/apache/spark/pull/46045#discussion_r1566629222 ## python/pyspark/sql/functions/builtin.py: ## @@ -10972,6 +10976,11 @@ def split(str: "ColumnOrName", pattern: str, limit: int = -1) -> Column: ..
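For readers skimming the thread: `split(str, pattern, limit)` in Spark treats `pattern` as a Java regular expression, and a positive `limit` caps the number of resulting elements, with the last element absorbing the remainder of the string; the PR under review extends the Scala and Python APIs so that `pattern` and `limit` may also be passed as `Column`s. A rough pure-Python sketch of the limit behavior (an approximation using `re`; edge cases such as trailing empty strings may differ from Spark):

```python
import re

def split_like_spark(s: str, pattern: str, limit: int = -1) -> list:
    """Approximate Spark SQL split(str, pattern, limit) on plain Python strings."""
    if limit > 0:
        # At most `limit` elements; the final element keeps the rest of the string.
        return re.split(pattern, s, maxsplit=limit - 1)
    # Non-positive limit: split on every match of the pattern.
    return re.split(pattern, s)

print(split_like_spark("one,two,three", ",", 2))  # ['one', 'two,three']
print(split_like_spark("one,two,three", ","))     # ['one', 'two', 'three']
```

Accepting `Column` arguments means the pattern and limit can vary per row instead of being fixed literals for the whole query.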

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2058120153 also cc @HyukjinKwon and @LuciferYang

Re: [PR] [SPARK-47769][SQL] Add schema_of_variant_agg expression. [spark]

2024-04-15 Thread via GitHub
chenhao-db commented on PR #45934: URL: https://github.com/apache/spark/pull/45934#issuecomment-2058116518 @cloud-fan could you help merge it? Thanks!

[PR] [SPARK-47867][SQL] Support variant in JSON scan. [spark]

2024-04-15 Thread via GitHub
chenhao-db opened a new pull request, #46071: URL: https://github.com/apache/spark/pull/46071 ### What changes were proposed in this pull request? This PR adds support for the variant type in the JSON scan. As part of this PR we introduce one new JSON option:

Re: [PR] [SPARK-47274][PYTHON][SQL] Provide more useful context for PySpark DataFrame API errors [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-2058107235 > perfectly sync the data between two separately operating ThreadLocal, CurrentOrigin and PySparkCurrentOrigin. Why is that?

Re: [PR] [WIP][SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-15 Thread via GitHub
ueshin commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1566619646 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment:

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-15 Thread via GitHub
HeartSaVioR closed pull request #45932: [SPARK-47673][SS] Implementing TTL for ListState URL: https://github.com/apache/spark/pull/45932

Re: [PR] [SPARK-47673][SS] Implementing TTL for ListState [spark]

2024-04-15 Thread via GitHub
HeartSaVioR commented on PR #45932: URL: https://github.com/apache/spark/pull/45932#issuecomment-2058102492 Thanks! Merging to master.

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
panbingkun commented on PR #46022: URL: https://github.com/apache/spark/pull/46022#issuecomment-2058093392 > @panbingkun Thanks for the works. LGTM except for some minor comments. Updated. Thank you for your review!

Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]

2024-04-15 Thread via GitHub
pan3793 commented on code in PR #46047: URL: https://github.com/apache/spark/pull/46047#discussion_r1566609420 ## core/src/main/scala/org/apache/spark/SparkConf.scala: ## @@ -640,7 +640,8 @@ private[spark] object SparkConf extends Logging {

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang closed pull request #45990: [SPARK-47804] Add Dataframe cache debug log URL: https://github.com/apache/spark/pull/45990

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on PR #45990: URL: https://github.com/apache/spark/pull/45990#issuecomment-2058080259 Thanks, merging to master

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #45990: URL: https://github.com/apache/spark/pull/45990#discussion_r1566605060 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1609,6 +1609,19 @@ object SQLConf {

[PR] [SPARK-47866][SQL][TESTS] Deflaky `PythonForeachWriterSuite` [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun opened a new pull request, #46070: URL: https://github.com/apache/spark/pull/46070 ### What changes were proposed in this pull request? This PR aims to reduce the flakiness of `PythonForeachWriterSuite` in CIs by invoking `System.gc` explicitly before each test.
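The technique in this PR — forcing a garbage collection before each test so that memory-sensitive assertions are not skewed by leftovers from earlier tests — is not Spark-specific. A minimal Python analogue (hypothetical test and class names; the actual suite is Scala and calls `System.gc`):

```python
import gc
import unittest

class MemorySensitiveSuite(unittest.TestCase):
    def setUp(self):
        # Mirror the PR's idea: request a full collection before each test,
        # so garbage left over from earlier tests doesn't skew memory checks.
        gc.collect()

    def test_buffer_is_reclaimed(self):
        data = [bytearray(1024) for _ in range(100)]
        del data
        # Explicit GC at the point of measurement, like System.gc in the suite.
        self.assertGreaterEqual(gc.collect(), 0)

# Run the single test programmatically rather than via unittest.main().
result = unittest.TestResult()
MemorySensitiveSuite("test_buffer_is_reclaimed").run(result)
```

As the PR description itself frames it, this is a best-effort mitigation: explicit GC requests are only hints to the runtime, so they reduce rather than eliminate flakiness.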

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-15 Thread via GitHub
CTCC1 commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2058059890 Actually my first PR here :) @zhengruifeng based on git blame you did something very similar before. Do you want to take a look? Thanks in advance!

Re: [PR] [SPARK-47862][PYTHON][CONNECT] Fix generation of proto files [spark]

2024-04-15 Thread via GitHub
HyukjinKwon commented on code in PR #46068: URL: https://github.com/apache/spark/pull/46068#discussion_r1566577876 ## dev/connect-gen-protos.sh: ## @@ -97,4 +100,4 @@ for f in `find gen/proto/python -name "*.py*"`; do done # Clean up everything. -rm -Rf gen +# rm -Rf gen

Re: [PR] [SPARK-43394][BUILD] Upgrade maven to 3.8.8 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #41073: URL: https://github.com/apache/spark/pull/41073#issuecomment-2058030002 During Apache Spark 3.4.2 RC1 release, I found that `build/mvn versions:set` could be flaky. Let me backport this to branch-3.4 because this is the last and stable bug fix version

[PR] [SPARK-46335][BUILD][3.5] Upgrade Maven to 3.9.6 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun opened a new pull request, #46069: URL: https://github.com/apache/spark/pull/46069 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache Maven` to 3.9.6. ### Why are the changes needed? ### Does this PR introduce _any_

Re: [PR] [SPARK-47739][SQL] Register logical avro type [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #45895: URL: https://github.com/apache/spark/pull/45895#issuecomment-2058009387 To @milastdbx , why did you request a review to me when you didn't address my comment?

Re: [PR] [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46050: URL: https://github.com/apache/spark/pull/46050#issuecomment-2058007822 thank you @dongjoon-hyun and @HyukjinKwon

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
zhengruifeng commented on PR #46056: URL: https://github.com/apache/spark/pull/46056#issuecomment-2058007355 thank you @HyukjinKwon and @dongjoon-hyun

Re: [PR] [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations [spark]

2024-04-15 Thread via GitHub
itholic commented on PR #46053: URL: https://github.com/apache/spark/pull/46053#issuecomment-2058007102 This can be covered by https://github.com/apache/spark/pull/46063

Re: [PR] [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46050: [SPARK-47828][CONNECT][PYTHON][3.5] DataFrameWriterV2.overwrite fails with invalid plan URL: https://github.com/apache/spark/pull/46050

Re: [PR] [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations [spark]

2024-04-15 Thread via GitHub
itholic closed pull request #46053: [SPARK-47852][PYTHON] Support `DataFrameQueryContext` for reverse operations URL: https://github.com/apache/spark/pull/46053

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46056: URL: https://github.com/apache/spark/pull/46056#issuecomment-2057989862 Merged to master. Thank you, @zhengruifeng and @HyukjinKwon .

Re: [PR] [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46056: [SPARK-47855][CONNECT] Add `spark.sql.execution.arrow.pyspark.fallback.enabled` in the unsupported list URL: https://github.com/apache/spark/pull/46056

Re: [PR] [SPARK-47840][SS] Disable foldable propagation across Streaming Aggregate/Join nodes [spark]

2024-04-15 Thread via GitHub
sahnib commented on code in PR #46035: URL: https://github.com/apache/spark/pull/46035#discussion_r1566536030 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -978,7 +985,14 @@ object FoldablePropagation extends Rule[LogicalPlan] {

Re: [PR] [SPARK-47678][CORE] Check `spark.shuffle.readHostLocalDisk` when reading shuffle blocks [spark]

2024-04-15 Thread via GitHub
viirya commented on PR #45803: URL: https://github.com/apache/spark/pull/45803#issuecomment-2057938423 Thank you @hiboyang !

Re: [PR] [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun closed pull request #46066: [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 URL: https://github.com/apache/spark/pull/46066

Re: [PR] [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46066: URL: https://github.com/apache/spark/pull/46066#issuecomment-2057913181 Thank you, @huaxingao . Merged to master for Apache Spark 4.0.

Re: [PR] [SPARK-47678][CORE] Check `spark.shuffle.readHostLocalDisk` when reading shuffle blocks [spark]

2024-04-15 Thread via GitHub
hiboyang commented on code in PR #45803: URL: https://github.com/apache/spark/pull/45803#discussion_r1566498567 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -417,7 +418,7 @@ final class ShuffleBlockFetcherIterator(

Re: [PR] [SPARK-47678][CORE] Check `spark.shuffle.readHostLocalDisk` when reading shuffle blocks [spark]

2024-04-15 Thread via GitHub
hiboyang closed pull request #45803: [SPARK-47678][CORE] Check `spark.shuffle.readHostLocalDisk` when reading shuffle blocks URL: https://github.com/apache/spark/pull/45803

Re: [PR] [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46066: URL: https://github.com/apache/spark/pull/46066#issuecomment-2057907006 Could you review this K8s PR, @huaxingao ?

Re: [PR] [SPARK-47860][BUILD][K8S] Upgrade `kubernetes-client` to 6.12.0 [spark]

2024-04-15 Thread via GitHub
dongjoon-hyun commented on PR #46066: URL: https://github.com/apache/spark/pull/46066#issuecomment-2057906801 All K8s-related unit tests and integration tests passed.

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #45990: URL: https://github.com/apache/spark/pull/45990#discussion_r1566492674 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -153,10 +153,42 @@ trait Logging { if (log.isDebugEnabled) log.debug(msg) }

Re: [PR] [SPARK-47804] Add Dataframe cache debug log [spark]

2024-04-15 Thread via GitHub
anchovYu commented on code in PR #45990: URL: https://github.com/apache/spark/pull/45990#discussion_r1566490570 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1609,6 +1609,19 @@ object SQLConf {

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on PR #46022: URL: https://github.com/apache/spark/pull/46022#issuecomment-2057848786 @panbingkun Thanks for the work. LGTM except for some minor comments.

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r1566462197 ## connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala: ## @@ -325,7 +327,8 @@ private[spark] class

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r1566459384 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumer.scala: ## @@ -391,10 +392,12 @@ private[kafka010] class

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-15 Thread via GitHub
gengliangwang commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r1566459174 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumer.scala: ## @@ -391,10 +392,12 @@ private[kafka010] class
