Re: [PR] Test guava used by the connect module to 33.2.1-jre [spark]

2024-07-10 Thread via GitHub
LuciferYang commented on PR #47296: URL: https://github.com/apache/spark/pull/47296#issuecomment-174096 Test first -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[PR] Test guava used by the connect module to 33.2.1-jre [spark]

2024-07-10 Thread via GitHub
LuciferYang opened a new pull request, #47296: URL: https://github.com/apache/spark/pull/47296 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48850][DOCS][SS][SQL] Add documentation for new options added to State Data Source [spark]

2024-07-10 Thread via GitHub
HeartSaVioR commented on code in PR #47274: URL: https://github.com/apache/spark/pull/47274#discussion_r1673492943 ## docs/structured-streaming-state-data-source.md: ## @@ -144,16 +143,126 @@ The following configurations are optional: (none) Represents the target side to r

Re: [PR] [SPARK-47307][SQL] Add a config to optionally chunk base64 strings [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-152362 Failed to get ahold of @ted-jenks, I'm pinging someone to take this over if you don't mind

Re: [PR] [SPARK-48851][SQL] Change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace` [spark]

2024-07-10 Thread via GitHub
panbingkun commented on PR #47276: URL: https://github.com/apache/spark/pull/47276#issuecomment-147757 > thanks, merging to master! Thank you for your review. I will fix similar scenarios (when the `table` does not exist) by adding the `catalog name` for `the table` to make it more consis

Re: [PR] [SPARK-48856][SQL] Use isolated JobArtifactSet for each spark session [spark]

2024-07-10 Thread via GitHub
lifulong commented on code in PR #47281: URL: https://github.com/apache/spark/pull/47281#discussion_r1673481996 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -113,6 +113,8 @@ class SparkSession private( private[sql] val sessionUUID: String = UUID

Re: [PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun commented on code in PR #47293: URL: https://github.com/apache/spark/pull/47293#discussion_r1673470377 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -1334,69 +1348,69 @@ class HiveQuerySuite extends HiveComparisonTest with

Re: [PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun commented on code in PR #47293: URL: https://github.com/apache/spark/pull/47293#discussion_r1673469988 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -1167,43 +1177,47 @@ class HiveQuerySuite extends HiveComparisonTest with

Re: [PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun commented on code in PR #47293: URL: https://github.com/apache/spark/pull/47293#discussion_r1673460712 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -854,12 +863,13 @@ class HiveQuerySuite extends HiveComparisonTest with S

Re: [PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun commented on code in PR #47293: URL: https://github.com/apache/spark/pull/47293#discussion_r1673460019 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -804,31 +811,33 @@ class HiveQuerySuite extends HiveComparisonTest with S

Re: [PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun commented on code in PR #47293: URL: https://github.com/apache/spark/pull/47293#discussion_r1673458792 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala: ## @@ -679,15 +678,23 @@ class HiveQuerySuite extends HiveComparisonTest with S

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-07-10 Thread via GitHub
szehon-ho commented on PR #46707: URL: https://github.com/apache/spark/pull/46707#issuecomment-109524 Sounds good, thanks! I was planning to do a follow-up for write (INSERT) and can look at any other missing ones too.

Re: [PR] [SPARK-48866][SQL] Fix hints of valid charset in the error message of INVALID_PARAMETER_VALUE.CHARSET [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on code in PR #47295: URL: https://github.com/apache/spark/pull/47295#discussion_r1673449865 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2584,7 +2584,7 @@ }, "CHARSET" : { "message" : [ - "expects one

Re: [PR] [SPARK-48863][SQL] Fix ClassCastException when parsing JSON with "spark.sql.json.enablePartialResults" enabled [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47292: [SPARK-48863][SQL] Fix ClassCastException when parsing JSON with "spark.sql.json.enablePartialResults" enabled URL: https://github.com/apache/spark/pull/47292

Re: [PR] [SPARK-48863][SQL] Fix ClassCastException when parsing JSON with "spark.sql.json.enablePartialResults" enabled [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47292: URL: https://github.com/apache/spark/pull/47292#issuecomment-102353 Merged to master.

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-10 Thread via GitHub
szehon-ho commented on code in PR #47233: URL: https://github.com/apache/spark/pull/47233#discussion_r1673445233 ## sql/core/src/test/scala/org/apache/spark/sql/connector/UpdateDataFrameSuite.scala: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[PR] [SPARK-48865][SQL] Add try_url_decode function [spark]

2024-07-10 Thread via GitHub
wForget opened a new pull request, #47294: URL: https://github.com/apache/spark/pull/47294 ### What changes were proposed in this pull request? Add a `try_url_decode` function that performs the same operation as `url_decode`, but returns a NULL value instead of raising an erro
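As a rough illustration of the "try" semantics this PR summary describes, here is a minimal Python sketch. It is not Spark's implementation: `url_decode` and `try_url_decode` below are hypothetical stand-ins built on `urllib.parse.unquote`, and the strictness rules (rejecting malformed `%` escapes, treating `+` as a space) are assumptions chosen for the example.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Assumption for this sketch: a '%' not followed by two hex digits is malformed.
_MALFORMED_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def url_decode(value: str) -> str:
    """Hypothetical strict decoder: raises ValueError on bad input."""
    if _MALFORMED_ESCAPE.search(value):
        raise ValueError(f"malformed percent-escape in {value!r}")
    # errors="strict" makes invalid UTF-8 byte sequences raise
    # UnicodeDecodeError, which is a subclass of ValueError.
    return unquote(value.replace("+", " "), errors="strict")


def try_url_decode(value: str) -> Optional[str]:
    """Same operation, but returns None (standing in for NULL) on failure."""
    try:
        return url_decode(value)
    except ValueError:
        return None
```

The try-variant simply wraps the strict function, so both share one decoding path and differ only in how a failure is reported.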

Re: [PR] [SPARK-48841][SQL] Include `collationName` to `sql()` of `Collate` [spark]

2024-07-10 Thread via GitHub
panbingkun commented on PR #47265: URL: https://github.com/apache/spark/pull/47265#issuecomment-072693 The file conflict has been resolved; let it run GA again. Thanks all.

Re: [PR] [SPARK-48851][SQL] Change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace` [spark]

2024-07-10 Thread via GitHub
cloud-fan closed pull request #47276: [SPARK-48851][SQL] Change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace` URL: https://github.com/apache/spark/pull/47276

Re: [PR] [SPARK-48851][SQL] Change the value of `SCHEMA_NOT_FOUND` from `namespace` to `catalog.namespace` [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on PR #47276: URL: https://github.com/apache/spark/pull/47276#issuecomment-054976 thanks, merging to master!

Re: [PR] [SPARK-48652][SQL] Fix casting issue in Spark SQL when comparing string column to integer value [spark]

2024-07-10 Thread via GitHub
Zawa-ll commented on PR #47246: URL: https://github.com/apache/spark/pull/47246#issuecomment-038902 > When can this happen? Thanks for your question, @HyukjinKwon. This issue typically happens when we compare a string column to an integer value in Spark SQL. For instance, i

Re: [PR] [SPARK-48280][SQL][FOLLOWUP] Improve collation testing surface area using expression walking [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47216: [SPARK-48280][SQL][FOLLOWUP] Improve collation testing surface area using expression walking URL: https://github.com/apache/spark/pull/47216

Re: [PR] [SPARK-48280][SQL][FOLLOWUP] Improve collation testing surface area using expression walking [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47216: URL: https://github.com/apache/spark/pull/47216#issuecomment-005593 Merged to master.

Re: [PR] [SPARK-48592][INFRA] Add structured logging style script and GitHub workflow [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47239: URL: https://github.com/apache/spark/pull/47239#discussion_r1673384083 ## dev/structured-logging-style.py: ## @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 Review Comment: Let's use underscore in the name `structured_logging_style.py`

Re: [PR] [SPARK-48592][INFRA] Add structured logging style script and GitHub workflow [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47239: URL: https://github.com/apache/spark/pull/47239#discussion_r1673383490 ## dev/structured-logging-style.py: ## @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contri

Re: [PR] [SPARK-48592][INFRA] Add structured logging style script and GitHub workflow [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47239: URL: https://github.com/apache/spark/pull/47239#discussion_r1673382903 ## dev/structured-logging-style.py: ## @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contri

Re: [PR] [SPARK-48828][DOCS] Update documentation to add `column` as alias of `col` [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47244: URL: https://github.com/apache/spark/pull/47244#discussion_r1673382372 ## python/pyspark/sql/functions/builtin.py: ## @@ -233,6 +233,7 @@ def lit(col: Any) -> Column: def col(col: str) -> Column: """ Returns a :class:`~pyspa

Re: [PR] [SPARK-48652][SQL] Fix casting issue in Spark SQL when comparing string column to integer value [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47246: URL: https://github.com/apache/spark/pull/47246#issuecomment-60 When can this happen?

Re: [PR] [SPARK-48832][CONNECT][TESTS] Add dedicated error tests for Spark Connect [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47250: URL: https://github.com/apache/spark/pull/47250#discussion_r1673359802 ## python/pyspark/errors/tests/test_errors.py: ## @@ -24,7 +24,7 @@ from pyspark.errors.utils import ErrorClassesReader -class ErrorsTest(unittest.TestCase): +

Re: [PR] [SPARK-48842][DOCS] Document non-determinism of max_by and min_by [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47266: URL: https://github.com/apache/spark/pull/47266#discussion_r1673358709 ## python/pyspark/sql/functions/builtin.py: ## @@ -1271,6 +1271,11 @@ def max_by(col: "ColumnOrName", ord: "ColumnOrName") -> Column: .. versionchanged:: 3.4.

Re: [PR] [SPARK-48856][SQL] Use isolated JobArtifactSet for each spark session [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47281: URL: https://github.com/apache/spark/pull/47281#discussion_r1673351143 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -113,6 +113,8 @@ class SparkSession private( private[sql] val sessionUUID: String = U

[PR] [SPARK-48864][SQL][TESTS] Refactor `HiveQuerySuite` and fix bug [spark]

2024-07-10 Thread via GitHub
panbingkun opened a new pull request, #47293: URL: https://github.com/apache/spark/pull/47293 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48858][PYTHON] Remove deprecated `setDaemon` method call of `Thread` in `log_communication.py` [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47282: URL: https://github.com/apache/spark/pull/47282#issuecomment-2221941350 Merged to master.

Re: [PR] [SPARK-48858][PYTHON] Remove deprecated `setDaemon` method call of `Thread` in `log_communication.py` [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47282: [SPARK-48858][PYTHON] Remove deprecated `setDaemon` method call of `Thread` in `log_communication.py` URL: https://github.com/apache/spark/pull/47282

Re: [PR] [SPARK-48463][ML] Make StringIndexer supporting nested input columns [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on code in PR #47283: URL: https://github.com/apache/spark/pull/47283#discussion_r1673348039 ## mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala: ## @@ -124,14 +124,34 @@ private[feature] trait StringIndexerBase extends Params with HasH

Re: [PR] [SPARK-48784][SQL] Add ::: syntax as a shorthand for try_cast [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on PR #47186: URL: https://github.com/apache/spark/pull/47186#issuecomment-2221936907 @dongjoon-hyun The parser extension is known to be cumbersome. We need to provide a fully functional ANTLR parser which will be run before Spark's own parser and fallback to Spark's par

Re: [PR] [SPARK-48529][SQL] Introduction of Labels in SQL Scripting [spark]

2024-07-10 Thread via GitHub
cloud-fan closed pull request #47146: [SPARK-48529][SQL] Introduction of Labels in SQL Scripting URL: https://github.com/apache/spark/pull/47146

Re: [PR] [SPARK-48529][SQL] Introduction of Labels in SQL Scripting [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on PR #47146: URL: https://github.com/apache/spark/pull/47146#issuecomment-2221918694 thanks, merging to master!

Re: [PR] [SPARK-48726][SS] Create the StateSchemaV3 file format, and write this out for the TransformWithStateExec operator [spark]

2024-07-10 Thread via GitHub
HeartSaVioR closed pull request #47104: [SPARK-48726][SS] Create the StateSchemaV3 file format, and write this out for the TransformWithStateExec operator URL: https://github.com/apache/spark/pull/47104

Re: [PR] [SPARK-48726][SS] Create the StateSchemaV3 file format, and write this out for the TransformWithStateExec operator [spark]

2024-07-10 Thread via GitHub
HeartSaVioR commented on PR #47104: URL: https://github.com/apache/spark/pull/47104#issuecomment-2221898074 Thanks! Merging to master.

Re: [PR] [SPARK-48863][SQL] Fix ClassCastException when parsing JSON with "spark.sql.json.enablePartialResults" enabled [spark]

2024-07-10 Thread via GitHub
sadikovi commented on PR #47292: URL: https://github.com/apache/spark/pull/47292#issuecomment-2221889241 cc @HyukjinKwon @cloud-fan @dongjoon-hyun

[PR] [SPARK-48863][SQL] Fix ClassCastException when parsing JSON with "spark.sql.json.enablePartialResults" enabled [spark]

2024-07-10 Thread via GitHub
sadikovi opened a new pull request, #47292: URL: https://github.com/apache/spark/pull/47292 ### What changes were proposed in this pull request? This PR fixes a bug in a corner case of JSON parsing when `spark.sql.json.enablePartialResults` is enabled. When running

Re: [PR] [SPARK-46743][SQL][FOLLOW UP] Count bug after ScalarSubqery is folded if it has an empty relation [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on code in PR #47290: URL: https://github.com/apache/spark/pull/47290#discussion_r1673309467 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -89,6 +89,12 @@ object ConstantFolding extends Rule[LogicalPlan] {

Re: [PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
panbingkun commented on PR #47291: URL: https://github.com/apache/spark/pull/47291#issuecomment-2221864596 > Merged to master. So quickly, thank you!

Re: [PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47291: [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate URL: https://github.com/apache/spark/pull/47291

Re: [PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47291: URL: https://github.com/apache/spark/pull/47291#issuecomment-2221863341 Merged to master.

Re: [PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47291: URL: https://github.com/apache/spark/pull/47291#issuecomment-2221863229 thx

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on PR #46707: URL: https://github.com/apache/spark/pull/46707#issuecomment-2221863093 @shardulm94 feel free to raise PRs! I think it makes sense to support DML as the `DataFrameWriter` API can also specify options

Re: [PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
panbingkun commented on PR #47291: URL: https://github.com/apache/spark/pull/47291#issuecomment-2221858031 cc @HyukjinKwon

[PR] [SPARK-48763][FOLLOWUP] Make `dev/lint-scala` error message more accurate [spark]

2024-07-10 Thread via GitHub
panbingkun opened a new pull request, #47291: URL: https://github.com/apache/spark/pull/47291 ### What changes were proposed in this pull request? This PR is a follow-up to https://github.com/apache/spark/pull/47157 ### Why are the changes needed? ### Does this PR int

Re: [PR] [SPARK-48459][CONNECT][PYTHON][FOLLOWUP] Ignore to_plan from with_origin [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47284: URL: https://github.com/apache/spark/pull/47284#issuecomment-2221815121 Merged to master.

Re: [PR] [SPARK-48459][CONNECT][PYTHON][FOLLOWUP] Ignore to_plan from with_origin [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47284: [SPARK-48459][CONNECT][PYTHON][FOLLOWUP] Ignore to_plan from with_origin URL: https://github.com/apache/spark/pull/47284

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-07-10 Thread via GitHub
shardulm94 commented on PR #46707: URL: https://github.com/apache/spark/pull/46707#issuecomment-2221815885 Thanks @szehon-ho for this! This is really useful for `SELECT` statements. However, we also have `UPDATE` and `DELETE` statements, which can lead to scans. Should we also add

Re: [PR] [SPARK-48862][PYTHON][CONNECT] Avoid calling `_proto_to_string` when INFO level is not enabled [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47289: [SPARK-48862][PYTHON][CONNECT] Avoid calling `_proto_to_string` when INFO level is not enabled URL: https://github.com/apache/spark/pull/47289

Re: [PR] [SPARK-48862][PYTHON][CONNECT] Avoid calling `_proto_to_string` when INFO level is not enabled [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47289: URL: https://github.com/apache/spark/pull/47289#issuecomment-2221812593 Merged to master.

Re: [PR] [SPARK-48860][TESTS] Update `ui-test` to use `ws` 8.18.0 [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47287: [SPARK-48860][TESTS] Update `ui-test` to use `ws` 8.18.0 URL: https://github.com/apache/spark/pull/47287

Re: [PR] [SPARK-48860][TESTS] Update `ui-test` to use `ws` 8.18.0 [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47287: URL: https://github.com/apache/spark/pull/47287#issuecomment-2221812037 Merged to master.

Re: [PR] [SPARK-48860][TESTS] Update `ui-test` to use `ws` 8.18.0 [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47287: URL: https://github.com/apache/spark/pull/47287#issuecomment-2221809245 LGTM

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-10 Thread via GitHub
huaxingao commented on code in PR #47233: URL: https://github.com/apache/spark/pull/47233#discussion_r1673252385 ## sql/core/src/test/scala/org/apache/spark/sql/connector/UpdateDataFrameSuite.scala: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] [SPARK-46743][SQL][FOLLOW UP] Count bug after ScalarSubqery is folded if it has an empty relation [spark]

2024-07-10 Thread via GitHub
andylam-db commented on PR #47290: URL: https://github.com/apache/spark/pull/47290#issuecomment-2221776878 Ping for reviews! @agubichev @cloud-fan

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-07-10 Thread via GitHub
pky-c commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2221776003 We are also waiting for this feature and hope to see it in Spark 4.x

[PR] [SPARK-46743][SQL][FOLLOW UP] Count bug after ScalarSubqery is folded if it has an empty relation [spark]

2024-07-10 Thread via GitHub
andylam-db opened a new pull request, #47290: URL: https://github.com/apache/spark/pull/47290 ### What changes were proposed in this pull request? In this PR https://github.com/apache/spark/pull/45125, we handled the case where an Aggregate is folded into a Project, causing a

[PR] [SPARK-48862][PYTHON][CONNECT] Avoid calling `_proto_to_string` when INFO level is not enabled [spark]

2024-07-10 Thread via GitHub
ueshin opened a new pull request, #47289: URL: https://github.com/apache/spark/pull/47289 ### What changes were proposed in this pull request? Avoid calling `_proto_to_string` when INFO level is not enabled. ### Why are the changes needed? We should avoid `_proto_to_strin

Re: [PR] [WIP] improve StaticInvoke [spark]

2024-07-10 Thread via GitHub
github-actions[bot] commented on PR #45795: URL: https://github.com/apache/spark/pull/45795#issuecomment-2221755974 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48763][CONNECT][BUILD] Move connect server and common to builtin module [spark]

2024-07-10 Thread via GitHub
HyukjinKwon closed pull request #47157: [SPARK-48763][CONNECT][BUILD] Move connect server and common to builtin module URL: https://github.com/apache/spark/pull/47157

Re: [PR] [SPARK-48763][CONNECT][BUILD] Move connect server and common to builtin module [spark]

2024-07-10 Thread via GitHub
HyukjinKwon commented on PR #47157: URL: https://github.com/apache/spark/pull/47157#issuecomment-2221750618 Merged to master.

Re: [PR] [SPARK-48760][SQL] Fix CatalogV2Util.applyClusterByChanges [spark]

2024-07-10 Thread via GitHub
zedtang commented on code in PR #47288: URL: https://github.com/apache/spark/pull/47288#discussion_r1673185804 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DescribeTableSuiteBase.scala: ## @@ -181,8 +181,9 @@ trait DescribeTableSuiteBase extends QueryTest wi

[PR] [SPARK-48760][SQL] Fix CatalogV2Util.applyClusterByChanges [spark]

2024-07-10 Thread via GitHub
zedtang opened a new pull request, #47288: URL: https://github.com/apache/spark/pull/47288 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/47156/ introduced a bug in CatalogV2Util.applyClusterByChanges that it will remove the existing

Re: [PR] [SPARK-48700] [SQL] Mode expression for complex types (all collations) [spark]

2024-07-10 Thread via GitHub
GideonPotok commented on PR #47154: URL: https://github.com/apache/spark/pull/47154#issuecomment-2221670607 @uros-db

Re: [PR] [SPARK-48860][TESTS] Update `ui-test` to use `ws` 8.18.0 [spark]

2024-07-10 Thread via GitHub
dongjoon-hyun commented on PR #47287: URL: https://github.com/apache/spark/pull/47287#issuecomment-2221590241 Could you review this `ui-test` update PR, @yaooqinn?

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-10 Thread via GitHub
richardc-db commented on code in PR #47253: URL: https://github.com/apache/spark/pull/47253#discussion_r1673072352 ## python/pyspark/sql/types.py: ## @@ -194,16 +194,7 @@ def fromDDL(cls, ddl: str) -> "DataType": >>> DataType.fromDDL("b: string, a: int") Struct

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-10 Thread via GitHub
richardc-db commented on code in PR #47253: URL: https://github.com/apache/spark/pull/47253#discussion_r1673073035 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -63,6 +63,32 @@ trait PythonFuncExpression extends NonSQLExpression

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-10 Thread via GitHub
richardc-db commented on code in PR #47253: URL: https://github.com/apache/spark/pull/47253#discussion_r1673072226 ## python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py: ## @@ -748,6 +750,31 @@ def check_vectorized_udf_return_scalar(self): with self.assertRai

[PR] Metadata v2 purging [spark]

2024-07-10 Thread via GitHub
ericm-db opened a new pull request, #47286: URL: https://github.com/apache/spark/pull/47286 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

Re: [PR] [WIP][SQL] Scala, Python, and R APIs for string validation functions [spark]

2024-07-10 Thread via GitHub
uros-db closed pull request #47255: [WIP][SQL] Scala, Python, and R APIs for string validation functions URL: https://github.com/apache/spark/pull/47255

[PR] [WIP][SQL] Scala, Python, and R APIs for string validation functions [spark]

2024-07-10 Thread via GitHub
uros-db opened a new pull request, #47285: URL: https://github.com/apache/spark/pull/47285 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-48459][CONNECT][PYTHON][FOLLOWUP] Ignore to_plan from with_origin [spark]

2024-07-10 Thread via GitHub
ueshin opened a new pull request, #47284: URL: https://github.com/apache/spark/pull/47284 ### What changes were proposed in this pull request? Ignores `connect.Column.to_plan` from `with_origin`. ### Why are the changes needed? Capturing call site on `connect.Column.to_pl

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-07-10 Thread via GitHub
shaeqahmed commented on PR #46831: URL: https://github.com/apache/spark/pull/46831#issuecomment-2221416607 @cashmand Ah that makes sense, since marking those intermediate columns as required means the writer does not have to write an extra definition level. Thanks for updating the doc, this

Re: [PR] [SPARK-48592][INFRA] Add structured logging style script and GitHub workflow [spark]

2024-07-10 Thread via GitHub
asl3 commented on code in PR #47239: URL: https://github.com/apache/spark/pull/47239#discussion_r1672940133 ## dev/structured-logging-style.py: ## @@ -0,0 +1,92 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor l

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-10 Thread via GitHub
allisonwang-db commented on PR #47253: URL: https://github.com/apache/spark/pull/47253#issuecomment-2221249061 cc @HyukjinKwon

Re: [PR] [SPARK-48834][SQL] Disable variant input/output to python scalar UDFs, UDTFs, UDAFs during query compilation [spark]

2024-07-10 Thread via GitHub
allisonwang-db commented on code in PR #47253: URL: https://github.com/apache/spark/pull/47253#discussion_r1672808771 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -954,6 +954,16 @@ "The input of can't be type data." ] }, +

Re: [PR] [SPARK-48842][DOCS] Document non-determinism of max_by and min_by [spark]

2024-07-10 Thread via GitHub
allisonwang-db commented on code in PR #47266: URL: https://github.com/apache/spark/pull/47266#discussion_r1672804446 ## python/pyspark/sql/functions/builtin.py: ## @@ -1271,6 +1271,11 @@ def max_by(col: "ColumnOrName", ord: "ColumnOrName") -> Column: .. versionchanged:: 3

Re: [PR] [SPARK-48821][SQL] Support Update in DataFrameWriterV2 [spark]

2024-07-10 Thread via GitHub
szehon-ho commented on code in PR #47233: URL: https://github.com/apache/spark/pull/47233#discussion_r1672741213 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -4136,6 +4136,58 @@ class Dataset[T] private[sql]( new MergeIntoWriter[T](table, this, condi

Re: [PR] [SPARK-48835] Introduce versoning to jdbc connectors [spark]

2024-07-10 Thread via GitHub
milastdbx commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2221079932 @yaooqinn can you please provide feedback on this PR ?

Re: [PR] [SPARK-48835] Introduce versoning to jdbc connectors [spark]

2024-07-10 Thread via GitHub
kostabiz-db commented on PR #47181: URL: https://github.com/apache/spark/pull/47181#issuecomment-2220945694 @milastdbx missing Salesforce dialect

Re: [PR] [SPARK-48823][DOCS] Improve clarity in `lag` docstring [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47236: URL: https://github.com/apache/spark/pull/47236#issuecomment-2220883963 Sorry, I reverted this by 0fdebcca1857b2aaa7d89964654b54cf2da7b21c because the new format is incorrect.

[PR] [SPARK-48463][ML] Make StringIndexer supporting nested input columns [spark]

2024-07-10 Thread via GitHub
WeichenXu123 opened a new pull request, #47283: URL: https://github.com/apache/spark/pull/47283 ### What changes were proposed in this pull request? Make StringIndexer supporting nested input columns ### Why are the changes needed? User demand. ### Does thi

Re: [PR] [SPARK-48844][SQL] USE INVALID_EMPTY_LOCATION instead of UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY when path is empty [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47267: URL: https://github.com/apache/spark/pull/47267#issuecomment-2220853595 Thanks a lot @dongjoon-hyun

Re: [PR] [SPARK-48784][SQL] Add ::: syntax as a shorthand for try_cast [spark]

2024-07-10 Thread via GitHub
srielau commented on PR #47186: URL: https://github.com/apache/spark/pull/47186#issuecomment-2220843355 To bypass the worry about scala: How about `?::` It is arguably more descriptive. And a leading `?` could be used for other operations like ` a ?/ b` (instead of try_divide(a, b))

Re: [PR] [SPARK-48844][SQL] USE INVALID_EMPTY_LOCATION instead of UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY when path is empty [spark]

2024-07-10 Thread via GitHub
dongjoon-hyun closed pull request #47267: [SPARK-48844][SQL] USE INVALID_EMPTY_LOCATION instead of UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY when path is empty URL: https://github.com/apache/spark/pull/47267

Re: [PR] [SPARK-48784][SQL] Add ::: syntax as a shorthand for try_cast [spark]

2024-07-10 Thread via GitHub
dongjoon-hyun commented on PR #47186: URL: https://github.com/apache/spark/pull/47186#issuecomment-2220778827 > Do you mean adding a config to fail this syntax by default? No~ I meant to provide this as a built-in extension to enable this syntax. And, by default, the extension and thi

Re: [PR] [SPARK-48529][SQL] Introduction of Labels in SQL Scripting [spark]

2024-07-10 Thread via GitHub
miland-db commented on code in PR #47146: URL: https://github.com/apache/spark/pull/47146#discussion_r1672486165 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -52,4 +52,6 @@ case class SingleStatement(parsedPlan: Lo

Re: [PR] [SPARK-48855][K8S][TESTS] Make `ExecutorPodsAllocatorSuite` independent from default allocation batch size [spark]

2024-07-10 Thread via GitHub
dongjoon-hyun commented on PR #47279: URL: https://github.com/apache/spark/pull/47279#issuecomment-2220757810 Thank you so much, @yaooqinn !

Re: [PR] [SPARK-48529][SQL] Introduction of Labels in SQL Scripting [spark]

2024-07-10 Thread via GitHub
cloud-fan commented on code in PR #47146: URL: https://github.com/apache/spark/pull/47146#discussion_r1672432017 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -52,4 +52,6 @@ case class SingleStatement(parsedPlan: Lo

Re: [PR] [SPARK-48828][DOCS] Update documentation to add `column` as alias of `col` [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47244: URL: https://github.com/apache/spark/pull/47244#issuecomment-2220703354 Hi @thomhart31 The use of MINOR and JIRA ID in the PR title is mutually exclusive.

Re: [PR] [SPARK-48823][DOCS] Improve clarity in `lag` docstring [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47236: URL: https://github.com/apache/spark/pull/47236#issuecomment-2220693665 Merged to master, Thank you @thomhart31 @allisonwang-db

Re: [PR] [SPARK-48823][DOCS] Improve clarity in `lag` docstring [spark]

2024-07-10 Thread via GitHub
yaooqinn closed pull request #47236: [SPARK-48823][DOCS] Improve clarity in `lag` docstring URL: https://github.com/apache/spark/pull/47236

Re: [PR] [MINOR][DOCS] Add example to `countDistinct` [spark]

2024-07-10 Thread via GitHub
yaooqinn closed pull request #47235: [MINOR][DOCS] Add example to `countDistinct` URL: https://github.com/apache/spark/pull/47235

Re: [PR] [MINOR][DOCS] Add example to `countDistinct` [spark]

2024-07-10 Thread via GitHub
yaooqinn commented on PR #47235: URL: https://github.com/apache/spark/pull/47235#issuecomment-2220685250 Merged to master, thank you @thomhart31 @allisonwang-db

Re: [PR] [SPARK-48855][K8S][TESTS] Make `ExecutorPodsAllocatorSuite` independent from default allocation batch size [spark]

2024-07-10 Thread via GitHub
yaooqinn closed pull request #47279: [SPARK-48855][K8S][TESTS] Make `ExecutorPodsAllocatorSuite` independent from default allocation batch size URL: https://github.com/apache/spark/pull/47279
