Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44184: URL: https://github.com/apache/spark/pull/44184#discussion_r1421301333 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -54,52 +53,22 @@ case class Mode( child: Expression,

Re: [PR] [SPARK-46337][SQL][CONNECT][PYTHON] Make `CTESubstitution` retain the `PLAN_ID_TAG` [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44268: URL: https://github.com/apache/spark/pull/44268#issuecomment-1848278646 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-46337][SQL][CONNECT][PYTHON] Make `CTESubstitution` retain the `PLAN_ID_TAG` [spark]

2023-12-08 Thread via GitHub
HyukjinKwon closed pull request #44268: [SPARK-46337][SQL][CONNECT][PYTHON] Make `CTESubstitution` retain the `PLAN_ID_TAG` URL: https://github.com/apache/spark/pull/44268

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL (single wrapper) [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44233: URL: https://github.com/apache/spark/pull/44233#issuecomment-1848275925 Update: we're offline discussing. I will make another POC PR. We will write up the summary in the final PR description.

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1421267218 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,55 @@ object InferWindowGroupLimit extends

[PR] [SPARK-46339][SS] Directory with batch number name should not be treated as metadata log [spark]

2023-12-08 Thread via GitHub
viirya opened a new pull request, #44272: URL: https://github.com/apache/spark/pull/44272 ### What changes were proposed in this pull request? This patch adds a filter to two existing `CheckpointFileManager` implementations' `list` method to filter out directories. This

[PR] [SPARK-46338][PS][TESTS] Re-enable the `get_item` test for `BasicIndexingTests` [spark]

2023-12-08 Thread via GitHub
itholic opened a new pull request, #44271: URL: https://github.com/apache/spark/pull/44271 ### What changes were proposed in this pull request? This PR proposes to re-enable the `get_item` test for `BasicIndexingTests`. ### Why are the changes needed? To

Re: [PR] [SPARK-46282][PYTHON][DOCS] Create a Standalone Page for DataFrame API in PySpark Documentation [spark]

2023-12-08 Thread via GitHub
itholic commented on code in PR #44201: URL: https://github.com/apache/spark/pull/44201#discussion_r1421267139 ## python/docs/source/index.rst: ## @@ -72,20 +76,25 @@ DataFrames, Structured Streaming, Machine Learning (MLlib) and Spark Core. :alt: Spark Core and

Re: [PR] [SPARK-46322][PYTHON][DOCS] Replace external link with internal link for error documentation [spark]

2023-12-08 Thread via GitHub
itholic commented on code in PR #44251: URL: https://github.com/apache/spark/pull/44251#discussion_r1421264545 ## python/pyspark/errors_doc_gen.py: ## @@ -43,7 +43,7 @@ def generate_errors_doc(output_rst_file_path: str) -> None: This is a list of common, named error classes

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1421259543 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,55 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421258729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,34 +342,24 @@ class UnboundedOffsetWindowFunctionFrame(

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255750 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -284,7 +307,9 @@ class FrameLessOffsetWindowFunctionFrame( }

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255704 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1421255497 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,19 @@ class FrameLessOffsetWindowFunctionFrame(

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on PR #44184: URL: https://github.com/apache/spark/pull/44184#issuecomment-1848226804 can we link some SQL doc from other databases?

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44184: URL: https://github.com/apache/spark/pull/44184#discussion_r1421253816 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -146,8 +114,86 @@ case class Mode( override def

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44184: URL: https://github.com/apache/spark/pull/44184#discussion_r1421253652 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -54,52 +53,22 @@ case class Mode( child: Expression,

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
beliefer commented on PR #44145: URL: https://github.com/apache/spark/pull/44145#issuecomment-1848225342 cc @cloud-fan

Re: [PR] [WIP][Used for discussion and decision-making] Improve the java text block with java15 feature. [spark]

2023-12-08 Thread via GitHub
beliefer commented on PR #44270: URL: https://github.com/apache/spark/pull/44270#issuecomment-1848224659 cc @MaxGekk

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
beliefer commented on PR #44184: URL: https://github.com/apache/spark/pull/44184#issuecomment-1848223733 @cloud-fan

Re: [PR] [SPARK-46285][SQL] Add foreachWithSubqueries [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44206: URL: https://github.com/apache/spark/pull/44206#discussion_r1421252671 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala: ## @@ -145,4 +146,16 @@ class LogicalPlanSuite extends SparkFunSuite {

Re: [PR] [WIP][Used for discussion and decision-making] Improve the java text block with java15 feature. [spark]

2023-12-08 Thread via GitHub
beliefer commented on PR #44270: URL: https://github.com/apache/spark/pull/44270#issuecomment-1848217893 @dongjoon-hyun @srowen @LuciferYang What are your suggestions?

[PR] [WIP][Used for discussion and decision-making] Improve the java text block with java15 feature. [spark]

2023-12-08 Thread via GitHub
beliefer opened a new pull request, #44270: URL: https://github.com/apache/spark/pull/44270 ### What changes were proposed in this pull request? Spark has now been upgraded to Java 17, which contains many new features. The text block is a feature released in Java 15. When using text blocks, the

Re: [PR] [SPARK-46285][SQL] Add foreachWithSubqueries [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44206: URL: https://github.com/apache/spark/pull/44206#discussion_r1421248570 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala: ## @@ -145,4 +146,16 @@ class LogicalPlanSuite extends SparkFunSuite {

Re: [PR] [SPARK-46246] EXECUTE IMMEDIATE SQL support [spark]

2023-12-08 Thread via GitHub
dtenedor commented on code in PR #44093: URL: https://github.com/apache/spark/pull/44093#discussion_r1421247769 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1005,6 +1005,12 @@ ], "sqlState" : "42702" }, +

Re: [PR] [SPARK-46285][SQL] Add foreachWithSubqueries [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44206: URL: https://github.com/apache/spark/pull/44206#discussion_r1421248561 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala: ## @@ -145,4 +146,16 @@ class LogicalPlanSuite extends SparkFunSuite {

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1421248218 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -17,21 +17,25 @@ package

[PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL [spark]

2023-12-08 Thread via GitHub
cloud-fan opened a new pull request, #44269: URL: https://github.com/apache/spark/pull/44269 ### What changes were proposed in this pull request? This PR adds a general framework to support any user-defined data source (name is not finalized yet) as a v2 source. A

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1421239656 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -17,21 +17,25 @@ package

Re: [PR] [SPARK-46337][SQL][CONNECT][PYTHON] Make `CTESubstitution` retain the `PLAN_ID_TAG` [spark]

2023-12-08 Thread via GitHub
zhengruifeng commented on PR #44268: URL: https://github.com/apache/spark/pull/44268#issuecomment-1848147038 cc @hvanhovell @cloud-fan

[PR] [SPARK-46337][SQL][CONNECT][PYTHON] Make `CTESubstitution` retain the `PLAN_ID_TAG` [spark]

2023-12-08 Thread via GitHub
zhengruifeng opened a new pull request, #44268: URL: https://github.com/apache/spark/pull/44268 ### What changes were proposed in this pull request? Make `CTESubstitution` retain the `PLAN_ID_TAG` ### Why are the changes needed? before this PR: ``` df1 =

Re: [PR] [SPARK-46334][INFRA][PS] Upgrade `Pandas` to 2.1.4 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44266: URL: https://github.com/apache/spark/pull/44266#issuecomment-1848098757 Merged to master. Thank you, @bjornjorgensen .

Re: [PR] [SPARK-46334][INFRA][PS] Upgrade `Pandas` to 2.1.4 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44266: [SPARK-46334][INFRA][PS] Upgrade `Pandas` to 2.1.4 URL: https://github.com/apache/spark/pull/44266

Re: [PR] [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44267: URL: https://github.com/apache/spark/pull/44267#issuecomment-1848098386 Thank you, @LuciferYang !

Re: [PR] [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 [spark]

2023-12-08 Thread via GitHub
LuciferYang commented on PR #44267: URL: https://github.com/apache/spark/pull/44267#issuecomment-1848091312 Merged into master for Spark 4.0. Thanks @dongjoon-hyun ~

Re: [PR] [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 [spark]

2023-12-08 Thread via GitHub
LuciferYang closed pull request #44267: [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 URL: https://github.com/apache/spark/pull/44267

Re: [PR] [SPARK-46132][CORE] Support key password for JKS keys for RPC SSL [spark]

2023-12-08 Thread via GitHub
hasnain-db commented on PR #44264: URL: https://github.com/apache/spark/pull/44264#issuecomment-1848016777 hm, CI had worked fine on my local branch before I submitted this. Will submit an empty commit to retrigger CI

Re: [PR] [SPARK-46285][SQL] Add foreachWithSubqueries [spark]

2023-12-08 Thread via GitHub
amaliujia commented on PR #44206: URL: https://github.com/apache/spark/pull/44206#issuecomment-1847998786 @cloud-fan I updated the test case. Please take a look.

Re: [PR] [SPARK-42789][SQL] Rewrite multiple GetJsonObjects to a JsonTuple if their json expressions are the same [spark]

2023-12-08 Thread via GitHub
github-actions[bot] commented on PR #40419: URL: https://github.com/apache/spark/pull/40419#issuecomment-1847997664 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-41006][K8S] Generate new ConfigMap names for each run [spark]

2023-12-08 Thread via GitHub
github-actions[bot] commented on PR #40491: URL: https://github.com/apache/spark/pull/40491#issuecomment-1847997646 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44998] Do not retry when FileNotFoundException occurs [spark]

2023-12-08 Thread via GitHub
github-actions[bot] commented on PR #42711: URL: https://github.com/apache/spark/pull/42711#issuecomment-1847997629 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46324][SQL][PYTHON] Fix the output name of pyspark.sql.functions.user and session_user [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44253: URL: https://github.com/apache/spark/pull/44253#issuecomment-1847988831 Thank you!

Re: [PR] [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44267: URL: https://github.com/apache/spark/pull/44267#issuecomment-1847984729 cc @LuciferYang

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL (single wrapper) [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44233: URL: https://github.com/apache/spark/pull/44233#issuecomment-1847983279 Was investigating pros and cons. For now, this one is more likely the one but waiting @cloud-fan 's sign off :-).

[PR] [SPARK-46335][BUILD] Upgrade Maven to 3.9.6 for MNG-7913 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun opened a new pull request, #44267: URL: https://github.com/apache/spark/pull/44267 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46275][3.4] Protobuf: Return null in permissive mode when deserialization fails [spark]

2023-12-08 Thread via GitHub
rangadi commented on PR #44265: URL: https://github.com/apache/spark/pull/44265#issuecomment-1847932270 Thank you!

Re: [PR] [SPARK-46275][3.4] Protobuf: Return null in permissive mode when deserialization fails [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44265: [SPARK-46275][3.4] Protobuf: Return null in permissive mode when deserialization fails URL: https://github.com/apache/spark/pull/44265

Re: [PR] [SPARK-46334][INFRA][PS] Upgrade `Pandas` to 2.1.4 [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44266: URL: https://github.com/apache/spark/pull/44266#issuecomment-1847927910 Could you re-trigger the failed pipelines, @bjornjorgensen ?

Re: [PR] [SPARK-46325][CONNECT] Remove unnecessary override functions when constructing `WrappedCloseableIterator` in `ResponseValidator#wrapIterator` [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44255: [SPARK-46325][CONNECT] Remove unnecessary override functions when constructing `WrappedCloseableIterator` in `ResponseValidator#wrapIterator` URL: https://github.com/apache/spark/pull/44255

Re: [PR] [SPARK-46332][SQL] Migrate `CatalogNotFoundException` to the error class `CATALOG_NOT_FOUND` [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44259: [SPARK-46332][SQL] Migrate `CatalogNotFoundException` to the error class `CATALOG_NOT_FOUND` URL: https://github.com/apache/spark/pull/44259

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL (single wrapper) [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44233: URL: https://github.com/apache/spark/pull/44233#issuecomment-1847827890 Just a question. So, which one is the final decision, this PR or #43784 ? Do we need to collect more opinions?

Re: [PR] [SPARK-46324][SQL][PYTHON] Fix the output name of pyspark.sql.functions.user and session_user [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44253: URL: https://github.com/apache/spark/pull/44253#issuecomment-1847826582 Merged to master.

Re: [PR] [SPARK-46324][SQL][PYTHON] Fix the output name of pyspark.sql.functions.user and session_user [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44253: [SPARK-46324][SQL][PYTHON] Fix the output name of pyspark.sql.functions.user and session_user URL: https://github.com/apache/spark/pull/44253

Re: [PR] [SPARK-46312][CORE] Use `lower_camel_case` in `store_types.proto` [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44242: URL: https://github.com/apache/spark/pull/44242#issuecomment-1847805850 Thank you, @gengliangwang . BTW, according to [your previous comment](https://github.com/apache/spark/pull/43609#issuecomment-1787964921), I'm starting to look at this area.

Re: [PR] [SPARK-46275][3.4] Protobuf: Return null in permissive mode when deserialization fails [spark]

2023-12-08 Thread via GitHub
rangadi commented on PR #44265: URL: https://github.com/apache/spark/pull/44265#issuecomment-1847769153 @HyukjinKwon please merge this into 3.4 when you get a chance.

Re: [PR] [MINOR][DOCS] Fix documentation for spark.sql.legacy.doLooseUpcast in SQL migration guide [spark]

2023-12-08 Thread via GitHub
amytsai-stripe commented on PR #44262: URL: https://github.com/apache/spark/pull/44262#issuecomment-1847734256 @MaxGekk I updated this PR to also move the note to the Spark 2.4 to Spark 3.0 section!

Re: [PR] [SPARK-46328][SQL] Allocate capacity of array list of TColumns by columns size in TRowSet generation [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun closed pull request #44258: [SPARK-46328][SQL] Allocate capacity of array list of TColumns by columns size in TRowSet generation URL: https://github.com/apache/spark/pull/44258

[PR] [SPARK-46275][3.4][Chrry-pick] Protobuf: Return null in permissive mode when deserialization fails [spark]

2023-12-08 Thread via GitHub
rangadi opened a new pull request, #44265: URL: https://github.com/apache/spark/pull/44265 This is a cherry-pick of #44214 into 3.4 branch. From the original PR: ### What changes were proposed in this pull request? This updates the behavior of `from_protobuf()` built

Re: [PR] [MINOR][DOCS] Fix documentation for spark.sql.legacy.doLooseUpcast in SQL migration guide [spark]

2023-12-08 Thread via GitHub
MaxGekk commented on PR #44262: URL: https://github.com/apache/spark/pull/44262#issuecomment-1847690533 @HyukjinKwon @manuzhang Could you take a look at the PR, please.

Re: [PR] [SPARK-38211][SQL][DOCS] Add SQL migration guide on restoring loose upcast from string to other types [spark]

2023-12-08 Thread via GitHub
MaxGekk commented on code in PR #35519: URL: https://github.com/apache/spark/pull/35519#discussion_r1420919590 ## docs/sql-migration-guide.md: ## @@ -420,7 +420,7 @@ license: | need to specify a value with units like "30s" now, to avoid being interpreted as milliseconds;

Re: [PR] [SPARK-46332][SQL] Migrate `CatalogNotFoundException` to the error class `CATALOG_NOT_FOUND` [spark]

2023-12-08 Thread via GitHub
MaxGekk commented on PR #44259: URL: https://github.com/apache/spark/pull/44259#issuecomment-1847664511 All tests passed: https://github.com/MaxGekk/spark/actions/runs/7141477462/job/19458149893

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-08 Thread via GitHub
cloud-fan commented on code in PR #44013: URL: https://github.com/apache/spark/pull/44013#discussion_r1420896403 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRuleContext.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-46312][CORE] Use `lower_camel_case` in `store_types.proto` [spark]

2023-12-08 Thread via GitHub
gengliangwang commented on PR #44242: URL: https://github.com/apache/spark/pull/44242#issuecomment-1847655584 Late LGTM!

Re: [PR] [SPARK-46132][CORE] Support key password for JKS keys for RPC SSL [spark]

2023-12-08 Thread via GitHub
hasnain-db commented on PR #44264: URL: https://github.com/apache/spark/pull/44264#issuecomment-1847608849 cc: @mridulm

[PR] [SPARK-46132][CORE] Support key password for JKS keys for RPC SSL [spark]

2023-12-08 Thread via GitHub
hasnain-db opened a new pull request, #44264: URL: https://github.com/apache/spark/pull/44264 ### What changes were proposed in this pull request? Add support for a separate key password in addition to the key store password for JKS keys. This is needed for keys which may have a key

Re: [PR] [SPARK-46179][SQL] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS [spark]

2023-12-08 Thread via GitHub
dtenedor commented on code in PR #44084: URL: https://github.com/apache/spark/pull/44084#discussion_r1420855965 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala: ## @@ -349,20 +349,16 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession

[PR] [WIP][SQL] Replace `IllegalStateException` by `SparkException.internalError` in catalyst [spark]

2023-12-08 Thread via GitHub
MaxGekk opened a new pull request, #44263: URL: https://github.com/apache/spark/pull/44263 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46324][SQL][PYTHON] Fix the output name of pyspark.sql.functions.user and session_user [spark]

2023-12-08 Thread via GitHub
dongjoon-hyun commented on PR #44253: URL: https://github.com/apache/spark/pull/44253#issuecomment-1847569277 Could you re-trigger the failed pipelines once more?

Re: [PR] [SPARK-46246] EXECUTE IMMEDIATE SQL support [spark]

2023-12-08 Thread via GitHub
srielau commented on code in PR #44093: URL: https://github.com/apache/spark/pull/44093#discussion_r1420801268 ## sql/core/src/test/resources/sql-tests/inputs/execute-immediate.sql: ## @@ -0,0 +1,96 @@ +-- Automatically generated by SQLQueryTestSuite +-- !query +CREATE

Re: [PR] [SPARK-46275] Protobuf: Return null in permissive mode when deserialization fails. [spark]

2023-12-08 Thread via GitHub
rangadi commented on PR #44214: URL: https://github.com/apache/spark/pull/44214#issuecomment-1847497554 Thanks @HyukjinKwon for merging. I will create a backport.

[PR] [MINOR][DOCS] Fix documentation for spark.sql.legacy.doLooseUpcast in SQL migration guide [spark]

2023-12-08 Thread via GitHub
amytsai-stripe opened a new pull request, #44262: URL: https://github.com/apache/spark/pull/44262 ### What changes were proposed in this pull request? Fixes a typo in the SQL migration guide documentation for `spark.sql.legacy.doLooseUpcast`. ### Why are the changes

Re: [PR] [SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression [spark]

2023-12-08 Thread via GitHub
MaxGekk commented on code in PR #44261: URL: https://github.com/apache/spark/pull/44261#discussion_r1420624345 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -79,30 +79,22 @@ class DateExpressionsSuite extends

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-12-08 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1420598351 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2603,4 +2603,13 @@ package object config { .stringConf .toSequence

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1420540736 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala: ## @@ -1276,6 +1276,147 @@ class DataFrameWindowFunctionsSuite extends QueryTest

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1420502309 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala: ## @@ -1276,6 +1276,111 @@ class DataFrameWindowFunctionsSuite extends QueryTest

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1420501839 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,58 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for limit outside of window [spark]

2023-12-08 Thread via GitHub
zml1206 commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1420494712 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,58 @@ object InferWindowGroupLimit extends

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-12-08 Thread via GitHub
Ngone51 commented on PR #43954: URL: https://github.com/apache/spark/pull/43954#issuecomment-1847130056 To fix the tests, I have to move (https://github.com/apache/spark/pull/43954/commits/fe70ba95a67800b798443b8fa873d2b24efa2067) the "abort stage" call back into `cancelTasks()` with the

Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

2023-12-08 Thread via GitHub
pan3793 commented on code in PR #44211: URL: https://github.com/apache/spark/pull/44211#discussion_r1420397903 ## connector/kinesis-asl/pom.xml: ## @@ -54,14 +54,38 @@ test - com.amazonaws + software.amazon.kinesis amazon-kinesis-client

Re: [PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44145: URL: https://github.com/apache/spark/pull/44145#discussion_r1420347586 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -68,10 +72,58 @@ object InferWindowGroupLimit extends

Re: [PR] [WIP][SPARK-43403][UI] Ensure old SparkUI in HistoryServer has been detached before loading new one [spark]

2023-12-08 Thread via GitHub
zhouyifan279 commented on PR #41105: URL: https://github.com/apache/spark/pull/41105#issuecomment-1847080341 @LuciferYang @mridulm because of issue [SPARK-46330](https://issues.apache.org/jira/browse/SPARK-46330), I removed the timeout parameter used while waiting for the old Spark UI to detach.

[PR] [SPARK-46331][SQL] Removing CodegenFallback from subset of DateTime expressions and version() expression [spark]

2023-12-08 Thread via GitHub
dbatomic opened a new pull request, #44261: URL: https://github.com/apache/spark/pull/44261 ### What changes were proposed in this pull request? This PR moves us a bit closer to removing the CodegenFallback class and relying instead on RuntimeReplaceable with

Re: [PR] [SPARK-46327][PS][CONNECT][TESTS] Reorganize `SeriesStringTests` [spark]

2023-12-08 Thread via GitHub
HyukjinKwon closed pull request #44257: [SPARK-46327][PS][CONNECT][TESTS] Reorganize `SeriesStringTests` URL: https://github.com/apache/spark/pull/44257 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-46327][PS][CONNECT][TESTS] Reorganize `SeriesStringTests` [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44257: URL: https://github.com/apache/spark/pull/44257#issuecomment-1847065815 Merged to master.

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44184: URL: https://github.com/apache/spark/pull/44184#discussion_r1420337431 ## sql/core/src/test/resources/sql-tests/results/group-by.sql.out: ## @@ -1152,16 +1152,15 @@ SELECT mode(col, 'true') FROM VALUES (-10), (0), (10) AS tab(col) --

Re: [PR] [SPARK-45796][SQL] Support MODE() WITHIN GROUP (ORDER BY col) [spark]

2023-12-08 Thread via GitHub
beliefer commented on code in PR #44184: URL: https://github.com/apache/spark/pull/44184#discussion_r1420334594 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -18,22 +18,21 @@ package

Re: [PR] [SPARK-46326][PYTHON][TESTS] Test missing cases for functions (pyspark.sql.functions) [spark]

2023-12-08 Thread via GitHub
HyukjinKwon commented on PR #44256: URL: https://github.com/apache/spark/pull/44256#issuecomment-1847047417 Merged to master.

Re: [PR] [SPARK-46326][PYTHON][TESTS] Test missing cases for functions (pyspark.sql.functions) [spark]

2023-12-08 Thread via GitHub
HyukjinKwon closed pull request #44256: [SPARK-46326][PYTHON][TESTS] Test missing cases for functions (pyspark.sql.functions) URL: https://github.com/apache/spark/pull/44256

Re: [PR] [SPARK-46282][PYTHON][DOCS] Create a Standalone Page for DataFrame API in PySpark Documentation [spark]

2023-12-08 Thread via GitHub
bjornjorgensen commented on code in PR #44201: URL: https://github.com/apache/spark/pull/44201#discussion_r1420313384 ## python/docs/source/index.rst: ## @@ -72,20 +76,25 @@ DataFrames, Structured Streaming, Machine Learning (MLlib) and Spark Core. :alt: Spark Core

Re: [PR] [WIP][SPARK-43403][UI] Ensure old SparkUI in HistoryServer has been detached before loading new one [spark]

2023-12-08 Thread via GitHub
zhouyifan279 commented on PR #41105: URL: https://github.com/apache/spark/pull/41105#issuecomment-1847023452 > After deploying this patch in our production environment, a TimeoutException was thrown occasionally when opening a running Spark UI in HistoryServer. It happened in Application

Re: [PR] [SPARK-46202][CONNECT] Expose new ArtifactManager APIs to support custom target directories [spark]

2023-12-08 Thread via GitHub
vicennial commented on PR #44109: URL: https://github.com/apache/spark/pull/44109#issuecomment-1847023164 cc @hvanhovell @nija-at

Re: [PR] [SPARK-46202][CONNECT] Expose new ArtifactManager APIs to support custom target directories [spark]

2023-12-08 Thread via GitHub
vicennial commented on PR #44109: URL: https://github.com/apache/spark/pull/44109#issuecomment-1847013631 I've updated the APIs to make them simpler to use (String-based source/target). It is now easier to reason about what the directory structure would look like.

[PR] [SPARK-46330] Loading of Spark UI blocks for a long time when HybridStore enabled [spark]

2023-12-08 Thread via GitHub
zhouyifan279 opened a new pull request, #44260: URL: https://github.com/apache/spark/pull/44260 ### What changes were proposed in this pull request? Reduce the time spent blocked while loading the Spark UI when HybridStore is enabled. ### Why are the changes needed? When

Re: [PR] [SPARK-39176][PYSPARK] Fixed a problem with pyspark serializing pre-1970 datetime in windows [spark]

2023-12-08 Thread via GitHub
dingsl-giser commented on PR #36566: URL: https://github.com/apache/spark/pull/36566#issuecomment-1846881641 @HyukjinKwon This problem still exists in the new version. Can it be merged?

Re: [PR] [SPARK-39176][PYSPARK] Fixed a problem with pyspark serializing pre-1970 datetime in windows [spark]

2023-12-08 Thread via GitHub
dingsl-giser commented on PR #36566: URL: https://github.com/apache/spark/pull/36566#issuecomment-1846877240 This problem hasn't been solved yet?
