Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub
beliefer commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413464275 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame(

Re: [PR] [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44152: [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility URL: https://github.com/apache/spark/pull/44152 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44152: URL: https://github.com/apache/spark/pull/44152#issuecomment-1837977863 Merged to master.

Re: [PR] [SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44146: [SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect URL: https://github.com/apache/spark/pull/44146

Re: [PR] [SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44146: URL: https://github.com/apache/spark/pull/44146#issuecomment-1837974869 I manually tested the latest changes. Merged to master.

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-12-03 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1413397783 ## core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46043][SQL] Support create table using DSv2 sources [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43949: URL: https://github.com/apache/spark/pull/43949#discussion_r1413451050 ## sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala: ## @@ -61,7 +61,19 @@ case class DataSourceV2Relation(

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1413434804 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -417,6 +429,17 @@ case class PercentileDisc(

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1413434642 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -374,6 +375,17 @@ case class PercentileCont(left:

[PR] [SPARK-46236][PYTHON][DOCS] Using brighter color for docs h3 title for better visibility [spark]

2023-12-03 Thread via GitHub
panbingkun opened a new pull request, #44152: URL: https://github.com/apache/spark/pull/44152 ### What changes were proposed in this pull request? The PR aims to use a brighter color for the docs `h3 title` for better visibility. ### Why are the changes needed? Before:

Re: [PR] [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework. [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44149: URL: https://github.com/apache/spark/pull/44149#issuecomment-1837939566 Merged to master.

Re: [PR] [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework. [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun closed pull request #44149: [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework. URL: https://github.com/apache/spark/pull/44149

Re: [PR] [SPARK-37518][SQL] Inject an early scan pushdown rule [spark]

2023-12-03 Thread via GitHub
advancedxy commented on PR #34779: URL: https://github.com/apache/spark/pull/34779#issuecomment-1837935968 @beliefer @cloud-fan we found it useful to inject custom early pushdown rules in practice, such as to rewrite some transform expression that's not yet identified by Spark. Do you

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413423471 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -317,32 +339,17 @@ class UnboundedOffsetWindowFunctionFrame(

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413422198 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame(

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43958: URL: https://github.com/apache/spark/pull/43958#discussion_r1413421839 ## sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala: ## @@ -196,24 +225,16 @@ class FrameLessOffsetWindowFunctionFrame(

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` into PySpark error. [spark]

2023-12-03 Thread via GitHub
itholic commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413419030 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
itholic commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413416382 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-46234][PYTHON] Introduce `PySparkKeyError` for PySpark error framework [spark]

2023-12-03 Thread via GitHub
itholic commented on PR #44151: URL: https://github.com/apache/spark/pull/44151#issuecomment-1837913433 Thanks @HyukjinKwon and @dongjoon-hyun for the review on recent series of PySpark error framework related PRs. This would be the last one for migrating non-PySpark errors from

[PR] [SPARK-46234][PYTHON] Introduce `PySparkKeyError` for PySpark error framework [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44151: URL: https://github.com/apache/spark/pull/44151 ### What changes were proposed in this pull request? This PR proposes to introduce `PySparkKeyError` for error framework, and migrate Python built-in `KeyError` into `PySparkKeyError`.

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on PR #44013: URL: https://github.com/apache/spark/pull/44013#issuecomment-1837910213 thank you @dongjoon-hyun it's done

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413413050 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413412724 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-45888][SS] Apply error class framework to State (Metadata) Data Source [spark]

2023-12-03 Thread via GitHub
HeartSaVioR commented on PR #44025: URL: https://github.com/apache/spark/pull/44025#issuecomment-1837905878 @MaxGekk @beliefer Would you mind taking another round of review? Thanks!

Re: [PR] [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44090: URL: https://github.com/apache/spark/pull/44090#issuecomment-1837902973 Thank you, @jiangxb1987 and @mridulm . Merged to master/3.4/3.3.

Re: [PR] [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun closed pull request #44090: [SPARK-46182][CORE] Track `lastTaskFinishTime` using the exact task finished event URL: https://github.com/apache/spark/pull/44090

Re: [PR] [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44139: URL: https://github.com/apache/spark/pull/44139#issuecomment-1837900043 Merged to master~

Re: [PR] [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun closed pull request #44139: [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows URL: https://github.com/apache/spark/pull/44139

[PR] [SPARK-46233][PYTHON] Migrate all remaining `AttributeError` into PySpark error framework [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44150: URL: https://github.com/apache/spark/pull/44150 ### What changes were proposed in this pull request? This PR proposes to migrate all remaining `AttributeError` from `pyspark/sql/*` into PySpark error framework, `PySparkAttributeError` with

Re: [PR] [SPARK-45649][SQL] Unify the prepare framework for OffsetWindowFunctionFrame [spark]

2023-12-03 Thread via GitHub
beliefer commented on PR #43958: URL: https://github.com/apache/spark/pull/43958#issuecomment-1837895892 In fact, not only can it reduce code, but it can also reuse code.

Re: [PR] [SPARK-46069][SQL] Support unwrap timestamp type to date type [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43982: URL: https://github.com/apache/spark/pull/43982#discussion_r1413403259 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -329,6 +334,48 @@ object

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
itholic commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413392482 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-46050][SQL]: Allow additional rules for the Substitution batch [spark]

2023-12-03 Thread via GitHub
jzhuge commented on PR #43952: URL: https://github.com/apache/spark/pull/43952#issuecomment-1837877316 If this is approved, add `SparkSessionExtensions.injectSubstitutionRule` in a follow-up?

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44013: URL: https://github.com/apache/spark/pull/44013#issuecomment-1837875645 I merged it. Could you rebase this PR, @ulysses-you ?  - #44142

Re: [PR] [SPARK-46050][SQL]: Allow additional rules for the Substitution batch [spark]

2023-12-03 Thread via GitHub
jzhuge commented on PR #43952: URL: https://github.com/apache/spark/pull/43952#issuecomment-1837873547 We might want to hold off this PR a bit. see https://github.com/apache/spark/pull/39796#discussion_r1091732398

Re: [PR] [SPARK-46069][SQL][FOLLOWUP] Make sure the cast expression is date type when unwrap timestamp type to date type [spark]

2023-12-03 Thread via GitHub
wangyum commented on code in PR #44134: URL: https://github.com/apache/spark/pull/44134#discussion_r1413388859 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -139,7 +139,7 @@ object UnwrapCastInBinaryComparison

Re: [PR] [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0 [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun closed pull request #44132: [SPARK-46218][BUILD] Upgrade commons-cli to 1.6.0 URL: https://github.com/apache/spark/pull/44132

Re: [PR] [SPARK-46212][CORE][SQL][SS][CONNECT][MLLIB][GRAPHX][DSTREAM][PROTOBUF][EXAMPLES] Use other functions to simplify the code pattern of `s.c.MapOps#view.mapValues` [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on PR #44122: URL: https://github.com/apache/spark/pull/44122#issuecomment-1837867673 late LGTM, thanks for simplifying it!

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44142: URL: https://github.com/apache/spark/pull/44142#issuecomment-1837864780 Thank you, @ulysses-you , @beliefer , @cloud-fan .

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44142: URL: https://github.com/apache/spark/pull/44142#issuecomment-1837864639 Merged to master for Apache Spark 4.

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun closed pull request #44142: [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` URL: https://github.com/apache/spark/pull/44142

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on code in PR #44142: URL: https://github.com/apache/spark/pull/44142#discussion_r1413385139 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala: ## @@ -29,4 +30,32 @@ trait SQLConfHelper { * See [[SQLConf.get]] for more

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #44142: URL: https://github.com/apache/spark/pull/44142#discussion_r1413384686 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala: ## @@ -29,4 +30,32 @@ trait SQLConfHelper { * See [[SQLConf.get]] for more

Re: [PR] [SPARK-45602][CORE][SQL][SS][YARN][K8S] Replace `s.c.MapOps.filterKeys` with `s.c.MapOps.view.filterKeys` [spark]

2023-12-03 Thread via GitHub
LuciferYang commented on code in PR #43445: URL: https://github.com/apache/spark/pull/43445#discussion_r1413384665 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala: ## @@ -102,7 +102,7 @@ class KafkaContinuousStream( }

Re: [PR] [SPARK-45602][CORE][SQL][SS][YARN][K8S] Replace `s.c.MapOps.filterKeys` with `s.c.MapOps.view.filterKeys` [spark]

2023-12-03 Thread via GitHub
cloud-fan commented on code in PR #43445: URL: https://github.com/apache/spark/pull/43445#discussion_r1413383937 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala: ## @@ -102,7 +102,7 @@ class KafkaContinuousStream( }

[PR] [SPARK-46232][PYTHON] Migrate all remaining ValueError into PySpark error framework. [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44149: URL: https://github.com/apache/spark/pull/44149 ### What changes were proposed in this pull request? This PR proposes to migrate all remaining `ValueError` into PySpark error framework, `PySparkValueError` with assigning dedicated error

Re: [PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44147: URL: https://github.com/apache/spark/pull/44147#discussion_r1413358036 ## python/pyspark/sql/connect/client/retries.py: ## @@ -248,13 +249,6 @@ def __iter__(self) -> Generator[AttemptManager, None, None]: yield

Re: [PR] [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44139: URL: https://github.com/apache/spark/pull/44139#discussion_r1413351391 ## python/pyspark/sql/pandas/group_ops.py: ## @@ -652,9 +651,9 @@ def applyInArrow( Parameters -- func : function -a

Re: [PR] [SPARK-45602][CORE][SQL][SS][YARN][K8S] Replace `s.c.MapOps.filterKeys` with `s.c.MapOps.view.filterKeys` [spark]

2023-12-03 Thread via GitHub
LuciferYang commented on code in PR #43445: URL: https://github.com/apache/spark/pull/43445#discussion_r1413338062 ## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala: ## @@ -102,7 +102,7 @@ class KafkaContinuousStream( }

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from `SQLHelper` to `SQLConfHelper` [spark]

2023-12-03 Thread via GitHub
beliefer commented on code in PR #44142: URL: https://github.com/apache/spark/pull/44142#discussion_r1413330931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala: ## @@ -29,4 +30,32 @@ trait SQLConfHelper { * See [[SQLConf.get]] for more

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-12-03 Thread via GitHub
junyuc25 commented on PR #43736: URL: https://github.com/apache/spark/pull/43736#issuecomment-1837784593 Hi @dongjoon-hyun , I updated the PR as per your comments. Could you take a look when you get a chance? Thanks.

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-12-03 Thread via GitHub
LuciferYang commented on PR #43736: URL: https://github.com/apache/spark/pull/43736#issuecomment-1837784357 > > > Note that there are currently 57 tests in total in the Kinesis-asl module, and this PR enabled 35 of them. > > > > > > If those 22 test cases are ignored, can we

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-12-03 Thread via GitHub
junyuc25 commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1413330286 ## pom.xml: ## @@ -202,6 +202,7 @@ 4.1.17 14.0.1 3.1.9 +2.2.11 Review Comment: Same as above comment.

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-12-03 Thread via GitHub
junyuc25 commented on PR #43736: URL: https://github.com/apache/spark/pull/43736#issuecomment-1837782246 > > Note that there are currently 57 tests in total in the Kinesis-asl module, and this PR enabled 35 of them. > > If those 22 test cases are ignored, can we safely change the

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413323952 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class

Re: [PR] [SPARK-46215][CORE] Improve `FileSystemPersistenceEngine` to allow nonexistent parents [spark]

2023-12-03 Thread via GitHub
LuciferYang commented on PR #44127: URL: https://github.com/apache/spark/pull/44127#issuecomment-1837767547 late LGTM

[PR] [SPARK-46231][PYTHON] Migrate all remaining `NotImplementedError` & `TypeError` into PySpark error framework [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44148: URL: https://github.com/apache/spark/pull/44148 ### What changes were proposed in this pull request? This PR proposes to migrate all remaining `NotImplementedError` and `TypeError` into PySpark error framework, `PySparkNotImplementedError`

Re: [PR] [WIP][SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect [spark]

2023-12-03 Thread via GitHub
zhengruifeng commented on PR #44146: URL: https://github.com/apache/spark/pull/44146#issuecomment-1837758466 LGTM pending CI

[PR] [SPARK-46230][PYTHON] Migrate `RetriesExceeded` and `RetryException` into PySpark error. [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44147: URL: https://github.com/apache/spark/pull/44147 ### What changes were proposed in this pull request? This PR proposes to migrate `RetriesExceeded` and `RetryException` into PySpark error. ### Why are the changes needed? All

[PR] [WIP][SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect [spark]

2023-12-03 Thread via GitHub
HyukjinKwon opened a new pull request, #44146: URL: https://github.com/apache/spark/pull/44146 ### What changes were proposed in this pull request? This PR implements Spark Connect version of https://github.com/apache/spark/pull/38624. ### Why are the changes needed?

[PR] [SPARK-46228][SQL] Insert window group limit node for cumulative aggregation with limit [spark]

2023-12-03 Thread via GitHub
zml1206 opened a new pull request, #44145: URL: https://github.com/apache/spark/pull/44145 ### What changes were proposed in this pull request? Insert a WindowGroupLimit node to filter out unnecessary rows based on cumulative aggregation with limit. It supports the following

Re: [PR] [SPARK-TBD] Forbid Recursive Error handling [spark]

2023-12-03 Thread via GitHub
cdkrot commented on PR #44144: URL: https://github.com/apache/spark/pull/44144#issuecomment-1837751998 cc @HyukjinKwon, also @nija-at @grundprinzip

[PR] [SPARK-TBD] Forbid Recursive Error handling [spark]

2023-12-03 Thread via GitHub
cdkrot opened a new pull request, #44144: URL: https://github.com/apache/spark/pull/44144 ### What changes were proposed in this pull request? Add safe guards into SparkConnectClient to forbid recursion during error handling. ### Why are the changes needed? There is a

[PR] [SPARK-46226][PYTHON] Migrate all remaining `RuntimeError`s into PySpark error framework. [spark]

2023-12-03 Thread via GitHub
itholic opened a new pull request, #44143: URL: https://github.com/apache/spark/pull/44143 ### What changes were proposed in this pull request? This PR proposes to migrate all remaining `RuntimeError`s into PySpark error framework, `PySparkRuntimeError` with assigning dedicated error

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on PR #44013: URL: https://github.com/apache/spark/pull/44013#issuecomment-1837735524 thank you @dongjoon-hyun, will rebase and address comments after https://github.com/apache/spark/pull/44142

Re: [PR] [SPARK-46227][SQL] Move `withSQLConf` from SQLHelper trait to `SQLConfHelper` trait [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on PR #44142: URL: https://github.com/apache/spark/pull/44142#issuecomment-1837734888 cc @dongjoon-hyun @cloud-fan thank you

[PR] [SPARK-46227][SQL] Move `withSQLConf` from SQLHelper trait to `SQLConfHelper` trait [spark]

2023-12-03 Thread via GitHub
ulysses-you opened a new pull request, #44142: URL: https://github.com/apache/spark/pull/44142 ### What changes were proposed in this pull request? This PR moves the method `withSQLConf` from `SQLHelper` in the catalyst test module to the `SQLConfHelper` trait in the catalyst module. To make

Re: [PR] [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test) [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44138: [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test) URL: https://github.com/apache/spark/pull/44138

Re: [PR] [SPARK-46224][PYTHON][MLLIB][TESTS] Test string representation of TestResult (pyspark.mllib.stat.test) [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44138: URL: https://github.com/apache/spark/pull/44138#issuecomment-1837731866 The linter failure is by https://github.com/apache/spark/pull/44141. Merged to master.

Re: [PR] [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec` [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44141: [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec` URL: https://github.com/apache/spark/pull/44141

Re: [PR] [SPARK-40559][PYTHON][FOLLOW-UP] Fix linter for `getfullargspec` [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44141: URL: https://github.com/apache/spark/pull/44141#issuecomment-1837731440 Merged to master.

[PR] [SPARK-40559][PYTHON] Fix linter for `getfullargspec` [spark]

2023-12-03 Thread via GitHub
HyukjinKwon opened a new pull request, #44141: URL: https://github.com/apache/spark/pull/44141 ### What changes were proposed in this pull request? This PR proposes to use `inspect.getfullargspec` instead of the unimported `getfullargspec` ### Why are the changes needed? To

Re: [PR] [SPARK-46069][SQL][FOLLOWUP] Make sure the cast expression is date type when unwrap timestamp type to date type [spark]

2023-12-03 Thread via GitHub
dongjoon-hyun commented on PR #44134: URL: https://github.com/apache/spark/pull/44134#issuecomment-1837729449 Oh, @wankunde . It's a compilation failure. ``` [error]

Re: [PR] [SPARK-46207][SQL] Support MergeInto in DataFrameWriterV2 [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on code in PR #44119: URL: https://github.com/apache/spark/pull/44119#discussion_r1413295966 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala: ## @@ -167,6 +173,63 @@ final class DataFrameWriterV2[T] private[sql](table: String, ds:

Re: [PR] [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44137: [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code URL: https://github.com/apache/spark/pull/44137

Re: [PR] [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44140: [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation URL: https://github.com/apache/spark/pull/44140

Re: [PR] [SPARK-46223][PS] Test SparkPandasNotImplementedError with cleaning up unused code [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44137: URL: https://github.com/apache/spark/pull/44137#issuecomment-1837723211 Merged to master.

Re: [PR] [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44140: URL: https://github.com/apache/spark/pull/44140#issuecomment-1837722763 Merged to master.

Re: [PR] [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation [spark]

2023-12-03 Thread via GitHub
zhengruifeng commented on PR #44140: URL: https://github.com/apache/spark/pull/44140#issuecomment-1837720246 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [SPARK-46048][PYTHON][CONNECT][FOLLOWUP] Correct the string representation [spark]

2023-12-03 Thread via GitHub
zhengruifeng opened a new pull request, #44140: URL: https://github.com/apache/spark/pull/44140 ### What changes were proposed in this pull request? Correct the string representation ### Why are the changes needed? to fix a minor issue ### Does this PR introduce _any_

Re: [PR] [SPARK-46220][SQL] Restrict charsets in `decode()` [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44131: [SPARK-46220][SQL] Restrict charsets in `decode()` URL: https://github.com/apache/spark/pull/44131 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-46220][SQL] Restrict charsets in `decode()` [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44131: URL: https://github.com/apache/spark/pull/44131#issuecomment-1837708274 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-23015][WINDOWS] Mitigate bug in Windows where starting multiple Spark instances within the same second causes a failure [spark]

2023-12-03 Thread via GitHub
panbingkun commented on PR #43706: URL: https://github.com/apache/spark/pull/43706#issuecomment-1837705608 > friendly ping @panbingkun , if convenient, such as through offline communication, please help to verify this patch on Windows. Thanks ~ Okay, I'll verify it a little later.

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on code in PR #44013: URL: https://github.com/apache/spark/pull/44013#discussion_r1413283368 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -194,16 +212,16 @@ case class AdaptiveSparkPlanExec( }

Re: [PR] [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils) [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #44136: [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils) URL: https://github.com/apache/spark/pull/44136 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-46222][PYTHON][TESTS] Test invalid error class (pyspark.errors.utils) [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #44136: URL: https://github.com/apache/spark/pull/44136#issuecomment-1837700136 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [SPARK-40559][PYTHON][DOCS][FOLLOW-UP] Fix the docstring and document both applyInArrows [spark]

2023-12-03 Thread via GitHub
HyukjinKwon opened a new pull request, #44139: URL: https://github.com/apache/spark/pull/44139 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/38624 that documents both applyInArrows with a docstring fix. ###

Re: [PR] [SPARK-46069][SQL][FOLLOWUP] Make sure the cast expression is date type when unwrap timestamp type to date type [spark]

2023-12-03 Thread via GitHub
wankunde commented on PR #44134: URL: https://github.com/apache/spark/pull/44134#issuecomment-1837699068 > Hi, @wankunde . > > * The follow-up should not reuse the original PR title. Please use a proper PR title for your code itself. > > ``` > - Cast(fromExp, _, timeZoneId,

Re: [PR] [SPARK-46208][PS][DOCS] Adding a link for latest Pandas API specifications. [spark]

2023-12-03 Thread via GitHub
itholic commented on code in PR #44115: URL: https://github.com/apache/spark/pull/44115#discussion_r1413282421 ## python/docs/source/reference/pyspark.pandas/index.rst: ## @@ -23,7 +23,7 @@ Pandas API on Spark This page gives an overview of all public pandas API on Spark.

Re: [PR] [SPARK-46206][PS] Use a narrower scope exception for SQL processor [spark]

2023-12-03 Thread via GitHub
itholic commented on code in PR #44114: URL: https://github.com/apache/spark/pull/44114#discussion_r1413282037 ## python/pyspark/pandas/sql_processor.py: ## @@ -206,9 +206,7 @@ def _get_local_scope() -> Dict[str, Any]: # Get 2 scopes above (_get_local_scope -> sql -> ...)

Re: [PR] [SPARK-46221][PS][DOCS] Change `to_spark_io` to `spark.to_spark_io` in `quickstart_ps.ipynb` [spark]

2023-12-03 Thread via GitHub
itholic commented on PR #44135: URL: https://github.com/apache/spark/pull/44135#issuecomment-1837695311 Late LGTM. Thanks for the fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on code in PR #44013: URL: https://github.com/apache/spark/pull/44013#discussion_r1413280745 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SQLConfHelper.scala: ## @@ -29,4 +30,32 @@ trait SQLConfHelper { * See [[SQLConf.get]] for more

Re: [PR] [SPARK-46090][SQL] Support plan fragment level SQL configs in AQE [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on code in PR #44013: URL: https://github.com/apache/spark/pull/44013#discussion_r1413280992 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -194,16 +212,16 @@ case class AdaptiveSparkPlanExec( }

Re: [PR] [SPARK-39911][SQL] Optimize global Sort to RepartitionByExpression [spark]

2023-12-03 Thread via GitHub
ulysses-you commented on PR #37330: URL: https://github.com/apache/spark/pull/37330#issuecomment-1837693411 hi @maytasm , `Sort global` is semantically equal to `Sort local + RepartitionByExpression`. For your case, if there is only a single value on the sort column, then Sort global would also

Re: [PR] [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup [spark]

2023-12-03 Thread via GitHub
HyukjinKwon closed pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup URL: https://github.com/apache/spark/pull/38624 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup [spark]

2023-12-03 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1837692882 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-45975][SQL][TESTS][3.5] Reset storeAssignmentPolicy to original [spark]

2023-12-03 Thread via GitHub
wForget commented on PR #44126: URL: https://github.com/apache/spark/pull/44126#issuecomment-1837692534 Sorry for not replying in time during the holiday, and thanks to @LuciferYang for this fix. -- This is an automated message from the Apache Git Service. To respond to the message,
