[GitHub] [spark] yaooqinn commented on a diff in pull request #42575: [WIP][SPARK-44863][UI] Add a button to download thread dump as a txt in Spark UI

2023-08-20 Thread via GitHub
yaooqinn commented on code in PR #42575: URL: https://github.com/apache/spark/pull/42575#discussion_r1299641664 ## core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala: ## @@ -67,18 +69,17 @@ private[ui] class ExecutorThreadDumpPage( Updated at

[GitHub] [spark] ion-elgreco commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
ion-elgreco commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685688494 > I get that `cogroup` might not be possible tho. But we can just convert pandas back to arrow batches easily. Is this really required for some scenario? IIRC this is only useful for

[GitHub] [spark] yaooqinn commented on pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-08-20 Thread via GitHub
yaooqinn commented on PR #42481: URL: https://github.com/apache/spark/pull/42481#issuecomment-1685688005 thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] yaooqinn closed pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-08-20 Thread via GitHub
yaooqinn closed pull request #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI URL: https://github.com/apache/spark/pull/42481 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] LuciferYang commented on pull request #42580: [SPARK-44888][SQL][TESTS] Re-generate golden files of `SQLQueryTestSuite` for Java 21

2023-08-20 Thread via GitHub
LuciferYang commented on PR #42580: URL: https://github.com/apache/spark/pull/42580#issuecomment-1685685065 cc @dongjoon-hyun FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299615448 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala: ## @@ -368,6 +368,15 @@ abstract class Expression extends

[GitHub] [spark] LuciferYang opened a new pull request, #42580: [SPARK-44888][SQL][TESTS] Update the golden files of `SQLQueryTestSuite` for Java 21

2023-08-20 Thread via GitHub
LuciferYang opened a new pull request, #42580: URL: https://github.com/apache/spark/pull/42580 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon closed pull request #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples URL: https://github.com/apache/spark/pull/42579 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] HyukjinKwon commented on pull request #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42579: URL: https://github.com/apache/spark/pull/42579#issuecomment-1685620282 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42462: [SPARK-44751][SQL] XML FileFormat Interface implementation

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42462: URL: https://github.com/apache/spark/pull/42462#discussion_r1299583398 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlGenerator.scala: ## @@ -83,21 +86,21 @@ private[xml] object StaxXmlGenerator { def

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42462: [SPARK-44751][SQL] XML FileFormat Interface implementation

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42462: URL: https://github.com/apache/spark/pull/42462#discussion_r1299581325 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -7227,6 +7227,150 @@ object functions { */ def to_csv(e: Column):

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42462: [SPARK-44751][SQL] XML FileFormat Interface implementation

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42462: URL: https://github.com/apache/spark/pull/42462#discussion_r1299580956 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -392,6 +392,46 @@ class DataFrameReader private[sql]

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42462: [SPARK-44751][SQL] XML FileFormat Interface implementation

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42462: URL: https://github.com/apache/spark/pull/42462#discussion_r1299580676 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -392,6 +392,46 @@ class DataFrameReader private[sql]

[GitHub] [spark] HyukjinKwon commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685602103 adding @viirya @ueshin @BryanCutler in case you guys have some thought on this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299573483 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java: ## @@ -955,4 +986,8 @@ protected WritableColumnVector(int capacity,

[GitHub] [spark] HyukjinKwon commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685601694 Yeah, I meant `df.repartition(grouping_cols).mapInArrow() ` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use
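For context, a minimal PySpark sketch of the workaround suggested above: repartition on the grouping columns, then process each partition's Arrow batches with `mapInArrow`. The column names, schema, and aggregation below are illustrative assumptions, not taken from the PR, and `Table.group_by` needs pyarrow >= 7.0.

```python
import pyarrow as pa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v"])

def summarize(batches):
    # After the repartition below, all rows sharing an "id" value arrive in
    # the same partition, so a per-key aggregation can be done in plain Arrow.
    batches = list(batches)
    if not batches:  # empty partitions produce no Arrow batches
        return
    table = pa.Table.from_batches(batches)
    grouped = table.group_by("id").aggregate([("v", "sum")])
    yield from grouped.select(["id", "v_sum"]).to_batches()

result = (
    df.repartition("id")  # co-locate each group in a single partition
      .mapInArrow(summarize, schema="id long, v_sum double")
)
result.show()
```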

[GitHub] [spark] itholic closed pull request #42528: [SPARK-44844][BUILD] Exclude `python/build/*` path for local `lint-python` testing

2023-08-20 Thread via GitHub
itholic closed pull request #42528: [SPARK-44844][BUILD] Exclude `python/build/*` path for local `lint-python` testing URL: https://github.com/apache/spark/pull/42528 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] itholic commented on pull request #42528: [SPARK-44844][BUILD] Exclude `python/build/*` path for local `lint-python` testing

2023-08-20 Thread via GitHub
itholic commented on PR #42528: URL: https://github.com/apache/spark/pull/42528#issuecomment-1685600378 IIRC they were generated when I upgraded the pip packages by running `pip install -r dev/requirements.txt`, but it seems not to be reproducible now for some reason. Let me just close this

[GitHub] [spark] HyukjinKwon commented on pull request #42377: [SPARK-44622][SQL][CONNECT] Implement error enrichment and setting server-side stacktrace

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42377: URL: https://github.com/apache/spark/pull/42377#issuecomment-1685594500 Would be great if we have the user-facing exception (and stacktrace) example at the PR description. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] cloud-fan closed pull request #41335: [SPARK-43205][DOCS][SQL][FOLLOWUP] IDENTIFIER clause docs

2023-08-20 Thread via GitHub
cloud-fan closed pull request #41335: [SPARK-43205][DOCS][SQL][FOLLOWUP] IDENTIFIER clause docs URL: https://github.com/apache/spark/pull/41335 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan commented on pull request #41335: [SPARK-43205][DOCS][SQL][FOLLOWUP] IDENTIFIER clause docs

2023-08-20 Thread via GitHub
cloud-fan commented on PR #41335: URL: https://github.com/apache/spark/pull/41335#issuecomment-1685582467 the test failure is unrelated, thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] pan3793 commented on a diff in pull request #42575: [WIP][SPARK-44863][UI] Add a button to download thread dump as a txt in Spark UI

2023-08-20 Thread via GitHub
pan3793 commented on code in PR #42575: URL: https://github.com/apache/spark/pull/42575#discussion_r1299557854 ## core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala: ## @@ -67,18 +69,17 @@ private[ui] class ExecutorThreadDumpPage( Updated at

[GitHub] [spark] cloud-fan closed pull request #41100: [SPARK-43420][SQL] Make DisableUnnecessaryBucketedScan smart with table cache

2023-08-20 Thread via GitHub
cloud-fan closed pull request #41100: [SPARK-43420][SQL] Make DisableUnnecessaryBucketedScan smart with table cache URL: https://github.com/apache/spark/pull/41100 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #40390: URL: https://github.com/apache/spark/pull/40390#discussion_r1299552593 ## sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala: ## @@ -244,7 +244,8 @@ abstract class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #40390: URL: https://github.com/apache/spark/pull/40390#discussion_r1299551915 ## sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala: ## @@ -512,6 +512,9 @@ class CachedTableSuite extends QueryTest with SQLTestUtils *

[GitHub] [spark] wankunde commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
wankunde commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299551600 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback.scala: ## @@ -46,21 +46,54 @@ trait CodegenFallback extends

[GitHub] [spark] gengliangwang commented on pull request #42553: [SPARK-44864] Align streaming statistics link format with other page links

2023-08-20 Thread via GitHub
gengliangwang commented on PR #42553: URL: https://github.com/apache/spark/pull/42553#issuecomment-1685566875 TBH `%s/%s/statistics?id=%s` is more "restful". (And, of course it would be totally restful if it is `%s/%s/statistics/%s`, but we can't make such changes.) -- This is an

[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299550767 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java: ## @@ -846,7 +849,14 @@ public final void addElementsAppended(int num)

[GitHub] [spark] cloud-fan commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299550466 ## sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java: ## @@ -84,9 +84,7 @@ public long valuesNativeAddress() { return

[GitHub] [spark] HyukjinKwon commented on pull request #42455: [DRAFT] Fix Spark Connect Behavior for Default Session

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42455: URL: https://github.com/apache/spark/pull/42455#issuecomment-1685563329 Fixed in https://github.com/apache/spark/pull/42464 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon closed pull request #42455: [DRAFT] Fix Spark Connect Behavior for Default Session

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42455: [DRAFT] Fix Spark Connect Behavior for Default Session URL: https://github.com/apache/spark/pull/42455 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #42467: [SPARK-44780][DOC] SQL temporary variables

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42467: URL: https://github.com/apache/spark/pull/42467#issuecomment-1685562669  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] cloud-fan commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299549288 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback.scala: ## @@ -46,21 +46,54 @@ trait CodegenFallback extends

[GitHub] [spark] HyukjinKwon closed pull request #42471: [SPARK-44785][SQL][CONNECT] Convert common alreadyExistsExceptions and noSuchExceptions

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42471: [SPARK-44785][SQL][CONNECT] Convert common alreadyExistsExceptions and noSuchExceptions URL: https://github.com/apache/spark/pull/42471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on pull request #42471: [SPARK-44785][SQL][CONNECT] Convert common alreadyExistsExceptions and noSuchExceptions

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42471: URL: https://github.com/apache/spark/pull/42471#issuecomment-1685562030 Merged to master and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] cloud-fan commented on pull request #42534: [SPARK-44868][SQL] Convert datetime to string by `to_char`/`to_varchar`

2023-08-20 Thread via GitHub
cloud-fan commented on PR #42534: URL: https://github.com/apache/spark/pull/42534#issuecomment-1685560046 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42475: [SPARK-44793][SQL] Fixing pipelineTime metric for WholeStageCodegen

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42475: URL: https://github.com/apache/spark/pull/42475#discussion_r1299546504 ## sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenEvaluatorFactory.scala: ## @@ -41,7 +41,7 @@ class WholeStageCodegenEvaluatorFactory(

[GitHub] [spark] HyukjinKwon commented on pull request #42498: [SPARK-44814][CONNECT][PYTHON]Test to protect from faulty protobuf versions

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42498: URL: https://github.com/apache/spark/pull/42498#issuecomment-1685557081 Seems like it does trigger something :-). https://github.com/grundprinzip/spark/actions/runs/5870189292/job/15916811394#step:12:1425 -- This is an automated message from the Apache Git

[GitHub] [spark] goodwanghan commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
goodwanghan commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685556987 > qq, can't we workaround by `df.repartitionByExpression().mapInArrow()` for `groupby` case? Hi @HyukjinKwon, I understand what you mean; I am curious if df.repartition will

[GitHub] [spark] hdaikoku commented on pull request #42572: [SPARK-44881][COMMON] Executor stuck on retrying to fetch shuffle data when `java.lang.OutOfMemoryError: unable to create native thread` exc

2023-08-20 Thread via GitHub
hdaikoku commented on PR #42572: URL: https://github.com/apache/spark/pull/42572#issuecomment-1685554405 This seems to be the same issue as https://github.com/apache/spark/pull/42426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] HyukjinKwon commented on pull request #42528: [SPARK-44844][BUILD] Exclude `python/build/*` path for local `lint-python` testing

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42528: URL: https://github.com/apache/spark/pull/42528#issuecomment-1685554336 how were `python/build/*` generated? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] wankunde commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
wankunde commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299537222 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback.scala: ## @@ -46,21 +46,54 @@ trait CodegenFallback extends

[GitHub] [spark] itholic commented on a diff in pull request #42551: [SPARK-43563][SPARK-43459][SPARK-43451][SPARK-43506] Remove `squeeze` from `read_csv` & enabling more tests.

2023-08-20 Thread via GitHub
itholic commented on code in PR #42551: URL: https://github.com/apache/spark/pull/42551#discussion_r1299537975 ## python/pyspark/pandas/namespace.py: ## @@ -985,11 +975,6 @@ def read_excel( * If list of string, then indicates list of column names to be parsed.

[GitHub] [spark] zhengruifeng commented on pull request #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42579: URL: https://github.com/apache/spark/pull/42579#issuecomment-1685552671 cc @HyukjinKwon @allisonwang-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #42534: [SPARK-44868][SQL] Convert datetime to string by `to_char`/`to_varchar`

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42534: [SPARK-44868][SQL] Convert datetime to string by `to_char`/`to_varchar` URL: https://github.com/apache/spark/pull/42534 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #42534: [SPARK-44868][SQL] Convert datetime to string by `to_char`/`to_varchar`

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42534: URL: https://github.com/apache/spark/pull/42534#issuecomment-1685552067 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on pull request #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42579: URL: https://github.com/apache/spark/pull/42579#issuecomment-1685551050 there are two wildcard imports under `docs`: ``` (spark_dev_310) ➜ spark git:(master) ag -i 'import \*' docs docs/sql-ref-datatypes.md 117:from pyspark.sql.types import *

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42541: Spark 44854

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42541: URL: https://github.com/apache/spark/pull/42541#discussion_r1299534962 ## python/pyspark/sql/types.py: ## @@ -442,7 +442,7 @@ def needConversion(self) -> bool: def toInternal(self, dt: datetime.timedelta) -> Optional[int]:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42541: Spark 44854

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42541: URL: https://github.com/apache/spark/pull/42541#discussion_r1299533479 ## python/pyspark/sql/types.py: ## @@ -442,7 +442,7 @@ def needConversion(self) -> bool: def toInternal(self, dt: datetime.timedelta) -> Optional[int]:

[GitHub] [spark] zhengruifeng opened a new pull request, #42579: [SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples

2023-08-20 Thread via GitHub
zhengruifeng opened a new pull request, #42579: URL: https://github.com/apache/spark/pull/42579 ### What changes were proposed in this pull request? Fix wildcard import `from pyspark.sql.functions import *` ### Why are the changes needed? to follow the [PEP 8 - Style Guide
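For illustration, this is the kind of change involved: the Quick Start word-count example rewritten with explicit imports instead of the wildcard, per PEP 8. The snippet is a hedged reconstruction; the exact lines touched in the docs may differ, and `README.md` stands in for any text file.

```python
from pyspark.sql import SparkSession
# Explicit names instead of `from pyspark.sql.functions import *`
from pyspark.sql.functions import col, explode, split

spark = SparkSession.builder.getOrCreate()
textFile = spark.read.text("README.md")

wordCounts = (
    textFile.select(explode(split(col("value"), r"\s+")).alias("word"))
            .groupBy("word")
            .count()
)
wordCounts.show()
```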

[GitHub] [spark] hdaikoku commented on pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-08-20 Thread via GitHub
hdaikoku commented on PR #42426: URL: https://github.com/apache/spark/pull/42426#issuecomment-1685547323 > To make sure I understand correctly - there is an OOM which is thrown, which happens to be within `initiateRetry` and so shuffle fetch stalled indefinitely, and so task appeared to be

[GitHub] [spark] HyukjinKwon commented on pull request #42541: Spark 44854

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42541: URL: https://github.com/apache/spark/pull/42541#issuecomment-1685546533 Seems pretty good - mind retriggering https://github.com/hdaly0/spark/runs/15986658618 please? Also please fix the PR title (see also

[GitHub] [spark] itholic opened a new pull request, #42578: [SPARK-44841][FOLLOWUP] Add migration guide for the behavior change

2023-08-20 Thread via GitHub
itholic opened a new pull request, #42578: URL: https://github.com/apache/spark/pull/42578 ### What changes were proposed in this pull request? This PR followups for https://github.com/apache/spark/pull/42525. ### Why are the changes needed? To fill

[GitHub] [spark] cloud-fan commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
cloud-fan commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299530738 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodegenFallback.scala: ## @@ -46,21 +46,54 @@ trait CodegenFallback extends

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42550: [SPARK-44861][CONNECT] jsonignore SparkListenerConnectOperationStarted.planRequest

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42550: URL: https://github.com/apache/spark/pull/42550#discussion_r1299527368 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecuteEventsManager.scala: ## @@ -278,6 +278,7 @@ case class

[GitHub] [spark] wankunde commented on a diff in pull request #42450: [SPARK-44773][SQL] Code-gen CodegenFallback expression in WholeStageCodegen if possible

2023-08-20 Thread via GitHub
wankunde commented on code in PR #42450: URL: https://github.com/apache/spark/pull/42450#discussion_r1299526542 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -150,15 +150,15 @@ class EquivalentExpressions( // 1.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42550: [SPARK-44861][CONNECT] jsonignore SparkListenerConnectOperationStarted.planRequest

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42550: URL: https://github.com/apache/spark/pull/42550#discussion_r1299526592 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecuteEventsManager.scala: ## @@ -278,6 +278,7 @@ case class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42551: [SPARK-43563][SPARK-43459][SPARK-43451][SPARK-43506] Remove `squeeze` from `read_csv` & enabling more tests.

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42551: URL: https://github.com/apache/spark/pull/42551#discussion_r1299518780 ## python/pyspark/pandas/namespace.py: ## @@ -985,11 +975,6 @@ def read_excel( * If list of string, then indicates list of column names to be parsed.

[GitHub] [spark] imback82 opened a new pull request, #42577: [SPARK-XXXXX][SQL] Introduce CLUSTER BY clause for CREATE/REPLACE TABLE

2023-08-20 Thread via GitHub
imback82 opened a new pull request, #42577: URL: https://github.com/apache/spark/pull/42577 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HyukjinKwon commented on pull request #42553: [SPARK-44864] Align streaming statistics link format with other page links

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42553: URL: https://github.com/apache/spark/pull/42553#issuecomment-1685526203 cc @gengliangwang and @sarutak FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #42554: Make StreamingRelationV2 support metadata column

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42554: URL: https://github.com/apache/spark/pull/42554#issuecomment-1685525915 Mind filing a JIRA please? See also https://spark.apache.org/contributing.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] HyukjinKwon commented on pull request #42556: [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate Scala setup

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42556: URL: https://github.com/apache/spark/pull/42556#issuecomment-1685525416 @allanf-db FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zekai-li commented on a diff in pull request #42529: [SPARK-44845][YARN][DEPLOY]Fixed file system uri comparison function

2023-08-20 Thread via GitHub
zekai-li commented on code in PR #42529: URL: https://github.com/apache/spark/pull/42529#discussion_r1299512887 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -670,7 +670,7 @@ class ClientSuite extends SparkFunSuite with Matchers

[GitHub] [spark] dzypersonal commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2023-08-20 Thread via GitHub
dzypersonal commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1685522302 > It helps in two cases @weixiuli - the example you gave (generated input (like range()), etc where there is no input metrics). It also helps when reading shuffle input where there

[GitHub] [spark] HyukjinKwon commented on pull request #42566: [SPARK-44873][3.4] Support alter view with nested columns in Hive client

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42566: URL: https://github.com/apache/spark/pull/42566#issuecomment-1685515524 Yeah, let's probably not backport to 3.4 although it's sort of safe. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] zekai-li commented on a diff in pull request #42529: [SPARK-44845][YARN][DEPLOY]Fixed file system uri comparison function

2023-08-20 Thread via GitHub
zekai-li commented on code in PR #42529: URL: https://github.com/apache/spark/pull/42529#discussion_r1299504495 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -670,7 +670,7 @@ class ClientSuite extends SparkFunSuite with Matchers

[GitHub] [spark] HyukjinKwon commented on pull request #42575: [WIP][SPARK-44863][UI] Add a button to download thread dump as a txt in Spark UI

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42575: URL: https://github.com/apache/spark/pull/42575#issuecomment-1685501692 also @sarutak and @jasonli-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] wankunde commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
wankunde commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299497876 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -487,6 +487,25 @@ object SQLConf { .intConf .createWithDefault(1) +

[GitHub] [spark] srowen commented on a diff in pull request #42428: [SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the PySpark doc site

2023-08-20 Thread via GitHub
srowen commented on code in PR #42428: URL: https://github.com/apache/spark/pull/42428#discussion_r1299494936 ## python/docs/source/_static/versions.json: ## @@ -0,0 +1,278 @@ +[ Review Comment: Yes, let's just start with latest versions even, as a convenience to switch.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42548: [WIP][SPARK-44750][PYTHON][CONNECT] Apply configuration to sparksession during creation

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42548: URL: https://github.com/apache/spark/pull/42548#discussion_r1299494819 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -3347,6 +3347,22 @@ def test_can_create_multiple_sessions_to_different_remotes(self):

[GitHub] [spark] HyukjinKwon commented on pull request #42548: [WIP][SPARK-44750][PYTHON][CONNECT] Apply configuration to sparksession during creation

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42548: URL: https://github.com/apache/spark/pull/42548#issuecomment-1685489213 Three of them are actually runtime configurations :-). Some of `spark.connect.*` are runtime and others are static so we might need to clarify them tho. -- This is an automated

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42521: [SPARK-44435][SS][CONNECT][DRAFT] Tests for foreachBatch and Listener

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42521: URL: https://github.com/apache/spark/pull/42521#discussion_r1299493549 ## python/pyspark/sql/tests/connect/streaming/test_parity_listener.py: ## @@ -19,38 +19,153 @@ import time from

[GitHub] [spark] HyukjinKwon commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685484092 I get that `cogroup` might not be possible tho. But we can just convert pandas back to arrow batches easily. Is this really required for some scenario? IIRC this is only useful for
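As an aside on the "convert pandas back to arrow batches" point, the conversion itself is straightforward with pyarrow; the sketch below is illustrative only and unrelated to the PR's actual implementation.

```python
import pandas as pd
import pyarrow as pa

pdf = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 3.0]})

# pandas -> Arrow: build a Table, then split it into RecordBatches
table = pa.Table.from_pandas(pdf, preserve_index=False)
batches = table.to_batches()

# Arrow -> pandas round trip
round_trip = pa.Table.from_batches(batches).to_pandas()
assert round_trip.equals(pdf)
```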

[GitHub] [spark] HyukjinKwon commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1685483237 qq, can't we workaround by `df.repartitionByExpression().mapInArrow()` for `groupby` case? -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299491118 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -487,6 +487,25 @@ object SQLConf { .intConf .createWithDefault(1)

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41782: [SPARK-44239][SQL] Free memory allocated by large vectors when vectors are reset

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #41782: URL: https://github.com/apache/spark/pull/41782#discussion_r1299490804 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -487,6 +487,25 @@ object SQLConf { .intConf .createWithDefault(1)

[GitHub] [spark] zhengruifeng commented on pull request #42556: [SPARK-44867][CONNECT][DOCS] Refactor Spark Connect Docs to incorporate Scala setup

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42556: URL: https://github.com/apache/spark/pull/42556#issuecomment-1685477162 cc @grundprinzip @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on pull request #42563: [SPARK-44877][CONNECT][PYTHON] Support python protobuf functions for Spark Connect

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42563: URL: https://github.com/apache/spark/pull/42563#issuecomment-1685475524 merged to master and branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng closed pull request #42563: [SPARK-44877][CONNECT][PYTHON] Support python protobuf functions for Spark Connect

2023-08-20 Thread via GitHub
zhengruifeng closed pull request #42563: [SPARK-44877][CONNECT][PYTHON] Support python protobuf functions for Spark Connect URL: https://github.com/apache/spark/pull/42563 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40085: [SPARK-42492][SQL] Add new function filter_value

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #40085: URL: https://github.com/apache/spark/pull/40085#discussion_r1299485759 ## python/pyspark/sql/functions.py: ## @@ -13068,6 +13068,46 @@ def _invoke_higher_order_function( return Column(cast(JVMView, sc._jvm).Column(expr(*jcols +

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42428: [SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the PySpark doc site

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42428: URL: https://github.com/apache/spark/pull/42428#discussion_r1299471107 ## python/docs/source/_templates/version-switcher.html: ## @@ -0,0 +1,60 @@ + Review Comment: Let's put the license header: ``` ``` -- This

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42428: [SPARK-44742][PYTHON][DOCS] Add Spark version drop down to the PySpark doc site

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42428: URL: https://github.com/apache/spark/pull/42428#discussion_r1299470607 ## python/docs/source/_static/versions.json: ## @@ -0,0 +1,278 @@ +[ Review Comment: I wonder if we better remove EOL releases ... but no strong opinion WDYT

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42392: [SPARK-44717][PYTHON][PS] Respect TimestampNTZ in resampling

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42392: URL: https://github.com/apache/spark/pull/42392#discussion_r1299469771 ## python/pyspark/pandas/tests/test_resample.py: ## @@ -252,14 +254,32 @@ def test_dataframe_resample(self): self._test_resample(self.pdf5, self.psdf5,

[GitHub] [spark] zhengruifeng commented on pull request #42526: [SPARK-44842][SPARK-43812][PS] Support stat functions for pandas 2.0.0 and enabling tests.

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42526: URL: https://github.com/apache/spark/pull/42526#issuecomment-1685452613 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #42526: [SPARK-44842][SPARK-43812][PS] Support stat functions for pandas 2.0.0 and enabling tests.

2023-08-20 Thread via GitHub
zhengruifeng closed pull request #42526: [SPARK-44842][SPARK-43812][PS] Support stat functions for pandas 2.0.0 and enabling tests. URL: https://github.com/apache/spark/pull/42526 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] zhengruifeng commented on pull request #42547: [SPARK-44858][PYTHON][DOCS] Refine docstring of DataFrame.isEmpty

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42547: URL: https://github.com/apache/spark/pull/42547#issuecomment-1685449633 merged to master and branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng closed pull request #42547: [SPARK-44858][PYTHON][DOCS] Refine docstring of DataFrame.isEmpty

2023-08-20 Thread via GitHub
zhengruifeng closed pull request #42547: [SPARK-44858][PYTHON][DOCS] Refine docstring of DataFrame.isEmpty URL: https://github.com/apache/spark/pull/42547 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] HyukjinKwon closed pull request #42255: [SPARK-40178][SQL][CONNECT] Support coalesce hints with ease for PySpark and R

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42255: [SPARK-40178][SQL][CONNECT] Support coalesce hints with ease for PySpark and R URL: https://github.com/apache/spark/pull/42255 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HyukjinKwon commented on pull request #42255: [SPARK-40178][SQL][CONNECT] Support coalesce hints with ease for PySpark and R

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42255: URL: https://github.com/apache/spark/pull/42255#issuecomment-1685447669 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on pull request #42575: [WIP][SPARK-44863][UI] Add a button to download thread dump as a txt in Spark UI

2023-08-20 Thread via GitHub
zhengruifeng commented on PR #42575: URL: https://github.com/apache/spark/pull/42575#issuecomment-1685447587 cc @gengliangwang @gatorsmile -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42513: [SPARK-44827][PYTHON][TESTS] Fix test when ansi mode enabled

2023-08-20 Thread via GitHub
HyukjinKwon commented on code in PR #42513: URL: https://github.com/apache/spark/pull/42513#discussion_r1299463703 ## python/pyspark/sql/dataframe.py: ## @@ -3793,6 +3793,8 @@ def union(self, other: "DataFrame") -> "DataFrame": Example 2: Combining two DataFrames with

[GitHub] [spark] HyukjinKwon closed pull request #42569: [SPARK-44879][PYTHON][DOCS] Refine the docstring of spark.createDataFrame

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42569: [SPARK-44879][PYTHON][DOCS] Refine the docstring of spark.createDataFrame URL: https://github.com/apache/spark/pull/42569 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] HyukjinKwon commented on pull request #42569: [SPARK-44879][PYTHON][DOCS] Refine the docstring of spark.createDataFrame

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42569: URL: https://github.com/apache/spark/pull/42569#issuecomment-1685443463 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon closed pull request #42568: [SPARK-44876][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect

2023-08-20 Thread via GitHub
HyukjinKwon closed pull request #42568: [SPARK-44876][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect URL: https://github.com/apache/spark/pull/42568 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] HyukjinKwon commented on pull request #42568: [SPARK-44876][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect

2023-08-20 Thread via GitHub
HyukjinKwon commented on PR #42568: URL: https://github.com/apache/spark/pull/42568#issuecomment-1685440028 Merged to master and branch-3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] github-actions[bot] closed pull request #41113: [SPARK-43400][SQL] Add Primary Key syntax support

2023-08-20 Thread via GitHub
github-actions[bot] closed pull request #41113: [SPARK-43400][SQL] Add Primary Key syntax support URL: https://github.com/apache/spark/pull/41113 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] github-actions[bot] closed pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain`

2023-08-20 Thread via GitHub
github-actions[bot] closed pull request #40467: [SPARK-42584][CONNECT] Improve output of `Column.explain` URL: https://github.com/apache/spark/pull/40467 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to
