[GitHub] [spark] LuciferYang commented on pull request #42480: Test Java 17 + Pyspark

2023-08-13 Thread via GitHub
LuciferYang commented on PR #42480: URL: https://github.com/apache/spark/pull/42480#issuecomment-1676775356 [5a92ee2](https://github.com/apache/spark/pull/42480/commits/5a92ee2fea3eed657f34e846c1a5d708c097f461) revert SPARK-44705 for test Java 17 -- This is an automated message from the

[GitHub] [spark] cloud-fan commented on pull request #42482: [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail

2023-08-13 Thread via GitHub
cloud-fan commented on PR #42482: URL: https://github.com/apache/spark/pull/42482#issuecomment-1676763466 cc @aokolnychyi @dongjoon-hyun

[GitHub] [spark] cloud-fan opened a new pull request, #42482: [SPARK-43885][SQL][FOLLOWUP] Instruction#dataType should not fail

2023-08-13 Thread via GitHub
cloud-fan opened a new pull request, #42482: URL: https://github.com/apache/spark/pull/42482 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/41448 . As an optimizer rule, the produced plan should be resolved and r

[GitHub] [spark] cloud-fan commented on a diff in pull request #42467: [SPARK-44780][DOC] SQL temporary variables

2023-08-13 Thread via GitHub
cloud-fan commented on code in PR #42467: URL: https://github.com/apache/spark/pull/42467#discussion_r1293027327 ## docs/sql-ref-syntax-aux-set-var.md: ## @@ -0,0 +1,98 @@ +--- +layout: global +title: SET VAR +displayTitle: SET VAR +license: | + Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #42467: [SPARK-44780][DOC] SQL temporary variables

2023-08-13 Thread via GitHub
cloud-fan commented on code in PR #42467: URL: https://github.com/apache/spark/pull/42467#discussion_r1293027030 ## docs/sql-ref-syntax-aux-set-var.md: ## @@ -0,0 +1,98 @@ +--- +layout: global +title: SET VAR +displayTitle: SET VAR +license: | + Licensed to the Apache Software

[GitHub] [spark] yaooqinn opened a new pull request, #42481: [SPARK-44801][SQL][UI] Capture analyzing failed queries in Listener and UI

2023-08-13 Thread via GitHub
yaooqinn opened a new pull request, #42481: URL: https://github.com/apache/spark/pull/42481 ### What changes were proposed in this pull request? This PR wraps the catch-block with a new execution id to QueryExecution.assertAnalyzed. It will reuse `SQLExecution.with

[GitHub] [spark] cloud-fan commented on a diff in pull request #42467: [SPARK-44780][DOC] SQL temporary variables

2023-08-13 Thread via GitHub
cloud-fan commented on code in PR #42467: URL: https://github.com/apache/spark/pull/42467#discussion_r1293025786 ## docs/sql-ref-syntax-ddl-declare-variable.md: ## @@ -0,0 +1,82 @@ +--- +layout: global +title: DECLARE VARIABLE +displayTitle: DECLARE VARIABLE +license: | + Licen

[GitHub] [spark] LuciferYang opened a new pull request, #42480: Test Java 17 + Pyspark

2023-08-13 Thread via GitHub
LuciferYang opened a new pull request, #42480: URL: https://github.com/apache/spark/pull/42480 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] wangyum commented on pull request #42474: [SPARK-44792][BUILD] Upgrade curator to 5.2.0

2023-08-13 Thread via GitHub
wangyum commented on PR #42474: URL: https://github.com/apache/spark/pull/42474#issuecomment-1676714192 Thanks. Merged to master.

[GitHub] [spark] wangyum closed pull request #42474: [SPARK-44792][BUILD] Upgrade curator to 5.2.0

2023-08-13 Thread via GitHub
wangyum closed pull request #42474: [SPARK-44792][BUILD] Upgrade curator to 5.2.0 URL: https://github.com/apache/spark/pull/42474

[GitHub] [spark] LuciferYang commented on pull request #42385: [SPARK-44705][PYTHON] Make PythonRunner single-threaded

2023-08-13 Thread via GitHub
LuciferYang commented on PR #42385: URL: https://github.com/apache/spark/pull/42385#issuecomment-1676690789 This PR caused the failure of the Scala 2.13 mima check. https://github.com/apache/spark/pull/42479

[GitHub] [spark] grundprinzip commented on a diff in pull request #42478: [SPARK-44795][CONNECT] CodeGenerator Cache should be classloader specific

2023-08-13 Thread via GitHub
grundprinzip commented on code in PR #42478: URL: https://github.com/apache/spark/pull/42478#discussion_r1292973359 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala: ## @@ -26,28 +26,9 @@ import org.apache.spark.util.SparkClassUtils object Ou

[GitHub] [spark] LuciferYang opened a new pull request, #42479: [SPARK-44798][BUILD] Fix Scala 2.13 mima check after SPARK-44705 merged

2023-08-13 Thread via GitHub
LuciferYang opened a new pull request, #42479: URL: https://github.com/apache/spark/pull/42479 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang opened a new pull request, #42477: [SPARK-44796][BUILD][CONNECT] Remove `grpc-java` plugin related configuration from the `connect/connect-client-jvm` module

2023-08-13 Thread via GitHub
LuciferYang opened a new pull request, #42477: URL: https://github.com/apache/spark/pull/42477 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] panbingkun commented on pull request #42425: [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

2023-08-13 Thread via GitHub
panbingkun commented on PR #42425: URL: https://github.com/apache/spark/pull/42425#issuecomment-1676614757 Alternatively, the following is also a solution, but the server configuration needs to be modified. ref: https://developers.google.com/search/docs/crawling-indexing/consolidate-dupl

[GitHub] [spark] xiaoa6435 commented on a diff in pull request #42431: [WIP][SPARK-42905][MLLIB] fix spearman correlation incorrect and inconsistent results when data has huge amount of ties

2023-08-13 Thread via GitHub
xiaoa6435 commented on code in PR #42431: URL: https://github.com/apache/spark/pull/42431#discussion_r1292926346 ## mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/SpearmanCorrelation.scala: ## @@ -65,8 +65,8 @@ private[stat] object SpearmanCorrelation extends Corre

[GitHub] [spark] ukby1234 commented on a diff in pull request #42296: [SPARK-44635][CORE] Handle shuffle fetch failures in decommissions

2023-08-13 Thread via GitHub
ukby1234 commented on code in PR #42296: URL: https://github.com/apache/spark/pull/42296#discussion_r1292923823 ## core/src/main/scala/org/apache/spark/MapOutputTracker.scala: ## @@ -1288,6 +1288,30 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends MapOutp

[GitHub] [spark] hvanhovell commented on pull request #42476: [SPARK-44794][CONNECT] Make Streaming Queries work with REPL generated classes.

2023-08-13 Thread via GitHub
hvanhovell commented on PR #42476: URL: https://github.com/apache/spark/pull/42476#issuecomment-1676591099 @bogao007 PTAL

[GitHub] [spark] zhengruifeng commented on pull request #42451: [SPARK-44775][PYTHON][DOCS] Add missing version information in DataFrame APIs

2023-08-13 Thread via GitHub
zhengruifeng commented on PR #42451: URL: https://github.com/apache/spark/pull/42451#issuecomment-1676590123 thanks, merged to master and branch-3.5

[GitHub] [spark] zhengruifeng closed pull request #42451: [SPARK-44775][PYTHON][DOCS] Add missing version information in DataFrame APIs

2023-08-13 Thread via GitHub
zhengruifeng closed pull request #42451: [SPARK-44775][PYTHON][DOCS] Add missing version information in DataFrame APIs URL: https://github.com/apache/spark/pull/42451

[GitHub] [spark] hvanhovell opened a new pull request, #42476: [SPARK-44794][CONNECT] Make Streaming Queries work with REPL generated classes.

2023-08-13 Thread via GitHub
hvanhovell opened a new pull request, #42476: URL: https://github.com/apache/spark/pull/42476 ### What changes were proposed in this pull request? When you try to run a streaming query from the REPL for example: ```scala val add1 = udf((i: Long) => i + 1) val query = spark.readStr

[GitHub] [spark] advancedxy commented on pull request #42255: [SPARK-40178][SQL][COONECT] support coalesce hints with ease for PySpark and R

2023-08-13 Thread via GitHub
advancedxy commented on PR #42255: URL: https://github.com/apache/spark/pull/42255#issuecomment-1676584021 Gently ping @cloud-fan @HyukjinKwon

[GitHub] [spark] srowen commented on pull request #42469: [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations

2023-08-13 Thread via GitHub
srowen commented on PR #42469: URL: https://github.com/apache/spark/pull/42469#issuecomment-1676570724 It should be in both places. It's interesting. The typical CONTRIBUTING.md text that we have says, "When you contribute code, you affirm that the contribution is your original work",

[GitHub] [spark] yaooqinn commented on pull request #42462: [SPARK-44751][SQL] XML FileFormat Interface implementation

2023-08-13 Thread via GitHub
yaooqinn commented on PR #42462: URL: https://github.com/apache/spark/pull/42462#issuecomment-1676567114 Thanks for the explanation @HyukjinKwon. I'm OK with it if we already have precedents like avro and csv

[GitHub] [spark] panbingkun commented on pull request #42425: [SPARK-44729][PYTHON][DOCS] Add canonical links to the PySpark docs page

2023-08-13 Thread via GitHub
panbingkun commented on PR #42425: URL: https://github.com/apache/spark/pull/42425#issuecomment-1676556063 > Here is an example of a documentation page for a specific version: https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html > > This i

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292893433 ## resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala: ## @@ -666,6 +666,42 @@ class ClientSuite extends SparkFunSuite with Matchers

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292892702 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -485,7 +534,12 @@ private[spark] class Client( val localResources = Has

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292891976 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -458,6 +461,52 @@ private[spark] class Client( new Path(resolvedDestDir

[GitHub] [spark] zhengruifeng commented on pull request #42469: [SPARK-44782][INFRA] Adjust PR template to Generative Tooling Guidance recommendations

2023-08-13 Thread via GitHub
zhengruifeng commented on PR #42469: URL: https://github.com/apache/spark/pull/42469#issuecomment-1676521158 I guess this should be documented in https://spark.apache.org/developer-tools.html instead of PR template? cc @HyukjinKwon @gatorsmile @srowen

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292889034 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala: ## @@ -462,6 +462,31 @@ package object config extends Logging { .stringConf

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292888633 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala: ## @@ -462,6 +462,31 @@ package object config extends Logging { .stringConf

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42451: [SPARK-44775][PYTHON][DOCS] Add missing version information in DataFrame APIs

2023-08-13 Thread via GitHub
zhengruifeng commented on code in PR #42451: URL: https://github.com/apache/spark/pull/42451#discussion_r1292888463 ## python/pyspark/sql/dataframe.py: ## @@ -4066,6 +4078,9 @@ def dropDuplicatesWithinWatermark(self, subset: Optional[List[str]] = None) -> " .. versi

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42451: [SPARK-44775][PYTHON][DOCS] Add missing version information in DataFrame APIs

2023-08-13 Thread via GitHub
zhengruifeng commented on code in PR #42451: URL: https://github.com/apache/spark/pull/42451#discussion_r1292888387 ## python/pyspark/sql/dataframe.py: ## @@ -3540,6 +3546,9 @@ def melt( .. versionadded:: 3.4.0 +.. versionchanged:: 3.4.0 Review Comment:

[GitHub] [spark] hvanhovell closed pull request #42473: [SPARK-44791][CONNECT] Make ArrowDeserializer work with REPL generated classes

2023-08-13 Thread via GitHub
hvanhovell closed pull request #42473: [SPARK-44791][CONNECT] Make ArrowDeserializer work with REPL generated classes URL: https://github.com/apache/spark/pull/42473

[GitHub] [spark] hvanhovell commented on pull request #42473: [SPARK-44791][CONNECT] Make ArrowDeserializer work with REPL generated classes

2023-08-13 Thread via GitHub
hvanhovell commented on PR #42473: URL: https://github.com/apache/spark/pull/42473#issuecomment-1676515981 Merging to master/3.5

[GitHub] [spark] itholic commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-13 Thread via GitHub
itholic commented on PR #42388: URL: https://github.com/apache/spark/pull/42388#issuecomment-1676515460 I don't see any relevant log for the test failure from result as below: ``` Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, u

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-08-13 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1292885986 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -458,6 +461,52 @@ private[spark] class Client( new Path(resolvedDestDir

[GitHub] [spark] github-actions[bot] closed pull request #40954: [PYSPARK] [CONNECT] [ML] PySpark UDF supports python package dependencies

2023-08-13 Thread via GitHub
github-actions[bot] closed pull request #40954: [PYSPARK] [CONNECT] [ML] PySpark UDF supports python package dependencies URL: https://github.com/apache/spark/pull/40954

[GitHub] [spark] github-actions[bot] closed pull request #41033: Update bufbuild plugin references

2023-08-13 Thread via GitHub
github-actions[bot] closed pull request #41033: Update bufbuild plugin references URL: https://github.com/apache/spark/pull/41033

[GitHub] [spark] hvanhovell closed pull request #42418: [SPARK-44736][CONNECT] Add Dataset.explode to Spark Connect Scala Client

2023-08-13 Thread via GitHub
hvanhovell closed pull request #42418: [SPARK-44736][CONNECT] Add Dataset.explode to Spark Connect Scala Client URL: https://github.com/apache/spark/pull/42418

[GitHub] [spark] hvanhovell commented on pull request #42418: [SPARK-44736][CONNECT] Add Dataset.explode to Spark Connect Scala Client

2023-08-13 Thread via GitHub
hvanhovell commented on PR #42418: URL: https://github.com/apache/spark/pull/42418#issuecomment-1676430570 Merging.

[GitHub] [spark] bersprockets commented on pull request #42075: [SPARK-43966][SQL][PYTHON] Support non-deterministic table-valued functions

2023-08-13 Thread via GitHub
bersprockets commented on PR #42075: URL: https://github.com/apache/spark/pull/42075#issuecomment-1676417784 Super late review: I think `boundGenerator` needs to be initialized somewhere around [here](https://github.com/apache/spark/blob/7070b3672d8426834ff936fff4543b10093042fc/sql/co