[GitHub] [spark] MaxGekk closed pull request #40991: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

2023-08-02 Thread via GitHub
MaxGekk closed pull request #40991: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175` URL: https://github.com/apache/spark/pull/40991 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] MaxGekk commented on pull request #40991: [SPARK-42330][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`

2023-08-02 Thread via GitHub
MaxGekk commented on PR #40991: URL: https://github.com/apache/spark/pull/40991#issuecomment-1663334453 +1, LGTM. Merging to master/3.5. Thank you, @kori73.

[GitHub] [spark] asl3 opened a new pull request, #42314: [SPARK-44652] Raise error when only one df is None

2023-08-02 Thread via GitHub
asl3 opened a new pull request, #42314: URL: https://github.com/apache/spark/pull/42314 ### What changes were proposed in this pull request? Adds a "raise PySparkAssertionError" for the case when one of `actual` or `expected` is None, instead of just returning False. ### Why
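A minimal sketch of the behaviour this description suggests, not the code from the PR. `SketchAssertionError` and `check_none_inputs` are hypothetical names standing in for `PySparkAssertionError` and the real comparison helper, and returning True when both inputs are None is an assumption:

```python
class SketchAssertionError(AssertionError):
    """Hypothetical stand-in for PySparkAssertionError (not the real class)."""


def check_none_inputs(actual, expected):
    """Handle None inputs before any real comparison would run.

    Returns True when both inputs are None (assumed behaviour), raises when
    exactly one is None, and returns False when neither is None so that the
    caller can go on to compare the two values.
    """
    if actual is None and expected is None:
        return True
    if actual is None or expected is None:
        # Raise instead of silently returning False.
        missing = "actual" if actual is None else "expected"
        raise SketchAssertionError(
            f"`{missing}` is None but the other argument is not"
        )
    return False
```

Raising here, rather than returning False, means a caller that ignores the return value still sees the mismatch.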

[GitHub] [spark] anjakefala commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-08-02 Thread via GitHub
anjakefala commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1282666245 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [spark] cxzl25 opened a new pull request, #42313: [SPARK-44650][CORE] `spark.executor.defaultJavaOptions` Check illegal java options

2023-08-02 Thread via GitHub
cxzl25 opened a new pull request, #42313: URL: https://github.com/apache/spark/pull/42313 ### What changes were proposed in this pull request? ### Why are the changes needed? Command ```bash ./bin/spark-shell --conf spark.executor.extraJavaOptions='-Dspark.foo=bar'

[GitHub] [spark] mridulm commented on pull request #42296: [SPARK-44635][CORE] Handle shuffle fetch failures in decommissions

2023-08-02 Thread via GitHub
mridulm commented on PR #42296: URL: https://github.com/apache/spark/pull/42296#issuecomment-1663278588 +CC @otterc

[GitHub] [spark] ulysses-you commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
ulysses-you commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282632165 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] mridulm commented on pull request #42136: [SPARK-43100][CORE] Mismatch of field name in log event writer and parser for push shuffle metrics

2023-08-02 Thread via GitHub
mridulm commented on PR #42136: URL: https://github.com/apache/spark/pull/42136#issuecomment-1663274605 The test I am looking for would have failed before we fixed `JsonProtocol`, and worked after this PR.

[GitHub] [spark] cloud-fan commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
cloud-fan commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282626537 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] ulysses-you commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
ulysses-you commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282624369 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] cloud-fan commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
cloud-fan commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282621969 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] cloud-fan commented on pull request #35214: [SPARK-37915][SQL] Combine unions if there is a project between them

2023-08-02 Thread via GitHub
cloud-fan commented on PR #35214: URL: https://github.com/apache/spark/pull/35214#issuecomment-1663263552 `df.union` can break caching by design, but it was to optimize a special df pattern `df1.union(df2).union(df3).union...`. This PR does make it worse as `df.union` can break caching

[GitHub] [spark] sandip-db commented on a diff in pull request #41832: [SPARK-44265][SQL] Built-in XML data source support

2023-08-02 Thread via GitHub
sandip-db commented on code in PR #41832: URL: https://github.com/apache/spark/pull/41832#discussion_r1282619785 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] zhengruifeng commented on pull request #42292: [SPARK-44572][INFRA] Clean up unused installers ASAP

2023-08-02 Thread via GitHub
zhengruifeng commented on PR #42292: URL: https://github.com/apache/spark/pull/42292#issuecomment-1663260086 thanks, merged to master

[GitHub] [spark] zhengruifeng closed pull request #42292: [SPARK-44572][INFRA] Clean up unused installers ASAP

2023-08-02 Thread via GitHub
zhengruifeng closed pull request #42292: [SPARK-44572][INFRA] Clean up unused installers ASAP URL: https://github.com/apache/spark/pull/42292

[GitHub] [spark] ueshin commented on a diff in pull request #41606: [SPARK-44061][PYTHON] Add assertDataFrameEqual util function

2023-08-02 Thread via GitHub
ueshin commented on code in PR #41606: URL: https://github.com/apache/spark/pull/41606#discussion_r1282615314 ## python/pyspark/testing/utils.py: ## @@ -209,3 +219,200 @@ def check_error( self.assertEqual( expected, actual, f"Expected message parameters

[GitHub] [spark] gengliangwang commented on pull request #41832: [SPARK-44265][SQL] Built-in XML data source support

2023-08-02 Thread via GitHub
gengliangwang commented on PR #41832: URL: https://github.com/apache/spark/pull/41832#issuecomment-1663248713 @sandip-db Thanks for the work! All the new Spark connectors are under the folder https://github.com/apache/spark/tree/master/connector. I notice that there are over 100 files

[GitHub] [spark] liangyu-1 commented on pull request #42058: [SPARK-42972][DSTREAM]ExecutorAllocationManager cannot allocate new instances when all executors down

2023-08-02 Thread via GitHub
liangyu-1 commented on PR #42058: URL: https://github.com/apache/spark/pull/42058#issuecomment-1663245878 cc @mridulm @tgravescs @HeartSaVioR

[GitHub] [spark] gengliangwang commented on a diff in pull request #41832: [SPARK-44265][SQL] Built-in XML data source support

2023-08-02 Thread via GitHub
gengliangwang commented on code in PR #41832: URL: https://github.com/apache/spark/pull/41832#discussion_r1282605700 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlInputFormat.scala: ## @@ -0,0 +1,341 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon commented on pull request #41832: [SPARK-44265][SQL] Built-in XML data source support

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #41832: URL: https://github.com/apache/spark/pull/41832#issuecomment-1663243018 Seems the test failure is flakiness. Mind retriggering https://github.com/sandip-db/spark/actions/runs/5745270542/job/15573043078 please @sandip-db ?

[GitHub] [spark] HyukjinKwon commented on pull request #42294: [MINOR][BUG-FIX] Fix one unit mistake related to spark.eventLog.buffer.kb

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42294: URL: https://github.com/apache/spark/pull/42294#issuecomment-1663235316 cc @HeartSaVioR FYI

[GitHub] [spark] HyukjinKwon closed pull request #42305: [SPARK-44645][PYTHON][DOCS] Update assertDataFrameEqual docs error example output

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42305: [SPARK-44645][PYTHON][DOCS] Update assertDataFrameEqual docs error example output URL: https://github.com/apache/spark/pull/42305

[GitHub] [spark] HyukjinKwon commented on pull request #42305: [SPARK-44645][PYTHON][DOCS] Update assertDataFrameEqual docs error example output

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42305: URL: https://github.com/apache/spark/pull/42305#issuecomment-1663233727 Merged to master and branch-3.5.

[GitHub] [spark] HyukjinKwon closed pull request #42307: [SPARK-42730][CONNECT][DOCS] Update Spark Standalone Mode page

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42307: [SPARK-42730][CONNECT][DOCS] Update Spark Standalone Mode page URL: https://github.com/apache/spark/pull/42307

[GitHub] [spark] ulysses-you commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
ulysses-you commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282596741 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] HyukjinKwon commented on pull request #42307: [SPARK-42730][CONNECT][DOCS] Update Spark Standalone Mode page

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42307: URL: https://github.com/apache/spark/pull/42307#issuecomment-1663232011 Merged to master and branch-3.5.

[GitHub] [spark] HyukjinKwon commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1663228670 I am fine with merging it to 3.5.

[GitHub] [spark] HyukjinKwon closed pull request #42083: [SPARK-44488][SQL] Support deserializing long types when creating `Metadata` object from JObject

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42083: [SPARK-44488][SQL] Support deserializing long types when creating `Metadata` object from JObject URL: https://github.com/apache/spark/pull/42083

[GitHub] [spark] HyukjinKwon commented on pull request #42083: [SPARK-44488][SQL] Support deserializing long types when creating `Metadata` object from JObject

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42083: URL: https://github.com/apache/spark/pull/42083#issuecomment-1663228135 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #42177: [SPARK-44059][SQL] Add better error messages for SQL named argumnts

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42177: [SPARK-44059][SQL] Add better error messages for SQL named argumnts URL: https://github.com/apache/spark/pull/42177

[GitHub] [spark] HyukjinKwon commented on pull request #42177: [SPARK-44059][SQL] Add better error messages for SQL named argumnts

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42177: URL: https://github.com/apache/spark/pull/42177#issuecomment-1663227440 Merged to master and branch-3.5.

[GitHub] [spark] HyukjinKwon closed pull request #42303: [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ for the case the field is empty Row

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42303: [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ for the case the field is empty Row URL: https://github.com/apache/spark/pull/42303

[GitHub] [spark] HyukjinKwon commented on pull request #42303: [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ for the case the field is empty Row

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42303: URL: https://github.com/apache/spark/pull/42303#issuecomment-1663225792 Merged to master and branch-3.5.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42309: [SPARK-44644][PYTHON] Improve error messages for Python UDTFs with pickling errors

2023-08-02 Thread via GitHub
HyukjinKwon commented on code in PR #42309: URL: https://github.com/apache/spark/pull/42309#discussion_r1282591356 ## python/pyspark/cloudpickle/cloudpickle_fast.py: ## @@ -631,7 +631,7 @@ def dump(self, obj): try: return Pickler.dump(self, obj)

[GitHub] [spark] HyukjinKwon closed pull request #42261: [SPARK-44620][SQL][PS][CONNECT] Make `ResolvePivot` retain the `Plan_ID_TAG`

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42261: [SPARK-44620][SQL][PS][CONNECT] Make `ResolvePivot` retain the `Plan_ID_TAG` URL: https://github.com/apache/spark/pull/42261

[GitHub] [spark] HyukjinKwon commented on pull request #42261: [SPARK-44620][SQL][PS][CONNECT] Make `ResolvePivot` retain the `Plan_ID_TAG`

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42261: URL: https://github.com/apache/spark/pull/42261#issuecomment-1663222018 Merged to master and branch-3.5.

[GitHub] [spark] Ngone51 commented on a diff in pull request #42296: [SPARK-44635][CORE] Handle shuffle fetch failures in decommissions

2023-08-02 Thread via GitHub
Ngone51 commented on code in PR #42296: URL: https://github.com/apache/spark/pull/42296#discussion_r1282588775 ## core/src/main/scala/org/apache/spark/MapOutputTracker.scala: ## @@ -1288,6 +1288,30 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf) extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
cloud-fan commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282582380 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] advancedxy commented on a diff in pull request #42255: [SPARK-40178][SQL] Support string parameters in hint method

2023-08-02 Thread via GitHub
advancedxy commented on code in PR #42255: URL: https://github.com/apache/spark/pull/42255#discussion_r1282565854 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1407,7 +1407,15 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def

[GitHub] [spark] advancedxy commented on a diff in pull request #42255: [SPARK-40178][SQL] Support string parameters in hint method

2023-08-02 Thread via GitHub
advancedxy commented on code in PR #42255: URL: https://github.com/apache/spark/pull/42255#discussion_r1282565695 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -645,6 +645,35 @@ def test_generic_hints(self): df1.join(df2.hint("broadcast"),

[GitHub] [spark] wankunde commented on pull request #42206: [SPARK-44582][SQL] Skip iterator on SMJ if it was cleaned up

2023-08-02 Thread via GitHub
wankunde commented on PR #42206: URL: https://github.com/apache/spark/pull/42206#issuecomment-1663191874 > It looks fine to me, except maybe check the code for left semi joins. > > I could not make the crash happen with left semi joins. I think the bug might actually exist in that

[GitHub] [spark] cloud-fan commented on a diff in pull request #42286: [MINOR][SQL] Rename shouldBroadcast to isDynamicPruning in InSubqueryExec

2023-08-02 Thread via GitHub
cloud-fan commented on code in PR #42286: URL: https://github.com/apache/spark/pull/42286#discussion_r1282560911 ## sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala: ## @@ -117,7 +117,7 @@ case class InSubqueryExec( child: Expression, plan:

[GitHub] [spark] itholic opened a new pull request, #42312: [SPARK-43476][SPARK-43477][SPARK-43478][PS] Support `StringMethods` for pandas 2.0.0 and above

2023-08-02 Thread via GitHub
itholic opened a new pull request, #42312: URL: https://github.com/apache/spark/pull/42312 ### What changes were proposed in this pull request? This PR proposes to support `StringMethods` for pandas 2.0.0 and above. ### Why are the changes needed? Support the latest

[GitHub] [spark] ulysses-you commented on a diff in pull request #41088: [SPARK-43402][SQL] FileSourceScanExec supports push down data filter with scalar subquery

2023-08-02 Thread via GitHub
ulysses-you commented on code in PR #41088: URL: https://github.com/apache/spark/pull/41088#discussion_r1282550963 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -189,7 +189,13 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] ueshin commented on pull request #42310: [SPARK-44561][PYTHON] Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread via GitHub
ueshin commented on PR #42310: URL: https://github.com/apache/spark/pull/42310#issuecomment-1663153829 cc @allisonwang-db

[GitHub] [spark] sandip-db commented on pull request #41832: [SPARK-44265][SQL] Built-in XML data source support

2023-08-02 Thread via GitHub
sandip-db commented on PR #41832: URL: https://github.com/apache/spark/pull/41832#issuecomment-1663142408 > This would need an SPIP. [SPIP link](https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing)

[GitHub] [spark] ueshin commented on pull request #42310: [SPARK-44561][PYTHON] Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread via GitHub
ueshin commented on PR #42310: URL: https://github.com/apache/spark/pull/42310#issuecomment-1663140627 cc @xinrong-meng Could you take a look and see if we want to apply this to Arrow Python UDF too?

[GitHub] [spark] github-actions[bot] commented on pull request #38171: [SPARK-9213] [SQL] Improve regular expression performance (via joni)

2023-08-02 Thread via GitHub
github-actions[bot] commented on PR #38171: URL: https://github.com/apache/spark/pull/38171#issuecomment-1663125700 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #40912: [SPARK-43238][CORE] Support only decommission idle workers in standalone

2023-08-02 Thread via GitHub
github-actions[bot] closed pull request #40912: [SPARK-43238][CORE] Support only decommission idle workers in standalone URL: https://github.com/apache/spark/pull/40912

[GitHub] [spark] github-actions[bot] commented on pull request #40930: [DO NOT MERGE] File constant metadata extractors split

2023-08-02 Thread via GitHub
github-actions[bot] commented on PR #40930: URL: https://github.com/apache/spark/pull/40930#issuecomment-1663125667 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark-connect-go] zhengruifeng commented on pull request #13: [SPARK-44368] Support Repartition and RepartitionByRange in Spark Connect Go Client

2023-08-02 Thread via GitHub
zhengruifeng commented on PR #13: URL: https://github.com/apache/spark-connect-go/pull/13#issuecomment-1663121478 merged to master

[GitHub] [spark-connect-go] zhengruifeng closed pull request #13: [SPARK-44368] Support Repartition and RepartitionByRange in Spark Connect Go Client

2023-08-02 Thread via GitHub
zhengruifeng closed pull request #13: [SPARK-44368] Support Repartition and RepartitionByRange in Spark Connect Go Client URL: https://github.com/apache/spark-connect-go/pull/13

[GitHub] [spark] HyukjinKwon closed pull request #42311: [SPARK-44424][PYTHON][CONNECT][FOLLOW-UP] Import Connect related libraries after checking dependencies

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42311: [SPARK-44424][PYTHON][CONNECT][FOLLOW-UP] Import Connect related libraries after checking dependencies URL: https://github.com/apache/spark/pull/42311

[GitHub] [spark] agubichev commented on a diff in pull request #41301: [SPARK-43780][SQL] Support correlated references in join predicates

2023-08-02 Thread via GitHub
agubichev commented on code in PR #41301: URL: https://github.com/apache/spark/pull/41301#discussion_r1282513447 ## sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out: ## @@ -572,6 +572,36 @@ struct 0 1 +-- !query +SELECT * FROM t1 JOIN lateral (SELECT

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42304: [SPARK-44642] ReleaseExecute in ExecutePlanResponseReattachableIterator after it gets error from server

2023-08-02 Thread via GitHub
HyukjinKwon commented on code in PR #42304: URL: https://github.com/apache/spark/pull/42304#discussion_r1282510781 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala: ## @@ -102,28 +102,33 @@ class

[GitHub] [spark] agubichev commented on a diff in pull request #41301: [SPARK-43780][SQL] Support correlated references in join predicates

2023-08-02 Thread via GitHub
agubichev commented on code in PR #41301: URL: https://github.com/apache/spark/pull/41301#discussion_r1282510234 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala: ## @@ -826,8 +882,13 @@ object DecorrelateInnerQuery extends

[GitHub] [spark] agubichev commented on a diff in pull request #41301: [SPARK-43780][SQL] Support correlated references in join predicates

2023-08-02 Thread via GitHub
agubichev commented on code in PR #41301: URL: https://github.com/apache/spark/pull/41301#discussion_r1282510138 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala: ## @@ -804,18 +804,67 @@ object DecorrelateInnerQuery extends

[GitHub] [spark] ueshin opened a new pull request, #42310: [SPARK-44561][PYTHON] Fix AssertionError when converting UDTF output to a complex type

2023-08-02 Thread via GitHub
ueshin opened a new pull request, #42310: URL: https://github.com/apache/spark/pull/42310 ### What changes were proposed in this pull request? Fixes AssertionError when converting UDTF output to a complex type by ignore assertions in `_create_converter_from_pandas` to make Arrow

[GitHub] [spark] HyukjinKwon opened a new pull request, #42311: [SPARK-44424][PYTHON][CONNECT][FOLLOW-UP] Import Connect related libraries after checking dependencies

2023-08-02 Thread via GitHub
HyukjinKwon opened a new pull request, #42311: URL: https://github.com/apache/spark/pull/42311 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/42235 that fixes the Connect related import after the dependency checking.

[GitHub] [spark] allisonwang-db commented on pull request #42309: [SPARK-44644][PYTHON] Improve error messages for creating Python UDTFs with pickling errors

2023-08-02 Thread via GitHub
allisonwang-db commented on PR #42309: URL: https://github.com/apache/spark/pull/42309#issuecomment-1663089746 cc @ueshin @HyukjinKwon

[GitHub] [spark] allisonwang-db opened a new pull request, #42309: [SPARK-44644][PYTHON] Improve error messages for creating Python UDTFs with pickling errors

2023-08-02 Thread via GitHub
allisonwang-db opened a new pull request, #42309: URL: https://github.com/apache/spark/pull/42309 ### What changes were proposed in this pull request? This PR improves the error messages when a Python UDTF failed to pickle. ### Why are the changes needed? To make

[GitHub] [spark] mathewjacob1002 opened a new pull request, #42308: [Spark Ticket][WIP]Added a warning to pop up in the case the user doesn't use gpus

2023-08-02 Thread via GitHub
mathewjacob1002 opened a new pull request, #42308: URL: https://github.com/apache/spark/pull/42308 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] pegasas opened a new pull request, #42307: [SPARK-42730][CONNECT][DOCS] Update Spark Standalone Mode - Starting …

2023-08-02 Thread via GitHub
pegasas opened a new pull request, #42307: URL: https://github.com/apache/spark/pull/42307 …a Cluster Manually ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing

[GitHub] [spark] HyukjinKwon closed pull request #42235: [SPARK-44424][CONNECT][PYTHON] Python client for reattaching to existing execute in Spark Connect

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42235: [SPARK-44424][CONNECT][PYTHON] Python client for reattaching to existing execute in Spark Connect URL: https://github.com/apache/spark/pull/42235

[GitHub] [spark] HyukjinKwon commented on pull request #42235: [SPARK-44424][CONNECT][PYTHON] Python client for reattaching to existing execute in Spark Connect

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42235: URL: https://github.com/apache/spark/pull/42235#issuecomment-1663083079 Merged to master and branch-3.5.

[GitHub] [spark] dtenedor commented on pull request #42177: [SPARK-44059][SQL] Add better error messages for SQL named argumnts

2023-08-02 Thread via GitHub
dtenedor commented on PR #42177: URL: https://github.com/apache/spark/pull/42177#issuecomment-1663068692 Hi @MaxGekk @HyukjinKwon the tests for this PR are passing, if everything looks good to you, would one of you mind to merge this PR for us? Thanks :)

[GitHub] [spark] szehon-ho opened a new pull request, #42306: [SQL][SPARK-44647] Support SPJ where join keys are less than cluster keys

2023-08-02 Thread via GitHub
szehon-ho opened a new pull request, #42306: URL: https://github.com/apache/spark/pull/42306 ### What changes were proposed in this pull request? - Add new conf spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled - Change key compatibility checks in

[GitHub] [spark] juliuszsompolski closed pull request #42274: [SPARK-44624][CONNECT] Retry ExecutePlan in case initial request didn't reach server overkill

2023-08-02 Thread via GitHub
juliuszsompolski closed pull request #42274: [SPARK-44624][CONNECT] Retry ExecutePlan in case initial request didn't reach server overkill URL: https://github.com/apache/spark/pull/42274

[GitHub] [spark] asl3 opened a new pull request, #42305: [SPARK-44645] Update assertDataFrameEqual docs error example output

2023-08-02 Thread via GitHub
asl3 opened a new pull request, #42305: URL: https://github.com/apache/spark/pull/42305 ### What changes were proposed in this pull request? This PR updates the error example output for the `assertDataFrameEqual` docs, given the new error message formatting. ### Why are the

[GitHub] [spark] HyukjinKwon closed pull request #42280: [SPARK-44626][SS][CONNECT] Followup on streaming query termination when client session is timed out for Spark Connect

2023-08-02 Thread via GitHub
HyukjinKwon closed pull request #42280: [SPARK-44626][SS][CONNECT] Followup on streaming query termination when client session is timed out for Spark Connect URL: https://github.com/apache/spark/pull/42280

[GitHub] [spark] HyukjinKwon commented on pull request #42280: [SPARK-44626][SS][CONNECT] Followup on streaming query termination when client session is timed out for Spark Connect

2023-08-02 Thread via GitHub
HyukjinKwon commented on PR #42280: URL: https://github.com/apache/spark/pull/42280#issuecomment-1663025910 Merged to master and branch-3.5.

[GitHub] [spark] juliuszsompolski opened a new pull request, #42304: [SPARK-44642] ReleaseExecute in ExecutePlanResponseReattachableIterator after it gets error from server

2023-08-02 Thread via GitHub
juliuszsompolski opened a new pull request, #42304: URL: https://github.com/apache/spark/pull/42304 ### What changes were proposed in this pull request? When the server returns an error on the response stream via onError, the ExecutePlanResponseReattachableIterator will not see the stream

[GitHub] [spark] hvanhovell closed pull request #42298: [SPARK-44636][CONNECT] Leave no dangling iterators

2023-08-02 Thread via GitHub
hvanhovell closed pull request #42298: [SPARK-44636][CONNECT] Leave no dangling iterators URL: https://github.com/apache/spark/pull/42298

[GitHub] [spark] hvanhovell commented on pull request #42298: [SPARK-44636][CONNECT] Leave no dangling iterators

2023-08-02 Thread via GitHub
hvanhovell commented on PR #42298: URL: https://github.com/apache/spark/pull/42298#issuecomment-1663000546 Merging.

[GitHub] [spark] dtenedor commented on pull request #42174: [SPARK-44503][SQL] Add analysis and planning for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls

2023-08-02 Thread via GitHub
dtenedor commented on PR #42174: URL: https://github.com/apache/spark/pull/42174#issuecomment-1662992296 Discussed offline; we decided not to backport any more PRs from this feature to Spark 3.5.

[GitHub] [spark] ueshin closed pull request #42290: [SPARK-44559][PYTHON][3.5] Improve error messages for Python UDTF arrow cast

2023-08-02 Thread via GitHub
ueshin closed pull request #42290: [SPARK-44559][PYTHON][3.5] Improve error messages for Python UDTF arrow cast URL: https://github.com/apache/spark/pull/42290

[GitHub] [spark] ueshin commented on pull request #42290: [SPARK-44559][PYTHON][3.5] Improve error messages for Python UDTF arrow cast

2023-08-02 Thread via GitHub
ueshin commented on PR #42290: URL: https://github.com/apache/spark/pull/42290#issuecomment-1662985587 Thanks! Merging to 3.5.

[GitHub] [spark] dtenedor commented on a diff in pull request #42272: [SPARK-44508][PYTHON][DOCS] Add user guide for Python user-defined table functions

2023-08-02 Thread via GitHub
dtenedor commented on code in PR #42272: URL: https://github.com/apache/spark/pull/42272#discussion_r1282403569 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -0,0 +1,140 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor

[GitHub] [spark] allisonwang-db commented on pull request #39952: [SPARK-40770][PYTHON][FOLLOW-UP] Improved error messages for mapInPandas for schema mismatch

2023-08-02 Thread via GitHub
allisonwang-db commented on PR #39952: URL: https://github.com/apache/spark/pull/39952#issuecomment-1662979295 @xinrong-meng @EnricoMi should we also merge this in spark-3.5?

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42302: [SPARK-44640][PYTHON] Improve error messages for Python UDTF returning non Iterable

2023-08-02 Thread via GitHub
allisonwang-db commented on code in PR #42302: URL: https://github.com/apache/spark/pull/42302#discussion_r1282426270 ## python/pyspark/worker.py: ## @@ -656,32 +651,46 @@ def mapper(_, it): def wrap_udtf(f, return_type): assert

[GitHub] [spark] allisonwang-db commented on pull request #42302: [SPARK-44640][PYTHON] Improve error messages for Python UDTF returning non Iterable

2023-08-02 Thread via GitHub
allisonwang-db commented on PR #42302: URL: https://github.com/apache/spark/pull/42302#issuecomment-1662973685 cc @ueshin @HyukjinKwon

[GitHub] [spark] amaliujia commented on a diff in pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-02 Thread via GitHub
amaliujia commented on code in PR #42266: URL: https://github.com/apache/spark/pull/42266#discussion_r1282412197 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -47,11 +52,37 @@ private[client] object

[GitHub] [spark] learningchess2003 commented on pull request #42177: [SPARK-44059] Add better error messages for SQL named argumnts

2023-08-02 Thread via GitHub
learningchess2003 commented on PR #42177: URL: https://github.com/apache/spark/pull/42177#issuecomment-1662955100 @allisonwang-db PR reviewed!

[GitHub] [spark] heyihong commented on a diff in pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-02 Thread via GitHub
heyihong commented on code in PR #42266: URL: https://github.com/apache/spark/pull/42266#discussion_r1282395373 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -47,11 +52,37 @@ private[client] object

[GitHub] [spark] learningchess2003 commented on pull request #42177: [SPARK-44059] Add better error messages for SQL named argumnts

2023-08-02 Thread via GitHub
learningchess2003 commented on PR #42177: URL: https://github.com/apache/spark/pull/42177#issuecomment-1662935288 @MaxGekk All tests are green! Can you merge it when you have time?

[GitHub] [spark] ion-elgreco commented on pull request #38624: [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup

2023-08-02 Thread via GitHub
ion-elgreco commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1662931308 Looking forward to seeing this PR get merged :)

[GitHub] [spark] ueshin commented on pull request #42303: [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ for the case the field is empty Row

2023-08-02 Thread via GitHub
ueshin commented on PR #42303: URL: https://github.com/apache/spark/pull/42303#issuecomment-1662918237 cc @HyukjinKwon @zhengruifeng

[GitHub] [spark] amaliujia commented on a diff in pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-02 Thread via GitHub
amaliujia commented on code in PR #42266: URL: https://github.com/apache/spark/pull/42266#discussion_r1282380443 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -47,11 +52,37 @@ private[client] object

[GitHub] [spark] amaliujia commented on a diff in pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-02 Thread via GitHub
amaliujia commented on code in PR #42266: URL: https://github.com/apache/spark/pull/42266#discussion_r1282378744 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -47,11 +52,37 @@ private[client] object

[GitHub] [spark] amaliujia commented on a diff in pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-02 Thread via GitHub
amaliujia commented on code in PR #42266: URL: https://github.com/apache/spark/pull/42266#discussion_r1282376757 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -47,11 +52,37 @@ private[client] object

[GitHub] [spark] ueshin opened a new pull request, #42303: [SPARK-44643][SQL][PYTHON] Fix Row.__repr__ for the case the field is empty Row

2023-08-02 Thread via GitHub
ueshin opened a new pull request, #42303: URL: https://github.com/apache/spark/pull/42303 ### What changes were proposed in this pull request? Fix `Row.__repr__` for the case the field is empty `Row`. ```py >>> repr(Row(Row())) '<Row(<Row()>)>' ``` ### Why are the changes
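To illustrate the kind of failure a fix like this addresses, here is a minimal sketch in plain Python. It does not use PySpark: `MiniRow` and `buggy_repr` are hypothetical stand-ins, and the assumption that the problem comes from `%`-formatting a tuple-valued field is an illustration, not something the excerpt above confirms.

```python
class MiniRow(tuple):
    """Hypothetical stand-in for an unnamed Row: a plain tuple subclass."""

    def __new__(cls, *fields):
        return super().__new__(cls, fields)

    def buggy_repr(self):
        # "%r" % field treats a tuple-valued field as the whole argument
        # list, so an empty MiniRow supplies zero arguments and Python
        # raises TypeError ("not enough arguments for format string").
        return "<Row(%s)>" % ", ".join("%r" % field for field in self)

    def __repr__(self):
        # Calling repr() on each field avoids the tuple-unpacking pitfall.
        return "<Row(%s)>" % ", ".join(repr(field) for field in self)


nested = MiniRow(MiniRow())
print(repr(nested))  # <Row(<Row()>)>

try:
    nested.buggy_repr()
except TypeError as exc:
    print("buggy_repr failed:", exc)
```

The design point is that `repr(field)` never interprets the field as a format-argument tuple, so empty and nested rows print consistently.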

[GitHub] [spark] zhouyejoe commented on pull request #42136: [SPARK-43100][CORE] Mismatch of field name in log event writer and parser for push shuffle metrics

2023-08-02 Thread via GitHub
zhouyejoe commented on PR #42136: URL: https://github.com/apache/spark/pull/42136#issuecomment-1662863188 There are existing tests for querying REST calls and verifying the JSON response in HistoryServerSuite: "stage list with peak metrics" ->

[GitHub] [spark] hvanhovell closed pull request #42300: [SPARK-44421][FOLLOWUP] Minor rename of ResponseComplete to ResultComplete

2023-08-02 Thread via GitHub
hvanhovell closed pull request #42300: [SPARK-44421][FOLLOWUP] Minor rename of ResponseComplete to ResultComplete URL: https://github.com/apache/spark/pull/42300

[GitHub] [spark] hvanhovell commented on pull request #42300: [SPARK-44421][FOLLOWUP] Minor rename of ResponseComplete to ResultComplete

2023-08-02 Thread via GitHub
hvanhovell commented on PR #42300: URL: https://github.com/apache/spark/pull/42300#issuecomment-1662841978 Merging

[GitHub] [spark] juliuszsompolski commented on pull request #42300: [SPARK-44421][FOLLOWUP] Minor rename of ResponseComplete to ResultComplete

2023-08-02 Thread via GitHub
juliuszsompolski commented on PR #42300: URL: https://github.com/apache/spark/pull/42300#issuecomment-1662833240 https://github.com/juliuszsompolski/apache-spark/runs/15565715779 finished successfully, but it seems not to have posted back to the GitHub status.

[GitHub] [spark] hvanhovell commented on pull request #42299: [SPARK-44637][CONNECT] Synchronize accesses to ExecuteResponseObserver

2023-08-02 Thread via GitHub
hvanhovell commented on PR #42299: URL: https://github.com/apache/spark/pull/42299#issuecomment-1662819633 Merged.
