[GitHub] [spark] zhengruifeng commented on pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-19 Thread via GitHub
zhengruifeng commented on PR #42088: URL: https://github.com/apache/spark/pull/42088#issuecomment-1643364098 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [spark] HyukjinKwon commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643363137 I am fine with this as a workaround for now but such implementation depending on tags is sort of flaky. The tags are easily lost when you, e.g., copy the expressions IIRC.

[GitHub] [spark] beliefer commented on pull request #42084: [SPARK-44292][SQL][FOLLOWUP] Make TYPE_CHECK_FAILURE_WITH_HINT use correct name

2023-07-19 Thread via GitHub
beliefer commented on PR #42084: URL: https://github.com/apache/spark/pull/42084#issuecomment-1643357973 ping @cloud-fan cc @MaxGekk

[GitHub] [spark] yihua commented on pull request #40728: [WIP][SPARK-39634][SQL] Allow file splitting in combination with row index generation

2023-07-19 Thread via GitHub
yihua commented on PR #40728: URL: https://github.com/apache/spark/pull/40728#issuecomment-1643350395 Hi @vkorukanti, this is an important performance improvement for using the row index from Parquet. Is the PR targeted for Spark 3.5?

[GitHub] [spark] HyukjinKwon closed pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-19 Thread via GitHub
HyukjinKwon closed pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job URL: https://github.com/apache/spark/pull/42088

[GitHub] [spark] HyukjinKwon commented on pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #42088: URL: https://github.com/apache/spark/pull/42088#issuecomment-1643340583 Merged to master.

[GitHub] [spark] itholic commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
itholic commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643329346 I got it. Just created single ticket here: SPARK-44492 for addressing undefined remaining tests so we don't miss it.

[GitHub] [spark] liangyu-1 commented on a diff in pull request #42058: [SPARK-42972][DSTREAM]ExecutorAllocationManager cannot allocate new instances when all executors down

2023-07-19 Thread via GitHub
liangyu-1 commented on code in PR #42058: URL: https://github.com/apache/spark/pull/42058#discussion_r1267703981 ## streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala: ## @@ -102,6 +102,11 @@ private[streaming] class ExecutorAllocationM

[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
zhengruifeng commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643320025 > > The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules. > > Great! Could you help creating tickets for remaining

[GitHub] [spark] surnaik commented on pull request #41856: [SPARK-44301][SQL] Add Benchmark Suite for TPCH

2023-07-19 Thread via GitHub
surnaik commented on PR #41856: URL: https://github.com/apache/spark/pull/41856#issuecomment-1643310115 Back from a break. I will use the official dbgen from TPCH website and update the PR. Thanks!

[GitHub] [spark] itholic closed pull request #42041: [DO-NOT-MERGE][PS][TESTS] Enable pandas API on Spark tests related to SPARK-43611

2023-07-19 Thread via GitHub
itholic closed pull request #42041: [DO-NOT-MERGE][PS][TESTS] Enable pandas API on Spark tests related to SPARK-43611 URL: https://github.com/apache/spark/pull/42041

[GitHub] [spark] itholic commented on pull request #42041: [DO-NOT-MERGE][PS][TESTS] Enable pandas API on Spark tests related to SPARK-43611

2023-07-19 Thread via GitHub
itholic commented on PR #42041: URL: https://github.com/apache/spark/pull/42041#issuecomment-1643309098 Closing since now we have a fix for tests: https://github.com/apache/spark/pull/42086

[GitHub] [spark] ulysses-you commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id

2023-07-19 Thread via GitHub
ulysses-you commented on PR #40524: URL: https://github.com/apache/spark/pull/40524#issuecomment-1643307262 After offline discussion, @pan3793 will try to take over this pr. Thank you all!

[GitHub] [spark] itholic commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
itholic commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643306997 > The good news is that 90% UTs can be resolved by this single one, and I think we only need to touch 3~4 more rules. Great! Could you help creating tickets for remaining 3~4 more r

[GitHub] [spark] yaooqinn commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id

2023-07-19 Thread via GitHub
yaooqinn commented on PR #40524: URL: https://github.com/apache/spark/pull/40524#issuecomment-1643292939 cc @ulysses-you who is the last one to touch this part

[GitHub] [spark] yaooqinn commented on pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query

2023-07-19 Thread via GitHub
yaooqinn commented on PR #41951: URL: https://github.com/apache/spark/pull/41951#issuecomment-1643284132 Thanks, merged to master and 3.5. PS, K8s IT failures are irrelevant.

[GitHub] [spark] yaooqinn closed pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query

2023-07-19 Thread via GitHub
yaooqinn closed pull request #41951: [SPARK-44367][SQL][UI] Show error message on UI for each failed query URL: https://github.com/apache/spark/pull/41951

[GitHub] [spark] LuciferYang commented on pull request #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-19 Thread via GitHub
LuciferYang commented on PR #42088: URL: https://github.com/apache/spark/pull/42088#issuecomment-1643272217 cc @HyukjinKwon @zhengruifeng

[GitHub] [spark] panbingkun opened a new pull request, #42088: [SPARK-44491][INFRA] Add `branch-3.5` to `publish_snapshot` GitHub Action job

2023-07-19 Thread via GitHub
panbingkun opened a new pull request, #42088: URL: https://github.com/apache/spark/pull/42088 ### What changes were proposed in this pull request? This PR aims to add `branch-3.5` to `publish_snapshot` GitHub Action job. ### Why are the changes needed? Since GitHub Action Cron jo

[GitHub] [spark] mathewjacob1002 opened a new pull request, #42087: [SPARK-44264] Added Example to Deepspeed Distributor

2023-07-19 Thread via GitHub
mathewjacob1002 opened a new pull request, #42087: URL: https://github.com/apache/spark/pull/42087 ### What changes were proposed in this pull request? Added examples to the docstring of using DeepspeedTorchDistributor ### Why are the changes needed? More concrete examples, a

[GitHub] [spark] zhengruifeng closed pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client

2023-07-19 Thread via GitHub
zhengruifeng closed pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client URL: https://github.com/apache/spark/pull/42040

[GitHub] [spark] zhengruifeng commented on pull request #42040: [WIP][SPARK-43611][SQL][PS][CONNCECT] Fix unexpected `AnalysisException` from Spark Connect client

2023-07-19 Thread via GitHub
zhengruifeng commented on PR #42040: URL: https://github.com/apache/spark/pull/42040#issuecomment-1643204561 close this one in favor of https://github.com/apache/spark/pull/42086

[GitHub] [spark] beliefer commented on pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function

2023-07-19 Thread via GitHub
beliefer commented on PR #41932: URL: https://github.com/apache/spark/pull/41932#issuecomment-1643189573 The CI failure is unrelated to this PR.

[GitHub] [spark] zhengruifeng commented on pull request #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
zhengruifeng commented on PR #42086: URL: https://github.com/apache/spark/pull/42086#issuecomment-1643187065 I have checked with @cloud-fan that we might have to modify the rules one by one. The good news is that 90% UTs can be resolved by this single one, and I think we only need to tou

[GitHub] [spark] zhengruifeng opened a new pull request, #42086: [SPARK-43611][SQL][PS][CONNCECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG `

2023-07-19 Thread via GitHub
zhengruifeng opened a new pull request, #42086: URL: https://github.com/apache/spark/pull/42086 ### What changes were proposed in this pull request? Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG ` ### Why are the changes needed? In https://github.com/apache/spark/pu

[GitHub] [spark] cxzl25 opened a new pull request, #42085: [SPARK-44490][WEBUI] Remove `TaskPagedTable` in StagePage

2023-07-19 Thread via GitHub
cxzl25 opened a new pull request, #42085: URL: https://github.com/apache/spark/pull/42085 ### What changes were proposed in this pull request? Remove `TaskPagedTable` ### Why are the changes needed? In [SPARK-21809](https://issues.apache.org/jira/browse/SPARK-21809), we introdu

[GitHub] [spark] HyukjinKwon closed pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-19 Thread via GitHub
HyukjinKwon closed pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python URL: https://github.com/apache/spark/pull/41948

[GitHub] [spark] HyukjinKwon commented on pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #41948: URL: https://github.com/apache/spark/pull/41948#issuecomment-1643063875 Merged to master.

[GitHub] [spark] beliefer opened a new pull request, #42084: [SPARK-44292][SQL][FOLLOWUP] Make TYPE_CHECK_FAILURE_WITH_HINT use correct name

2023-07-19 Thread via GitHub
beliefer opened a new pull request, #42084: URL: https://github.com/apache/spark/pull/42084 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/41850 uses `TYPE_CHECK_FAILURE_WITH_HINT`, it should be `DATATYPE_MISMATCH.TYPE_CHECK_FAILURE_WITH_HINT`.

[GitHub] [spark] cloud-fan closed pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules

2023-07-19 Thread via GitHub
cloud-fan closed pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules URL: https://github.com/apache/spark/pull/42007

[GitHub] [spark] cloud-fan commented on pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules

2023-07-19 Thread via GitHub
cloud-fan commented on PR #42007: URL: https://github.com/apache/spark/pull/42007#issuecomment-1643037992 thanks, merging to master/3.5!

[GitHub] [spark] richardc-db opened a new pull request, #42083: Support deserializing long types when creating `Metadata` object from JObject

2023-07-19 Thread via GitHub
richardc-db opened a new pull request, #42083: URL: https://github.com/apache/spark/pull/42083 ### What changes were proposed in this pull request? Adds support to deserialize long types when creating `Metadata` objects from `JObject`s. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42037: [SPARK-44305][SQL] Dynamically choose whether to broadcast hadoop conf

2023-07-19 Thread via GitHub
HyukjinKwon commented on code in PR #42037: URL: https://github.com/apache/spark/pull/42037#discussion_r1268879200 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala: ## @@ -152,15 +153,25 @@ class OrcFileFormat assert(supportBat

[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268872661 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] panbingkun commented on pull request #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL

2023-07-19 Thread via GitHub
panbingkun commented on PR #42082: URL: https://github.com/apache/spark/pull/42082#issuecomment-1643026104 cc @cloud-fan @MaxGekk

[GitHub] [spark] panbingkun commented on a diff in pull request #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL

2023-07-19 Thread via GitHub
panbingkun commented on code in PR #42082: URL: https://github.com/apache/spark/pull/42082#discussion_r1268872155 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -88,19 +88,11 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] panbingkun opened a new pull request, #42082: [SPARK-43839][SQL][FOLLOWUP] Convert _LEGACY_ERROR_TEMP_1337 to UNSUPPORTED_FEATURE.TIME_TRAVEL

2023-07-19 Thread via GitHub
panbingkun opened a new pull request, #42082: URL: https://github.com/apache/spark/pull/42082 ### What changes were proposed in this pull request? - The pr is following up https://github.com/apache/spark/pull/41349. - The pr aims to simplify code logic after merge `_LEGACY_ERROR_TEMP_13

[GitHub] [spark] Hisoka-X commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-19 Thread via GitHub
Hisoka-X commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1643018942 > Is this related to the GA failure of the master? No, just a bug I found while debugging.

[GitHub] [spark] LuciferYang commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-19 Thread via GitHub
LuciferYang commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1643016092 Is this related to the GA failure of the master?

[GitHub] [spark] beliefer commented on a diff in pull request #41850: [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

2023-07-19 Thread via GitHub
beliefer commented on code in PR #41850: URL: https://github.com/apache/spark/pull/41850#discussion_r1268862800 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -277,13 +277,13 @@ trait CheckAnalysis extends PredicateHelper with L

[GitHub] [spark] Hisoka-X commented on pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

2023-07-19 Thread via GitHub
Hisoka-X commented on PR #41347: URL: https://github.com/apache/spark/pull/41347#issuecomment-1643006858 Thanks @cloud-fan for your help and @allisonwang-db

[GitHub] [spark] cloud-fan closed pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

2023-07-19 Thread via GitHub
cloud-fan closed pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized URL: https://github.com/apache/spark/pull/41347

[GitHub] [spark] cloud-fan commented on pull request #41347: [SPARK-43838][SQL] Fix subquery on single table with having clause can't be optimized

2023-07-19 Thread via GitHub
cloud-fan commented on PR #41347: URL: https://github.com/apache/spark/pull/41347#issuecomment-1643004942 The k8s failure is unrelated, I'm merging it to master, thanks!

[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268852208 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] Hisoka-X commented on pull request #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-19 Thread via GitHub
Hisoka-X commented on PR #42081: URL: https://github.com/apache/spark/pull/42081#issuecomment-1642997959 cc @Yikun @HyukjinKwon

[GitHub] [spark] Hisoka-X opened a new pull request, #42081: [SPARK-44487][TEST] Fix KubernetesSuite report NPE when not set spark.kubernetes.test.unpackSparkDir

2023-07-19 Thread via GitHub
Hisoka-X opened a new pull request, #42081: URL: https://github.com/apache/spark/pull/42081 ### What changes were proposed in this pull request? Fix KubernetesSuite report NPE when not set `spark.kubernetes.test.unpackSparkDir` ```java Exception encountered when invoking

[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268851681 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] itholic commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-07-19 Thread via GitHub
itholic commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268843077 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] panbingkun commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
panbingkun commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268843065 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] zhenlineo commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt

2023-07-19 Thread via GitHub
zhenlineo commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268824443 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SparkSessionE2ESuite.scala: ## @@ -96,5 +103,151 @@ class SparkSessionE2ESuite extends RemoteSpark

[GitHub] [spark] ericm-db opened a new pull request, #42080: Statestoresuite threadpool

2023-07-19 Thread via GitHub
ericm-db opened a new pull request, #42080: URL: https://github.com/apache/spark/pull/42080 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How wa

[GitHub] [spark] allisonwang-db commented on pull request #42075: [SPARK-43966][SQL][PYTHON] Support non-deterministic table-valued functions

2023-07-19 Thread via GitHub
allisonwang-db commented on PR #42075: URL: https://github.com/apache/spark/pull/42075#issuecomment-1642927436 cc @HyukjinKwon @cloud-fan

[GitHub] [spark] github-actions[bot] commented on pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java

2023-07-19 Thread via GitHub
github-actions[bot] commented on PR #40608: URL: https://github.com/apache/spark/pull/40608#issuecomment-1642927305 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #40728: [WIP][SPARK-39634][SQL] Allow file splitting in combination with row index generation

2023-07-19 Thread via GitHub
github-actions[bot] commented on PR #40728: URL: https://github.com/apache/spark/pull/40728#issuecomment-1642927268 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] xinrong-meng opened a new pull request, #42079: [WIP][SPARK-44486][PYTHON][CONNECT] Implement PyArrow `self_destruct` feature for `toPandas`

2023-07-19 Thread via GitHub
xinrong-meng opened a new pull request, #42079: URL: https://github.com/apache/spark/pull/42079 ### What changes were proposed in this pull request? Implement Arrow `self_destruct` of `toPandas` for memory savings. Now the Spark configuration `spark.sql.execution.arrow.pyspark.self

[GitHub] [spark] HyukjinKwon closed pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

2023-07-19 Thread via GitHub
HyukjinKwon closed pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API URL: https://github.com/apache/spark/pull/42072

[GitHub] [spark] HyukjinKwon commented on pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #42072: URL: https://github.com/apache/spark/pull/42072#issuecomment-1642913899 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties

2023-07-19 Thread via GitHub
HyukjinKwon closed pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties URL: https://github.com/apache/spark/pull/41831

[GitHub] [spark] HyukjinKwon commented on pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #41831: URL: https://github.com/apache/spark/pull/41831#issuecomment-1642910764 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

2023-07-19 Thread via GitHub
HyukjinKwon commented on code in PR #42072: URL: https://github.com/apache/spark/pull/42072#discussion_r1268791343 ## python/pyspark/sql/__init__.py: ## @@ -72,4 +73,5 @@ "DataFrameWriter", "DataFrameWriterV2", "PandasCogroupedOps", +"is_remote", Review Comme

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt

2023-07-19 Thread via GitHub
juliuszsompolski commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268751980 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -40,6 +40,7 @@ private[sql] class SparkResult[T](

[GitHub] [spark] asl3 commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-07-19 Thread via GitHub
asl3 commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268722533 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor lic

[GitHub] [spark] asl3 commented on a diff in pull request #41711: [SPARK-44155] Adding a dev utility to improve error messages based on LLM

2023-07-19 Thread via GitHub
asl3 commented on code in PR #41711: URL: https://github.com/apache/spark/pull/41711#discussion_r1268719990 ## dev/error_message_refiner.py: ## @@ -0,0 +1,265 @@ +#!/usr/bin/env python3 + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor lic

[GitHub] [spark] grundprinzip commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt

2023-07-19 Thread via GitHub
grundprinzip commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268715542 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -40,6 +40,7 @@ private[sql] class SparkResult[T](

[GitHub] [spark] mathewjacob1002 opened a new pull request, #42078: [WIP][DO NOT REVIEW] Testing stuff

2023-07-19 Thread via GitHub
mathewjacob1002 opened a new pull request, #42078: URL: https://github.com/apache/spark/pull/42078 First commit

[GitHub] [spark] juliuszsompolski commented on pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt

2023-07-19 Thread via GitHub
juliuszsompolski commented on PR #42009: URL: https://github.com/apache/spark/pull/42009#issuecomment-1642777980 cc @zhenlineo @HyukjinKwon

[GitHub] [spark] WweiL opened a new pull request, #42077: [SPARK-44484][SS]Add batchDuration to StreamingQueryProgress json method

2023-07-19 Thread via GitHub
WweiL opened a new pull request, #42077: URL: https://github.com/apache/spark/pull/42077 ### What changes were proposed in this pull request? Add the missing field batchDuration to StreamingQueryProgress json method. Also modify tests accordingly ### Why are the changes

[GitHub] [spark] ueshin commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-19 Thread via GitHub
ueshin commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1268593999 ## python/pyspark/sql/udtf.py: ## @@ -153,6 +175,19 @@ def _validate_udtf_handler(cls: Any) -> None: error_class="INVALID_UDTF_NO_EVAL", message_parameters=

[GitHub] [spark] hvanhovell opened a new pull request, #42076: [SPARK-44449][CONNECT] Upcasting for direct Arrow Deserialization

2023-07-19 Thread via GitHub
hvanhovell opened a new pull request, #42076: URL: https://github.com/apache/spark/pull/42076 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dtenedor commented on a diff in pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-19 Thread via GitHub
dtenedor commented on code in PR #41948: URL: https://github.com/apache/spark/pull/41948#discussion_r1268552267 ## python/pyspark/sql/udtf.py: ## @@ -153,6 +175,19 @@ def _validate_udtf_handler(cls: Any) -> None: error_class="INVALID_UDTF_NO_EVAL", message_parameter

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42009: [SPARK-44422][CONNECT] Spark Connect fine grained interrupt

2023-07-19 Thread via GitHub
juliuszsompolski commented on code in PR #42009: URL: https://github.com/apache/spark/pull/42009#discussion_r1268542613 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -613,16 +613,30 @@ class SparkSession private[sql] ( /** *

[GitHub] [spark] ueshin commented on pull request #41948: [SPARK-44380][SQL][PYTHON] Support for Python UDTF to analyze in Python

2023-07-19 Thread via GitHub
ueshin commented on PR #41948: URL: https://github.com/apache/spark/pull/41948#issuecomment-1642586143 The failing test is [Spark on Kubernetes Integration test](https://github.com/ueshin/apache-spark/actions/runs/5596078914/jobs/10247102939#logs) that seems to be broken in `master` branch.

[GitHub] [spark] allisonwang-db opened a new pull request, #42075: [SPARK-43966][SQL][PYTHON] Support non-deterministic table-valued functions

2023-07-19 Thread via GitHub
allisonwang-db opened a new pull request, #42075: URL: https://github.com/apache/spark/pull/42075 ### What changes were proposed in this pull request? This PR supports non-deterministic table-valued functions. More specifically, it supports running non-deterministic Python UDT

[GitHub] [spark] siying opened a new pull request, #42074: [SPARK-44464][SS] Fix applyInPandasWithStatePythonRunner to output rows that have Null as first column value

2023-07-19 Thread via GitHub
siying opened a new pull request, #42074: URL: https://github.com/apache/spark/pull/42074 Change the serialization format for group-by-with-state outputs: include an explicit hidden column indicating how many data and state records there are. The current implementation of ApplyInPanda

[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

2023-07-19 Thread via GitHub
siying commented on PR #42046: URL: https://github.com/apache/spark/pull/42046#issuecomment-1642557386 > @siying There was a conflict. Could you please create a PR against branch-3.4? Thanks in advance! > > (Btw, I didn't indicate that title is not accurate. Could you please fix the

[GitHub] [spark] siying commented on pull request #42046: [SPARK-44464][SS] Implement applyInPandasWithState in PySpark

2023-07-19 Thread via GitHub
siying commented on PR #42046: URL: https://github.com/apache/spark/pull/42046#issuecomment-1642555659 Sure. Will do that.

[GitHub] [spark] ericm-db commented on pull request #42066: [SPARK-44480][SS] Use thread pool to perform maintenance activity for hdfs/rocksdb state store providers

2023-07-19 Thread via GitHub
ericm-db commented on PR #42066: URL: https://github.com/apache/spark/pull/42066#issuecomment-1642470993 > CI looks failing. Could you please look into it? https://github.com/ericm-db/spark/actions/runs/5595467996/jobs/10231376237 @HeartSaVioR looks like I was setting the runnable mai

[GitHub] [spark] amaliujia commented on a diff in pull request #41928: [SPARK-44475][SQL][CONNECT] Relocate DataType and Parser to sql/api

2023-07-19 Thread via GitHub
amaliujia commented on code in PR #41928: URL: https://github.com/apache/spark/pull/41928#discussion_r1268235385 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkAnalysisUtils.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [spark] amaliujia commented on a diff in pull request #41928: [SPARK-44475][SQL][CONNECT] Relocate DataType and Parser to sql/api

2023-07-19 Thread via GitHub
amaliujia commented on code in PR #41928: URL: https://github.com/apache/spark/pull/41928#discussion_r1268233819 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -16,34 +16,20 @@ */ package org.apache.spark.sql.catalyst.parser -im

[GitHub] [spark] cloud-fan commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
cloud-fan commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268198286 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog

[GitHub] [spark] cloud-fan commented on a diff in pull request #41349: [SPARK-43839][SQL] Convert `_LEGACY_ERROR_TEMP_1337` to `UNSUPPORTED_FEATURE.TIME_TRAVEL`

2023-07-19 Thread via GitHub
cloud-fan commented on code in PR #41349: URL: https://github.com/apache/spark/pull/41349#discussion_r1268192638 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -89,10 +89,12 @@ class V2SessionCatalog(catalog: SessionCatalog

[GitHub] [spark] cloud-fan commented on a diff in pull request #41850: [SPARK-44292][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2315-2319]

2023-07-19 Thread via GitHub
cloud-fan commented on code in PR #41850: URL: https://github.com/apache/spark/pull/41850#discussion_r1268138809 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -277,13 +277,13 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] hvanhovell closed pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization

2023-07-19 Thread via GitHub
hvanhovell closed pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization URL: https://github.com/apache/spark/pull/42011

[GitHub] [spark] hvanhovell commented on pull request #42011: [SPARK-44396][Connect] Direct Arrow Deserialization

2023-07-19 Thread via GitHub
hvanhovell commented on PR #42011: URL: https://github.com/apache/spark/pull/42011#issuecomment-1642080475 Merging this. Test failure is unrelated.

[GitHub] [spark] jchen5 commented on pull request #42007: [SPARK-44431][SQL] Fix behavior of null IN (empty list) in optimization rules

2023-07-19 Thread via GitHub
jchen5 commented on PR #42007: URL: https://github.com/apache/spark/pull/42007#issuecomment-1642057004 @cloud-fan PR is updated

[GitHub] [spark] heyihong commented on pull request #41831: [SPARK-44278][CONNECT] Implement a GRPC server interceptor that cleans up thread local properties

2023-07-19 Thread via GitHub
heyihong commented on PR #41831: URL: https://github.com/apache/spark/pull/41831#issuecomment-1642046358 > Should this interceptor be included by default (as the outermost interceptor)? I am not sure... But maybe we can do this in a separate pr if needed

[GitHub] [spark] harupy commented on a diff in pull request #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

2023-07-19 Thread via GitHub
harupy commented on code in PR #42072: URL: https://github.com/apache/spark/pull/42072#discussion_r1267980107 ## python/pyspark/sql/__init__.py: ## @@ -72,4 +73,5 @@ "DataFrameWriter", "DataFrameWriterV2", "PandasCogroupedOps", +"is_remote", Review Comment:

[GitHub] [spark] beliefer commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function

2023-07-19 Thread via GitHub
beliefer commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267967007 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -139,7 +139,18 @@ object SparkConnectServerUtil

[GitHub] [spark] LuciferYang commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function

2023-07-19 Thread via GitHub
LuciferYang commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267940632 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -139,7 +139,18 @@ object SparkConnectServerU

[GitHub] [spark] panbingkun commented on pull request #42073: [SPARK-44482][CONNECT] Connect server should can specify the bind address

2023-07-19 Thread via GitHub
panbingkun commented on PR #42073: URL: https://github.com/apache/spark/pull/42073#issuecomment-1641909732 The manual testing process is as follows: 1. My local env is as follows: 172.xxx.xxx.xxx 2. Set the following configuration in `spark-defaults.conf` - spark.connect.grpc.bindin

[GitHub] [spark] panbingkun opened a new pull request, #42073: [SPARK-44482][CONNECT] Connect server should can specify the bind address

2023-07-19 Thread via GitHub
panbingkun opened a new pull request, #42073: URL: https://github.com/apache/spark/pull/42073 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch t

[GitHub] [spark] beliefer commented on a diff in pull request #41932: [SPARK-44131][SQL][PYTHON][CONNECT][FOLLOWUP] Support qualified function name for call_function

2023-07-19 Thread via GitHub
beliefer commented on code in PR #41932: URL: https://github.com/apache/spark/pull/41932#discussion_r1267892365 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -1161,6 +1161,27 @@ class ClientE2ETestSuite extends RemoteSparkSes

[GitHub] [spark] HyukjinKwon opened a new pull request, #42072: [SPARK-44481][CONNECT][PYTHON] Make pyspark.sql.is_remote an API

2023-07-19 Thread via GitHub
HyukjinKwon opened a new pull request, #42072: URL: https://github.com/apache/spark/pull/42072 ### What changes were proposed in this pull request? ### Why are the changes needed? For the end users to be able to do if-else, e.g., for dispatching the code path to the legacy mode

[GitHub] [spark] yaooqinn closed pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT

2023-07-19 Thread via GitHub
yaooqinn closed pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT URL: https://github.com/apache/spark/pull/42054

[GitHub] [spark] yaooqinn commented on pull request #42054: [SPARK-44470][BUILD] Setting version to 4.0.0-SNAPSHOT

2023-07-19 Thread via GitHub
yaooqinn commented on PR #42054: URL: https://github.com/apache/spark/pull/42054#issuecomment-1641794593 CLOSE as duplicated and fixed by SPARK-44467

[GitHub] [spark] Deependra-Patel opened a new pull request, #42071: [SPARK-44209] Expose amount of shuffle data available on the node

2023-07-19 Thread via GitHub
Deependra-Patel opened a new pull request, #42071: URL: https://github.com/apache/spark/pull/42071 This will be available as external shuffle service metric ### What changes were proposed in this pull request? Adding three more metrics to ShuffleMetrics (exposed by External Shuffle

[GitHub] [spark] HyukjinKwon closed pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT

2023-07-19 Thread via GitHub
HyukjinKwon closed pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT URL: https://github.com/apache/spark/pull/42070

[GitHub] [spark] HyukjinKwon commented on pull request #42070: [MINOR][INFRA] Update the labeler for CORE and CONNECT

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #42070: URL: https://github.com/apache/spark/pull/42070#issuecomment-1641707945 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #42068: [SPARK-44361][SQL][FOLLOW-UP] Remove unused variables and fix import statements

2023-07-19 Thread via GitHub
HyukjinKwon commented on PR #42068: URL: https://github.com/apache/spark/pull/42068#issuecomment-1641706328 Merged to master and branch-3.5.
