Re: [PR] [SPARK-41952][SQL] Fix Parquet zstd off-heap memory leak as a workaround for PARQUET-2160 [spark]

2023-10-30 Thread via GitHub
pan3793 commented on PR #40091: URL: https://github.com/apache/spark/pull/40091#issuecomment-1786555127 @yujhe Oops, I totally forgot this one... Your analysis makes sense, I took a look at the parquet non-vectorized reading code path, injecting such a workaround does not clean as we

Re: [PR] [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite [spark]

2023-10-30 Thread via GitHub
MaxGekk closed pull request #43565: [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite URL: https://github.com/apache/spark/pull/43565 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-41952][SQL] Fix Parquet zstd off-heap memory leak as a workaround for PARQUET-2160 [spark]

2023-10-30 Thread via GitHub
yujhe commented on PR #40091: URL: https://github.com/apache/spark/pull/40091#issuecomment-1786538476 We found that this happens when reading a Parquet file with nested columns in its schema. ```scala val path = "/tmp/parquet_zstd" (1 to 100).map(i => (i, Seq(i))) .toDF("id"

Re: [PR] [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite [spark]

2023-10-30 Thread via GitHub
MaxGekk commented on PR #43565: URL: https://github.com/apache/spark/pull/43565#issuecomment-1786538045 +1, LGTM. Merging to master. Thank you, @WweiL and @HeartSaVioR for review.

Re: [PR] [SPARK-45654][PYTHON] Add Python data source write API [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43516: URL: https://github.com/apache/spark/pull/43516#issuecomment-1786536936 @allisonwang-db mind resolving conflicts please?

Re: [PR] [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark Connect [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43595: URL: https://github.com/apache/spark/pull/43595#issuecomment-1786513729 Test results: https://github.com/HyukjinKwon/spark/actions/runs/6701783863/job/18209761204

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-10-30 Thread via GitHub
panbingkun commented on code in PR #37588: URL: https://github.com/apache/spark/pull/37588#discussion_r1377094224 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2483,7 +2477,7 @@ private[sql] object QueryCompilationErrors extends

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-10-30 Thread via GitHub
beliefer commented on code in PR #37588: URL: https://github.com/apache/spark/pull/37588#discussion_r1377086123 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2483,7 +2477,7 @@ private[sql] object QueryCompilationErrors extends Q

Re: [PR] [SPARK-45352][SQL] Eliminate foldable window partitions [spark]

2023-10-30 Thread via GitHub
beliefer commented on code in PR #43144: URL: https://github.com/apache/spark/pull/43144#discussion_r1377081807 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1241,6 +1242,24 @@ object OptimizeRepartition extends Rule[LogicalPlan]

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43566: URL: https://github.com/apache/spark/pull/43566#issuecomment-1786475155 Hey, I don't think we should go ahead. It should have lookup logic first, and then we can think about extra runtime registration

Re: [PR] [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2023-10-30 Thread via GitHub
aokolnychyi commented on PR #36564: URL: https://github.com/apache/spark/pull/36564#issuecomment-1786467568 Thanks for confirming, @cloud-fan!

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
cloud-fan closed pull request #43566: [SPARK-45713][PYTHON] Support registering Python data sources URL: https://github.com/apache/spark/pull/43566

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1377059613 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache S

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
cloud-fan commented on code in PR #43566: URL: https://github.com/apache/spark/pull/43566#discussion_r1377059437 ## sql/core/src/main/scala/org/apache/spark/sql/DataSourceRegistration.scala: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
cloud-fan commented on PR #43566: URL: https://github.com/apache/spark/pull/43566#issuecomment-1786454246 thanks, merging to master!

Re: [PR] [SPARK-45368][SQL] Remove scala2.12 compatibility logic for DoubleType, FloatType, Decimal [spark]

2023-10-30 Thread via GitHub
laglangyue commented on PR #43456: URL: https://github.com/apache/spark/pull/43456#issuecomment-1786451271 @srowen @amaliujia @HyukjinKwon PTAL

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
cloud-fan commented on code in PR #43566: URL: https://github.com/apache/spark/pull/43566#discussion_r1377057267 ## sql/core/src/main/scala/org/apache/spark/sql/DataSourceRegistration.scala: ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1377056253 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSource.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software F

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1377043635 ## sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala: ## @@ -31,7 +31,7 @@ import org.apache.spark.sql.internal.SQLConf * @since 2.0.0 */ @Stabl

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on PR #43566: URL: https://github.com/apache/spark/pull/43566#issuecomment-1786437928 @HyukjinKwon @cloud-fan Thanks for the review! I've addressed the comments. Note this PR does not include the data source lookup logic. I will work on it in the next PR ([SPARK-45

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on code in PR #43566: URL: https://github.com/apache/spark/pull/43566#discussion_r1377045722 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceRegistry.scala: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] [SPARK-45730][CORE] Make ReloadingX509TrustManagerSuite less flaky [spark]

2023-10-30 Thread via GitHub
hasnain-db commented on PR #43596: URL: https://github.com/apache/spark/pull/43596#issuecomment-1786430142 cc @mridulm this flakiness is mostly in our internal environment (haven't seen it too often in the sbt build) but I figured it makes sense to put here in case things change in the futu

[PR] [SPARK-45730][CORE] Make ReloadingX509TrustManagerSuite less flaky [spark]

2023-10-30 Thread via GitHub
hasnain-db opened a new pull request, #43596: URL: https://github.com/apache/spark/pull/43596 ### What changes were proposed in this pull request? Improve a few timing related constraints: * Wait 10s instead of 5 for a reload to happen when under high load. This should not dela

Re: [PR] [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on PR #43565: URL: https://github.com/apache/spark/pull/43565#issuecomment-1786412133 @MaxGekk Please either merge this or let me know once you're OK with the change. Thanks!

Re: [PR] [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on PR #43565: URL: https://github.com/apache/spark/pull/43565#issuecomment-1786393654 Looks like the test failure in CI is irrelevant.

Re: [PR] [SPARK-45735][PYTHON][CONNECT][TESTS] Reenable CatalogTests without Spark Connect [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43595: URL: https://github.com/apache/spark/pull/43595#issuecomment-1786393151 Has to be merged down to branch-3.4.

Re: [PR] [SPARK-45242][SQL][FOLLOWUP] Do not canonicalize DataFrame ID in CollectMetrics [spark]

2023-10-30 Thread via GitHub
amaliujia closed pull request #43594: [SPARK-45242][SQL][FOLLOWUP] Do not canonicalize DataFrame ID in CollectMetrics URL: https://github.com/apache/spark/pull/43594

Re: [PR] [SPARK-45242][SQL][FOLLOWUP] Do not canonicalize DataFrame ID in CollectMetrics [spark]

2023-10-30 Thread via GitHub
amaliujia commented on PR #43594: URL: https://github.com/apache/spark/pull/43594#issuecomment-1786387313 @cloud-fan

[PR] [SPARK-45242][SQL][FOLLOWUP] Do not canonicalize DataFrame ID in CollectMetrics [spark]

2023-10-30 Thread via GitHub
amaliujia opened a new pull request, #43594: URL: https://github.com/apache/spark/pull/43594 ### What changes were proposed in this pull request? We should also not canonicalize the new DataFrame id field to avoid downstream plan comparison failures. ### Why are the cha

Re: [PR] [SPARK-45352][SQL] Eliminate foldable window partitions [spark]

2023-10-30 Thread via GitHub
zml1206 commented on code in PR #43144: URL: https://github.com/apache/spark/pull/43144#discussion_r1377013427 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1241,6 +1242,24 @@ object OptimizeRepartition extends Rule[LogicalPlan] {

Re: [PR] [SPARK-45729][PYTHON][DOCS] Fix PySpark testing guide links [spark]

2023-10-30 Thread via GitHub
HyukjinKwon closed pull request #43587: [SPARK-45729][PYTHON][DOCS] Fix PySpark testing guide links URL: https://github.com/apache/spark/pull/43587

Re: [PR] [SPARK-45729][PYTHON][DOCS] Fix PySpark testing guide links [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43587: URL: https://github.com/apache/spark/pull/43587#issuecomment-1786386386 Merged to master.

Re: [PR] [SPARK-45727][SS] Remove unused map in watermark propagation simulation [spark]

2023-10-30 Thread via GitHub
HeartSaVioR closed pull request #43588: [SPARK-45727][SS] Remove unused map in watermark propagation simulation URL: https://github.com/apache/spark/pull/43588

Re: [PR] [SPARK-45727][SS] Remove unused map in watermark propagation simulation [spark]

2023-10-30 Thread via GitHub
HeartSaVioR commented on PR #43588: URL: https://github.com/apache/spark/pull/43588#issuecomment-1786382261 Thanks! Merging to master.

Re: [PR] [SPARK-45704][BUILD] Fix `legacy-binding` [spark]

2023-10-30 Thread via GitHub
panbingkun commented on PR #43593: URL: https://github.com/apache/spark/pull/43593#issuecomment-1786381598 > Let's remove `"-Wconf:msg=legacy-binding:s"` from SparkBuild and pom.xml Done.

Re: [PR] [SPARK-45704][BUILD] Fix `legacy-binding` [spark]

2023-10-30 Thread via GitHub
LuciferYang commented on PR #43593: URL: https://github.com/apache/spark/pull/43593#issuecomment-1786379713 Let's remove `"-Wconf:msg=legacy-binding:s"` from SparkBuild and pom.xml

[PR] [SPARK-45704][BUILD] Fix `legacy-binding` [spark]

2023-10-30 Thread via GitHub
panbingkun opened a new pull request, #43593: URL: https://github.com/apache/spark/pull/43593 ### What changes were proposed in this pull request? This PR aims to fix `legacy-binding`; the message is as follows: ``` [error] /Users/panbingkun/Developer/spark/spark-community/sql/catalyst/src/

Re: [PR] [SPARK-45592][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2023-10-30 Thread via GitHub
cloud-fan closed pull request #43435: [SPARK-45592][SQL] Correctness issue in AQE with InMemoryTableScanExec URL: https://github.com/apache/spark/pull/43435

Re: [PR] [SPARK-45592][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2023-10-30 Thread via GitHub
cloud-fan commented on PR #43435: URL: https://github.com/apache/spark/pull/43435#issuecomment-1786374844 The failed streaming test is unrelated, and my last comment is quite minor, let's merge it first to fix the correctness bug. Thanks for your great work!

Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-10-30 Thread via GitHub
LuciferYang commented on code in PR #43502: URL: https://github.com/apache/spark/pull/43502#discussion_r1377001120 ## common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java: ## @@ -182,23 +189,21 @@ public boolean skip(long n) { @Override public

Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-10-30 Thread via GitHub
LuciferYang commented on code in PR #43502: URL: https://github.com/apache/spark/pull/43502#discussion_r1377001116 ## common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java: ## @@ -182,23 +189,21 @@ public boolean skip(long n) { @Override public

Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-10-30 Thread via GitHub
mridulm commented on code in PR #43502: URL: https://github.com/apache/spark/pull/43502#discussion_r1376996077 ## common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java: ## @@ -182,23 +189,21 @@ public boolean skip(long n) { @Override public sync

Re: [PR] [SPARK-45725][SQL] Remove the non-default IN subquery runtime filter [spark]

2023-10-30 Thread via GitHub
beliefer commented on code in PR #43585: URL: https://github.com/apache/spark/pull/43585#discussion_r1376993199 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -26,47 +26,27 @@ import org.apache.spark.sql.catalyst.plans.log

[PR] [BUILD] Test Commons io 2.15.0 [spark]

2023-10-30 Thread via GitHub
LuciferYang opened a new pull request, #43592: URL: https://github.com/apache/spark/pull/43592 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-45327][BUILD] Upgrade zstd-jni to 1.5.5-7 [spark]

2023-10-30 Thread via GitHub
panbingkun commented on PR #43113: URL: https://github.com/apache/spark/pull/43113#issuecomment-1786322746 > 1.5.5-7 released Done.

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-10-30 Thread via GitHub
panbingkun commented on code in PR #37588: URL: https://github.com/apache/spark/pull/37588#discussion_r1376969387 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2483,7 +2477,7 @@ private[sql] object QueryCompilationErrors extends

Re: [PR] [SPARK-45733][CONNECT][PYTHON] Support multiple retry policies [spark]

2023-10-30 Thread via GitHub
cdkrot commented on PR #43591: URL: https://github.com/apache/spark/pull/43591#issuecomment-1786303204 cc @HyukjinKwon, @juliuszsompolski, @grundprinzip

[PR] [SPARK-45733][CONNECT][PYTHON] Support multiple retry policies [spark]

2023-10-30 Thread via GitHub
cdkrot opened a new pull request, #43591: URL: https://github.com/apache/spark/pull/43591 ### What changes were proposed in this pull request? Support multiple retry policies defined at the same time. Each policy determines which error types it can retry and how exactly those should b
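
To illustrate the idea in this PR description, here is a minimal sketch of several retry policies active at once, each deciding which errors it handles and how long to wait between attempts. All names (`RetryPolicy`, `TimeoutPolicy`, `ConnectionPolicy`, `call_with_policies`) are hypothetical and are not taken from the PR or from the Spark Connect Python client; the actual implementation may differ.

```python
import time


class RetryPolicy:
    """Hypothetical base class: one policy = one set of retryable errors."""

    def __init__(self, max_retries, backoff_seconds):
        self.max_retries = max_retries
        self.backoff_seconds = backoff_seconds

    def can_retry(self, error):
        raise NotImplementedError


class TimeoutPolicy(RetryPolicy):
    def can_retry(self, error):
        return isinstance(error, TimeoutError)


class ConnectionPolicy(RetryPolicy):
    def can_retry(self, error):
        return isinstance(error, ConnectionError)


def call_with_policies(fn, policies, sleep=time.sleep):
    """Call fn(), retrying under the first policy that accepts the error.

    Each policy keeps its own attempt counter, so exhausting one policy
    does not consume the budget of another.
    """
    attempts = [0] * len(policies)
    while True:
        try:
            return fn()
        except Exception as error:
            for i, policy in enumerate(policies):
                if policy.can_retry(error):
                    break
            else:
                raise  # no policy claims this error
            if attempts[i] >= policy.max_retries:
                raise  # the matching policy has used up its retries
            attempts[i] += 1
            sleep(policy.backoff_seconds)
```

In this sketch the first matching policy wins, and an error that no policy accepts is re-raised immediately; whether the PR resolves overlapping policies the same way is not stated in the snippet above.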

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376953649 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376952982 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376951304 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSourceReadSuite.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1786284105 Got it, if you insist, @hvanhovell .

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun closed pull request #43589: [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 URL: https://github.com/apache/spark/pull/43589

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on PR #43589: URL: https://github.com/apache/spark/pull/43589#issuecomment-1786282659 Thank you, @HyukjinKwon , @LuciferYang , @bjornjorgensen . Merged to master

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376943396 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StateTable.scala: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376943187 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StateTable.scala: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on code in PR #43425: URL: https://github.com/apache/spark/pull/43425#discussion_r1376940262 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/state/StateDataSource.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software

[PR] [SPARK-45732][BUILD] Upgrade commons-text to 1.11.0 [spark]

2023-10-30 Thread via GitHub
panbingkun opened a new pull request, #43590: URL: https://github.com/apache/spark/pull/43590 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-45592][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2023-10-30 Thread via GitHub
maryannxue commented on PR #43435: URL: https://github.com/apache/spark/pull/43435#issuecomment-1786266047 Synced with @cloud-fan offline; (2) in the above suggestion wouldn't work. Let's go ahead with the current fix.

Re: [PR] [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2023-10-30 Thread via GitHub
cloud-fan commented on PR #36564: URL: https://github.com/apache/spark/pull/36564#issuecomment-1786265601 @aokolnychyi In most cases, yes. However, we have `BatchWrite#onDataWriterCommit`, so implementations can decide how to deal with conflicting commit messages, maybe first-win.

Re: [PR] [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2023-10-30 Thread via GitHub
aokolnychyi commented on PR #36564: URL: https://github.com/apache/spark/pull/36564#issuecomment-1786251500 @huaxingao @cloud-fan, could you confirm only a single `WriterCommitMessage` will be passed in case of speculative execution even without the commit coordinator? Based on what I see i

Re: [PR] [SPARK-44473] Overwriting the same partition of a partitioned table multiple times with empty data yields non-idempotent results [spark]

2023-10-30 Thread via GitHub
github-actions[bot] closed pull request #42103: [SPARK-44473] Overwriting the same partition of a partitioned table multiple times with empty data yields non-idempotent results URL: https://github.com/apache/spark/pull/42103

Re: [PR] [SPARK-44301][SQL] Add Benchmark Suite for TPCH [spark]

2023-10-30 Thread via GitHub
github-actions[bot] closed pull request #41856: [SPARK-44301][SQL] Add Benchmark Suite for TPCH URL: https://github.com/apache/spark/pull/41856

Re: [PR] [SPARK-45680][CONNECT] Release session [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43546: URL: https://github.com/apache/spark/pull/43546#issuecomment-1786230748 I think we might need to retrigger ... https://github.com/juliuszsompolski/apache-spark/actions/runs/6695859696/job/18192423072

Re: [PR] [SPARK-45680][CONNECT] Release session [spark]

2023-10-30 Thread via GitHub
hvanhovell commented on PR #43546: URL: https://github.com/apache/spark/pull/43546#issuecomment-1786225354 @HyukjinKwon is this the same python test issue?

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1786224778 Yeah, let's keep it then.

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
hvanhovell commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1786223772 I'd like to keep this in since it is a more flexible way of working with results.

Re: [PR] [SPARK-45245][PYTHON][CONNECT] PythonWorkerFactory: Timeout if worker does not connect back. [spark]

2023-10-30 Thread via GitHub
rangadi commented on PR #43023: URL: https://github.com/apache/spark/pull/43023#issuecomment-1786222604 > I think we should switch this to use the daemonized worker instead of simple workers soon. I see. I have been looking for reasons for doing so. This seems to be one of them.

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1786220537 Agree ^

Re: [PR] [SPARK-45245][PYTHON][CONNECT] PythonWorkerFactory: Timeout if worker does not connect back. [spark]

2023-10-30 Thread via GitHub
HyukjinKwon closed pull request #43023: [SPARK-45245][PYTHON][CONNECT] PythonWorkerFactory: Timeout if worker does not connect back. URL: https://github.com/apache/spark/pull/43023

Re: [PR] [SPARK-45245][PYTHON][CONNECT] PythonWorkerFactory: Timeout if worker does not connect back. [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43023: URL: https://github.com/apache/spark/pull/43023#issuecomment-1786219936 Yeah I am merging it to master but I think we should switch this to use the daemonized worker instead of simple workers soon. Merged to master.

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-10-30 Thread via GitHub
HyukjinKwon closed pull request #43354: [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact URL: https://github.com/apache/spark/pull/43354

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43354: URL: https://github.com/apache/spark/pull/43354#issuecomment-1786218198 Merged to master.

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-10-30 Thread via GitHub
HyukjinKwon commented on PR #43354: URL: https://github.com/apache/spark/pull/43354#issuecomment-1786218108 Yeah, the test failure at https://github.com/vsevolodstep-db/spark/actions/runs/6691537121/job/18179083580 seems unrelated.

Re: [PR] [SPARK-45729][PYTHON][DOCS] Fix PySpark testing guide links [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on code in PR #43587: URL: https://github.com/apache/spark/pull/43587#discussion_r1376835836 ## python/docs/source/getting_started/testing_pyspark.ipynb: ## @@ -193,7 +193,7 @@ "### Option 2: Using [Unit Test](https://docs.python.org/3/library/unit

Re: [PR] [SPARK-45729][PYTHON][DOCS] Fix PySpark testing guide links [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on code in PR #43587: URL: https://github.com/apache/spark/pull/43587#discussion_r1376834363 ## python/docs/source/getting_started/testing_pyspark.ipynb: ## @@ -273,7 +273,7 @@ "source": [ "### Option 3: Using [Pytest](https://docs.pytest.org/e

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on PR #43589: URL: https://github.com/apache/spark/pull/43589#issuecomment-1786026266 Could you review once more, @bjornjorgensen ? Although the test suite passed with and without the last commit, the last commit is the intended one as you commented. I verified manua

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on code in PR #43589: URL: https://github.com/apache/spark/pull/43589#discussion_r1376787495 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala: ## @@ -95,7 +95,7 @@ class MountVolum

Re: [PR] [SPARK-45431][DOCS] Document new SSL RPC feature [spark]

2023-10-30 Thread via GitHub
hasnain-db commented on code in PR #43240: URL: https://github.com/apache/spark/pull/43240#discussion_r1376768939 ## docs/security.md: ## @@ -563,7 +604,52 @@ replaced with one of the above namespaces. ${ns}.trustStoreType JKS -The type of the trust store. +

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
bjornjorgensen commented on code in PR #43589: URL: https://github.com/apache/spark/pull/43589#discussion_r1376764990 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala: ## @@ -95,7 +95,7 @@ class MountVolu

Re: [PR] [SPARK-38723][SS][TEST][FOLLOWUP] Deflake the newly added test in QueryExecutionErrorsSuite [spark]

2023-10-30 Thread via GitHub
MaxGekk commented on code in PR #43565: URL: https://github.com/apache/spark/pull/43565#discussion_r1376734655 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -910,13 +912,15 @@ class QueryExecutionErrorsSuite }

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-10-30 Thread via GitHub
hvanhovell closed pull request #43354: [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact URL: https://github.com/apache/spark/pull/43354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-10-30 Thread via GitHub
hvanhovell commented on PR #43354: URL: https://github.com/apache/spark/pull/43354#issuecomment-1785922075 @HyukjinKwon are the current python failures a known problem? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on code in PR #43589: URL: https://github.com/apache/spark/pull/43589#discussion_r1376687923 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala: ## @@ -95,7 +95,7 @@ class MountVolum

[PR] [SPARK-45728][BUILD][K8S] Upgrade `kubernetes-client` to 6.9.1 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun opened a new pull request, #43589: URL: https://github.com/apache/spark/pull/43589 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-45727][SS] Remove unused map in watermark propagation simulation [spark]

2023-10-30 Thread via GitHub
anishshri-db commented on PR #43588: URL: https://github.com/apache/spark/pull/43588#issuecomment-1785854466 @HeartSaVioR - PTAL, thx ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] [SPARK-45713][PYTHON] Support registering Python data sources [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on code in PR #43566: URL: https://github.com/apache/spark/pull/43566#discussion_r1376679807 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceRegistry.scala: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Found

[PR] [SPARK-45727] Remove unused map in watermark propagation simulation [spark]

2023-10-30 Thread via GitHub
anishshri-db opened a new pull request, #43588: URL: https://github.com/apache/spark/pull/43588 ### What changes were proposed in this pull request? Remove unused map in watermark propagation simulation ### Why are the changes needed? Remove use of redundant/unused map

Re: [PR] [SPARK-45481][SQL][FOLLOWUP] Add `lowerCaseName` for `ParquetCompressionCodec`. [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on PR #43571: URL: https://github.com/apache/spark/pull/43571#issuecomment-1785818454 Merged to master. Thank you, @beliefer . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-45481][SQL][FOLLOWUP] Add `lowerCaseName` for `ParquetCompressionCodec`. [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun closed pull request #43571: [SPARK-45481][SQL][FOLLOWUP] Add `lowerCaseName` for `ParquetCompressionCodec`. URL: https://github.com/apache/spark/pull/43571 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-45718][PS] Remove remaining deprecated Pandas features from Spark 3.4.0 [spark]

2023-10-30 Thread via GitHub
dongjoon-hyun commented on PR #43581: URL: https://github.com/apache/spark/pull/43581#issuecomment-1785814391 Please re-trigger the CI, @itholic . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

2023-10-30 Thread via GitHub
PhilDakin commented on PR #43369: URL: https://github.com/apache/spark/pull/43369#issuecomment-1785785637 @allisonwang-db added full-page screenshot to description and rebased onto master. -- This is an automated message from the Apache Git Service. To respond to the message, please log o

Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

2023-10-30 Thread via GitHub
PhilDakin commented on code in PR #43369: URL: https://github.com/apache/spark/pull/43369#discussion_r1376630344 ## python/docs/source/user_guide/sql/type_conversions.rst: ## @@ -0,0 +1,249 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contribut

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2023-10-30 Thread via GitHub
MaxGekk commented on PR #43334: URL: https://github.com/apache/spark/pull/43334#issuecomment-1785771481 > If not, we probably should disable this feature for spark connect for now since customers may get confused if they see contexts for dataset API errors (likely in the spark connect plan

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
heyihong commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1785765470 > @heyihong the cat is already out of the bag. I don't want to put it back in. I don't have a strong opinion on either but exposing collectResult but not exposing the return type [Spa

Re: [PR] [SPARK-45726][CONNECT] Make Dataset.collectResult private [spark]

2023-10-30 Thread via GitHub
hvanhovell commented on PR #43586: URL: https://github.com/apache/spark/pull/43586#issuecomment-1785736240 @heyihong the cat is already out of the bag. I don't want to put it back in. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[PR] Fix PySpark testing guide links [spark]

2023-10-30 Thread via GitHub
asl3 opened a new pull request, #43587: URL: https://github.com/apache/spark/pull/43587 ### What changes were proposed in this pull request? This PR fixes links in the PySpark testing guidelines page (hyperlinks words instead of displaying the URLs). ### Why are the changes nee

Re: [PR] [SPARK-44733][PYTHON][DOCS] Add Python to Spark type conversion page to PySpark docs. [spark]

2023-10-30 Thread via GitHub
allisonwang-db commented on code in PR #43369: URL: https://github.com/apache/spark/pull/43369#discussion_r1376549912 ## python/docs/source/user_guide/sql/type_conversions.rst: ## @@ -0,0 +1,249 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more cont
