[PR] [SPARK-47210][SQL][COLLATION] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
mihailom-db opened a new pull request, #45383: URL: https://github.com/apache/spark/pull/45383 ### What changes were proposed in this pull request? This PR adds automatic casting and collations resolution as per `PGSQL` behaviour: 1. Collations set on the metadata level are implici

[PR] [SPARK-47280][SQL] Remove timezone limitation for ORACLE TIMESTAMP WITH TIMEZONE [spark]

2024-03-05 Thread via GitHub
yaooqinn opened a new pull request, #45384: URL: https://github.com/apache/spark/pull/45384 ### What changes were proposed in this pull request? As illustrated by Oracle Documentation: TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE types can be repr

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-03-05 Thread via GitHub
beliefer commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1512329502 ## core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala: ## @@ -142,9 +142,11 @@ private class AsyncEventQueue( eventCount.incrementAndGet()

[PR] [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever [spark]

2024-03-05 Thread via GitHub
lastbus opened a new pull request, #45385: URL: https://github.com/apache/spark/pull/45385 ### What changes were proposed in this pull request? When a task has finished and sent messages back to the driver, but the driver cannot create new thread because of insufficient memory, then t

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-03-05 Thread via GitHub
TakawaAkirayo commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1512354669 ## core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala: ## @@ -142,9 +142,11 @@ private class AsyncEventQueue( eventCount.incrementAndGet(

[PR] [SPARK-47281][PYSPARK][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun opened a new pull request, #45386: URL: https://github.com/apache/spark/pull/45386 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978229328 cc @HyukjinKwon @HeartSaVioR @dongjoon-hyun @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978231255 For the `published document`, I will submit another PR to manually update it first.

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978241875 > For the `published document`, I will submit another PR to manually update it first. Okay, I checked the `spark-website` code repository and currently only the `spark-3.5.1` ve

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
HeartSaVioR commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978242397 We shouldn't go with this approach before constructing the story to update the released version. I feel like this is not scalable.

Re: [PR] [SPARK-47177][SQL][3.4] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-05 Thread via GitHub
ulysses-you commented on PR #45381: URL: https://github.com/apache/spark/pull/45381#issuecomment-1978242602 thank you all, merging to branch-3.4

Re: [PR] [SPARK-47177][SQL][3.4] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-05 Thread via GitHub
ulysses-you closed pull request #45381: [SPARK-47177][SQL][3.4] Cached SQL plan do not display final AQE plan in explain string URL: https://github.com/apache/spark/pull/45381

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978246862 I have marked this PR on JIRA and it needs to be applied to branch-3.5.1 https://github.com/apache/spark/assets/15246973/8e295cbd-a030-4f6f-b4a2-8a9cae5b0144

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-05 Thread via GitHub
ulysses-you commented on PR #45234: URL: https://github.com/apache/spark/pull/45234#issuecomment-1978266178 cc @cloud-fan @maryannxue as well

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978274301 > We shouldn't go with this approach before constructing the story to update the released versions of docs. I feel like this is not scalable. If this feature is not needed for th

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-05 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1978275552 https://github.com/HyukjinKwon/spark/actions/runs/8153673605/job/22285532203

Re: [PR] [SPARK-47277] PySpark util function assertDataFrameEqual should not support streaming DF [spark]

2024-03-05 Thread via GitHub
WweiL commented on PR #45380: URL: https://github.com/apache/spark/pull/45380#issuecomment-1978276102 CI error is still related, i'll rebuild locally and verify tomorrow

[PR] [SPARK-47283][PYSPARK][DOCS] Remove Spark version drop down to the PySpark doc site [spark]

2024-03-05 Thread via GitHub
panbingkun opened a new pull request, #45387: URL: https://github.com/apache/spark/pull/45387 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-05 Thread via GitHub
panbingkun commented on PR #45386: URL: https://github.com/apache/spark/pull/45386#issuecomment-1978297201 The PR for removing this feature is here: https://github.com/apache/spark/pull/45387

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1512459777 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +337,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1512462874 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +337,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-03-05 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1978337106 > Regardless of the answer I think it makes sense to use the same approach for both Dataset states (persisted and unpersisted). I agree. We can cache it in a lazy variable `queryExec

Re: [PR] [SPARK-44259][CONNECT][TESTS] Make `connect-client-jvm` pass on Java 21 except `RemoteSparkSession`-based tests [spark]

2024-03-05 Thread via GitHub
Midhunpottammal commented on PR #41805: URL: https://github.com/apache/spark/pull/41805#issuecomment-1978359967 > Merged to master for Apache Spark 3.5.0. Thank you, @LuciferYang , @yaooqinn , @HyukjinKwon . while java Java(TM) SE Runtime Environment (build 21.0.2+13-LTS-58) Spark

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1512564403 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ExecuteImmediateEndToEndSuite.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation

[PR] [SPARK-47285][SQL] AdaptiveSparkPlanExec should always use the context.session [spark]

2024-03-05 Thread via GitHub
ulysses-you opened a new pull request, #45388: URL: https://github.com/apache/spark/pull/45388 ### What changes were proposed in this pull request? Use `context.session` instead of `session` to avoid potential issue. For example, a cached plan may re-instance `AdaptiveSparkPla

Re: [PR] [SPARK-47285][SQL] AdaptiveSparkPlanExec should always use the context.session [spark]

2024-03-05 Thread via GitHub
ulysses-you commented on PR #45388: URL: https://github.com/apache/spark/pull/45388#issuecomment-1978430201 cc @cloud-fan @yaooqinn thank you

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-05 Thread via GitHub
stefankandic commented on PR #45262: URL: https://github.com/apache/spark/pull/45262#issuecomment-1978441754 @cloud-fan are we okay to merge this?

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1512603069 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ExecuteImmediateEndToEndSuite.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47239][SQL] Support distinct window function [spark]

2024-03-05 Thread via GitHub
zml1206 commented on PR #45349: URL: https://github.com/apache/spark/pull/45349#issuecomment-1978489581 cc @cloud-fan What do you think of this feature?

[PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic opened a new pull request, #45389: URL: https://github.com/apache/spark/pull/45389 ### What changes were proposed in this pull request? This change introduces support for JOINs for StringType that doesn't follow binary collations (e.g. *LCASE collations). For i

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-05 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1978537445 https://github.com/HyukjinKwon/spark/actions/runs/8155204243 https://github.com/HyukjinKwon/spark/actions/runs/8155325532 https://github.com/HyukjinKwon/spark/actions/runs/8155328

[PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-05 Thread via GitHub
JacobZheng0927 opened a new pull request, #45390: URL: https://github.com/apache/spark/pull/45390 This pr backport https://github.com/apache/spark/pull/45327 to branch-3.5 ### What changes were proposed in this pull request? Add TaskCompletionListener to close inputStream to avoid thr

Re: [PR] [SPARK-47102][SQL] Add the `COLLATION_ENABLED` config flag [spark]

2024-03-05 Thread via GitHub
MaxGekk commented on PR #45285: URL: https://github.com/apache/spark/pull/45285#issuecomment-1978618455 The failed GA [Run / Build modules: pyspark-connect](https://github.com/uros-db/spark/actions/runs/8154382989/job/22292454313#logs) has been passed already a couple comments before. Highl

Re: [PR] [SPARK-47102][SQL] Add the `COLLATION_ENABLED` config flag [spark]

2024-03-05 Thread via GitHub
MaxGekk commented on PR #45285: URL: https://github.com/apache/spark/pull/45285#issuecomment-1978621347 +1, LGTM. Merging to master. Thank you, @mihailom-db and @cloud-fan @dbatomic @mkaravel for review.

Re: [PR] [SPARK-47102][SQL][COLLATION] Adding COLLATION_ENABLED config [spark]

2024-03-05 Thread via GitHub
MaxGekk closed pull request #45218: [SPARK-47102][SQL][COLLATION] Adding COLLATION_ENABLED config URL: https://github.com/apache/spark/pull/45218

Re: [PR] [SPARK-47102][SQL] Add the `COLLATION_ENABLED` config flag [spark]

2024-03-05 Thread via GitHub
MaxGekk closed pull request #45285: [SPARK-47102][SQL] Add the `COLLATION_ENABLED` config flag URL: https://github.com/apache/spark/pull/45285

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-05 Thread via GitHub
JacobZheng0927 commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1978625585 > @JacobZheng0927, might be a good idea to backport this to 3.5 as well - will you be able to create a backport PR ? (I ran into some issue locally when trying to merge to branch-3

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1512783420 ## sql/core/src/test/scala/org/apache/spark/sql/execution/ExecuteImmediateEndToEndSuite.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on PR #45262: URL: https://github.com/apache/spark/pull/45262#issuecomment-1978722134 thanks, merging to master!

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-05 Thread via GitHub
cloud-fan closed pull request #45262: [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings URL: https://github.com/apache/spark/pull/45262

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-05 Thread via GitHub
peter-toth commented on PR #45373: URL: https://github.com/apache/spark/pull/45373#issuecomment-1978754010 I don't fully get this issue. In https://github.com/apache/spark/pull/40779 the `isTemporary` column had to be casted to string so a job was triggered. But why does `isEmpty` trigger a

Re: [PR] [SPARK-47280][SQL] Remove timezone limitation for ORACLE TIMESTAMP WITH TIMEZONE [spark]

2024-03-05 Thread via GitHub
yaooqinn commented on code in PR #45384: URL: https://github.com/apache/spark/pull/45384#discussion_r1512817343 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala: ## @@ -284,35 +288,20 @@ class OracleIntegrationSuite exte

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-05 Thread via GitHub
wForget commented on PR #45373: URL: https://github.com/apache/spark/pull/45373#issuecomment-1978771030 > I don't fully get this issue. In #40779 the `isTemporary` column had to be casted to string so a job was triggered. But why does `isEmpty` trigger a job? Also, do other APIs (like `head

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-05 Thread via GitHub
wForget commented on PR #45373: URL: https://github.com/apache/spark/pull/45373#issuecomment-1978793412 > Also, do other APIs (like head()) trigger jobs on CommandResults that shouldn't? The `head()` method will not trigger a job. Because `CollectLimitExec.executeCollect()` calls `c

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
GideonPotok commented on PR #45389: URL: https://github.com/apache/spark/pull/45389#issuecomment-1978798573 Why does this issue not seem to exist when I search it? https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20SPARK-46835

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-03-05 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1978804309 > > Regardless of the answer I think it makes sense to use the same approach for both Dataset states (persisted and unpersisted). > > I agree. We can cache it in a lazy variable `qu

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on PR #45389: URL: https://github.com/apache/spark/pull/45389#issuecomment-1978807250 > Why does this issue not seem to exist when I search it? https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20SPARK-46835 @GideonPotok - Not sure, h

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1512850096 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -148,6 +148,18 @@ abstract class QueryStageExec extends LeafExecNode {

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-03-05 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1978809925 We can't cache the queryExecution in the Dataset itself because the queryExecution may come from other Dataset instance. See `isEmpty`: ```scala def isEmpty: Boolean = withAction("

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512856918 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -228,9 +235,10 @@ abstract class SparkStrategies extends QueryPlanner[SparkPla

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512858148 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -597,4 +597,49 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [SPARK-47247][SQL] Use smaller target size when coalescing partitions with exploding joins [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45357: URL: https://github.com/apache/spark/pull/45357#discussion_r1512860006 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala: ## @@ -126,9 +126,12 @@ case class CoalesceShufflePartitions(sessi

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512869445 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -597,4 +597,49 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanH

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512870997 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -228,9 +235,10 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-03-05 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1978837075 > We can't cache the queryExecution in the Dataset itself because the queryExecution may come from other Dataset instance. See `isEmpty`: > > ```scala > def isEmpty: Boolean = wi

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512887045 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -597,4 +597,49 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [SPARK-47248][SQL][COLLATION] Extended string function support: contains [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45382: URL: https://github.com/apache/spark/pull/45382#discussion_r1512896346 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -343,19 +346,33 @@ public boolean contains(final UTF8String substring) { retu

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
GideonPotok commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512899167 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -205,6 +206,12 @@ abstract class SparkStrategies extends QueryPlanner[SparkP

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
GideonPotok commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1512907243 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -228,37 +235,46 @@ abstract class SparkStrategies extends QueryPlanner[Spark

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1513002552 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -228,37 +235,46 @@ abstract class SparkStrategies extends QueryPlanner[SparkPla

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1513004663 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -597,4 +597,49 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlanH

Re: [PR] [SPARK-46835][SQL][Collations] Join support for non-binary collations [spark]

2024-03-05 Thread via GitHub
dbatomic commented on code in PR #45389: URL: https://github.com/apache/spark/pull/45389#discussion_r1513005783 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -205,6 +206,12 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513022131 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -475,6 +475,24 @@ ], "sqlState" : "42704" }, + "COLLATION_MISMATCH" : { +"mess

Re: [PR] [SPARK-46992]make dataset.cache() return new ds instance [spark]

2024-03-05 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1979031337 I've force updated my pr and now it brings the smallest changes and fixes this issue completely.

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513032494 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -958,14 +1062,16 @@ object TypeCoercion extends TypeCoercionBase {

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513034188 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513035016 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513038768 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513038768 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513042604 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513053311 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513053988 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala: ## @@ -764,6 +782,91 @@ abstract class TypeCoercionBase { } } +

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513061821 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -509,18 +509,10 @@ abstract class StringPredicate extends B

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-05 Thread via GitHub
sunchao commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1513080294 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,34 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSpark

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-05 Thread via GitHub
sunchao commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1979101387 > Any thoughts https://github.com/apache/spark/pull/45327#discussion_r1507120685 @dongjoon-hyun , @sunchao ? Sorry for the delay @mridulm . Posted my reply. This PR LGTM too.

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-05 Thread via GitHub
stefankandic commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1513061821 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -509,18 +509,10 @@ abstract class StringPredicate extends B

Re: [PR] [SPARK-46992]Fix "Inconsistent results with 'sort', 'cache', and AQE." [spark]

2024-03-05 Thread via GitHub
dtarima commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1513104440 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -193,10 +193,12 @@ private[sql] object Dataset { */ @Stable class Dataset[T] private[sql]( -

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-05 Thread via GitHub
nchammas commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1513144146 ## docs/sql-performance-tuning.md: ## @@ -157,6 +157,18 @@ SELECT /*+ REBALANCE(3, c) */ * FROM t; For more details please refer to the documentation of [Partitioni

Re: [PR] [SPARK-46992]Fix "Inconsistent results with 'sort', 'cache', and AQE." [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1513172673 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -193,10 +193,12 @@ private[sql] object Dataset { */ @Stable class Dataset[T] private[sql]( -

Re: [PR] [SPARK-46992]Fix "Inconsistent results with 'sort', 'cache', and AQE." [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1513173362 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -193,10 +193,12 @@ private[sql] object Dataset { */ @Stable class Dataset[T] private[sql]( -

Re: [PR] [SPARK-46992]Fix "Inconsistent results with 'sort', 'cache', and AQE." [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on code in PR #45181: URL: https://github.com/apache/spark/pull/45181#discussion_r1513176460 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -3878,6 +3880,8 @@ class Dataset[T] private[sql]( */ def persist(newLevel: StorageLevel):

Re: [PR] [SPARK-47218] [SQL] XML: Changed schemOfXml to fail on DROPMALFORMED mode [spark]

2024-03-05 Thread via GitHub
sandip-db commented on code in PR #45379: URL: https://github.com/apache/spark/pull/45379#discussion_r1513205680 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/XmlSuite.scala: ## @@ -1302,6 +1302,22 @@ class XmlSuite assert(result.select("decoded

[PR] [WIP][BUILD] Upgrade RocksDB version to 8.11.3 [spark]

2024-03-05 Thread via GitHub
neilramaswamy opened a new pull request, #45391: URL: https://github.com/apache/spark/pull/45391 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-46395][CORE] Assign Spark configs to groups for use in documentation [spark]

2024-03-05 Thread via GitHub
nchammas commented on PR #44755: URL: https://github.com/apache/spark/pull/44755#issuecomment-1979311637 @holdenk - This is the config documentation approach we discussed on the mailing list. (The alternative, YAML-based approach is over on #44756.) This PR just adds the fields and me

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-05 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1513261508 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,31 @@ abstract class Optimizer(catalogManager: CatalogManage

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-05 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1513262881 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,31 @@ abstract class Optimizer(catalogManager: CatalogManage

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-05 Thread via GitHub
mridulm commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1513264898 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,34 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSpark

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
cloud-fan commented on PR #45293: URL: https://github.com/apache/spark/pull/45293#issuecomment-1979362005 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-05 Thread via GitHub
cloud-fan closed pull request #45293: [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names URL: https://github.com/apache/spark/pull/45293

Re: [PR] [SPARK-42040][SQL] SPJ: Introduce a new API for V2 input partition to report partition size [spark]

2024-03-05 Thread via GitHub
sunchao commented on code in PR #45314: URL: https://github.com/apache/spark/pull/45314#discussion_r1513307103 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/HasPartitionSize.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] [WIP][SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-05 Thread via GitHub
xinrong-meng commented on PR #45269: URL: https://github.com/apache/spark/pull/45269#issuecomment-1979532670 Marked WIP to wait for https://github.com/apache/spark/pull/45378 merged first.

Re: [PR] [WIP][SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-05 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1513424368 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -49,6 +49,7 @@ See also :class:`SparkSession`. SparkSession.createDataFrame SparkSes

Re: [PR] [WIP][SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-05 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1513425698 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -49,6 +49,7 @@ See also :class:`SparkSession`. SparkSession.createDataFrame SparkSes

[PR] [WIP][SQL] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-05 Thread via GitHub
MaxGekk opened a new pull request, #45392: URL: https://github.com/apache/spark/pull/45392 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513480414 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StateTypesEncoderUtils.scala: ## @@ -66,6 +68,38 @@ class StateTypesEncoder[GK]( keyRow

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513480749 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StateTypesEncoderUtils.scala: ## @@ -66,6 +68,38 @@ class StateTypesEncoder[GK]( keyRow

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513483233 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513484718 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateSuite.scala: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513485319 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateSuite.scala: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-05 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1513485844 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateSuite.scala: ## @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache Software Foundation
