Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-06 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1513991688 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,26 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile` """

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-06 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1513993480 ## python/pyspark/sql/connect/resource/profile.py: ## @@ -0,0 +1,69 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license a

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-06 Thread via GitHub
wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1980292725 Hi @HyukjinKwon, Could you help review again, thx very much. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1514004559 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,26 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile`

[PR] [SPARK-47300] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-06 Thread via GitHub
cloud-fan opened a new pull request, #45401: URL: https://github.com/apache/spark/pull/45401 ### What changes were proposed in this pull request? `quoteIfNeeded` is used to generate pretty strings of identifiers in error message, EXPLAIN result, etc. It's mostly for humans to

Re: [PR] [SPARK-47300] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1980309252 cc @yaooqinn

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1980320410 https://github.com/HyukjinKwon/spark/actions/runs/8168999430

Re: [PR] [SPARK-47293][CORE] Build batchSchema with sparkSchema instead of append one by one [spark]

2024-03-06 Thread via GitHub
zwangsheng commented on PR #45396: URL: https://github.com/apache/spark/pull/45396#issuecomment-1980326665 @ShreyeshArangath @yaooqinn @ulysses-you thanks all of you!

Re: [PR] Spark 47301 [spark]

2024-03-06 Thread via GitHub
panbingkun closed pull request #45402: Spark 47301 URL: https://github.com/apache/spark/pull/45402

[PR] Spark 47301 [spark]

2024-03-06 Thread via GitHub
panbingkun opened a new pull request, #45402: URL: https://github.com/apache/spark/pull/45402 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-06 Thread via GitHub
dbatomic commented on code in PR #45382: URL: https://github.com/apache/spark/pull/45382#discussion_r1512945586 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -343,19 +346,33 @@ public boolean contains(final UTF8String substring) { retur

[PR] [SPARK-47301][SQL][TESTS] Fix flaky ParquetIOSuite [spark]

2024-03-06 Thread via GitHub
panbingkun opened a new pull request, #45403: URL: https://github.com/apache/spark/pull/45403 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514118805 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Table

Re: [PR] [SPARK-47300] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-06 Thread via GitHub
yaooqinn commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1980432051
```
spark-sql (default)> select version();
3.5.0 ce5ddad990373636e94071e7cef2f31021add07b
spark-sql (default)> create table d (0a real) using parquet;
spark-sql (default)> desc
```

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514121653 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -249,6 +241,17 @@ class V2SessionCatalog(catalog: SessionCatalo

Re: [PR] [SPARK-47300] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1980434618 @yaooqinn `0a` is fine but `0d` is not, as it's a double literal.
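A minimal Python sketch of the quoting rule discussed in this thread, assuming an identifier part may stay unquoted only if it matches `[A-Za-z_][A-Za-z0-9_]*`. Anything else, including a part that starts with a digit such as `0d` (which Spark SQL reads as the double literal `0.0`), is wrapped in backticks, with embedded backticks doubled. `quote_if_needed` is a hypothetical name; this is an illustration, not Spark's Scala implementation.

```python
import re

# Hypothetical sketch, not Spark's code: a part is "simple" only if it starts
# with a letter or underscore and contains only letters, digits, underscores.
_SIMPLE_PART = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_if_needed(part: str) -> str:
    """Return `part` unchanged if it is a simple identifier, else backtick-quote it."""
    if _SIMPLE_PART.fullmatch(part):
        return part
    # Double any embedded backticks, then wrap the whole part in backticks.
    return "`" + part.replace("`", "``") + "`"


for name in ["col_1", "0a", "0d", "a`b"]:
    print(name, "->", quote_if_needed(name))
```

Quoting every digit-leading part is conservative: `0a` would also parse unquoted, but quoting it is harmless and avoids enumerating literal suffixes such as `d` or `L`.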

[PR] [SPARK-47146][CORE][FOLLOWUP]Rename incorrect logger name [spark]

2024-03-06 Thread via GitHub
JacobZheng0927 opened a new pull request, #45404: URL: https://github.com/apache/spark/pull/45404 ### What changes were proposed in this pull request? Rename incorrect logger name in `UnsafeSorterSpillReader`. ### Why are the changes needed? The logger name in UnsafeSorterSpillRe

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-06 Thread via GitHub
JacobZheng0927 commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1514197639 ## core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java: ## @@ -36,6 +38,7 @@ * of the file format). */ public final

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1980574192 https://github.com/HyukjinKwon/spark/actions/runs/8170727017/job/22337438942

Re: [PR] [SPARK-47303][CORE][TESTS] Restructuring MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1980579656 https://github.com/HyukjinKwon/spark/actions/runs/8170741851/job/22337515014

Re: [PR] [SPARK-47303][CORE][TESTS] Restructuring MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1980580408 Once https://github.com/HyukjinKwon/spark/actions/runs/8170727017/job/22337438942 passes, it can be merged.

Re: [PR] [SPARK-47299][PYTHON][DOCS] Use the same `versions.json` in the dropdown of different versions of PySpark documents [spark]

2024-03-06 Thread via GitHub
panbingkun commented on PR #45400: URL: https://github.com/apache/spark/pull/45400#issuecomment-1980589035 cc @HyukjinKwon @HeartSaVioR

Re: [PR] [SPARK-47303][CORE][TESTS] Restructure MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on code in PR #45366: URL: https://github.com/apache/spark/pull/45366#discussion_r1514257802 ## core/src/test/scala/org/apache/spark/deploy/master/WorkerSelectionSuite.scala: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-47210][SQL][COLLATION][WIP] Implicit casting on collated expressions [spark]

2024-03-06 Thread via GitHub
mihailom-db commented on code in PR #45383: URL: https://github.com/apache/spark/pull/45383#discussion_r1514266949 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -509,18 +509,10 @@ abstract class StringPredicate extends Bi

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
panbingkun commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514278795 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Tabl

Re: [PR] [SPARK-46992]Fix "Inconsistent results with 'sort', 'cache', and AQE." [spark]

2024-03-06 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1980642854 > > I don't think it fixes the issue completely and there are some problems with the solution. I believe a proper solution is in the following comment: [#45181 (comment)](https://github.

Re: [PR] [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version [spark]

2024-03-06 Thread via GitHub
panbingkun closed pull request #45386: [SPARK-47281][PYTHON][DOCS] Update the `versions.json` file for the already released spark version URL: https://github.com/apache/spark/pull/45386

[PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub
dbatomic opened a new pull request, #45405: URL: https://github.com/apache/spark/pull/45405 ### What changes were proposed in this pull request? With this change we move away from using collation names as string literals and start treating them as identifiers, since that is th

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45392: URL: https://github.com/apache/spark/pull/45392#discussion_r1514382831 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameShowSuite.scala: ## @@ -0,0 +1,433 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45392: URL: https://github.com/apache/spark/pull/45392#discussion_r1514383692 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameToSchemaSuite.scala: ## @@ -365,4 +367,57 @@ class DataFrameToSchemaSuite extends QueryTest with SharedSpark

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45392: URL: https://github.com/apache/spark/pull/45392#discussion_r1514386052 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala: ## @@ -2200,6 +2200,115 @@ class DataFrameAggregateSuite extends QueryTest check

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514390687 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Table

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1514403933 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -790,7 +790,8 @@ case class AdaptiveSparkPlanExec( currentP

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1514404676 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -148,6 +148,17 @@ abstract class QueryStageExec extends LeafExecNode {

Re: [PR] [WIP] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45397: URL: https://github.com/apache/spark/pull/45397#discussion_r1514407588 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ConvertCommandResultToLocalRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Softw

Re: [PR] [WIP] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-06 Thread via GitHub
wForget commented on code in PR #45397: URL: https://github.com/apache/spark/pull/45397#discussion_r1514415933 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ConvertCommandResultToLocalRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-47303][CORE][TESTS] Restructure MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1980813252 https://github.com/HyukjinKwon/spark/actions/runs/8170727017/job/22337438942 passed.

Re: [PR] [SPARK-47146][CORE][FOLLOWUP]Rename incorrect logger name [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45404: URL: https://github.com/apache/spark/pull/45404#issuecomment-1980815820 Merged to master, branch-3.5 and branch-3.4.

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
MaxGekk commented on code in PR #45392: URL: https://github.com/apache/spark/pull/45392#discussion_r1514427320 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala: ## @@ -2200,6 +2200,115 @@ class DataFrameAggregateSuite extends QueryTest checkAn

Re: [PR] [SPARK-47146][CORE][FOLLOWUP]Rename incorrect logger name [spark]

2024-03-06 Thread via GitHub
HyukjinKwon closed pull request #45404: [SPARK-47146][CORE][FOLLOWUP]Rename incorrect logger name URL: https://github.com/apache/spark/pull/45404

Re: [PR] [WIP] Add ConvertCommandResultToLocalRelation rule [spark]

2024-03-06 Thread via GitHub
wForget commented on code in PR #45397: URL: https://github.com/apache/spark/pull/45397#discussion_r1514438642 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ConvertCommandResultToLocalRelation.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Softwar

Re: [PR] [SPARK-47303][CORE][TESTS] Restructure MasterSuite [spark]

2024-03-06 Thread via GitHub
LuciferYang commented on code in PR #45366: URL: https://github.com/apache/spark/pull/45366#discussion_r1514432822 ## core/src/test/scala/org/apache/spark/deploy/master/MasterDecommisionSuite.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
MaxGekk commented on code in PR #45392: URL: https://github.com/apache/spark/pull/45392#discussion_r1514451415 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameToSchemaSuite.scala: ## @@ -365,4 +367,57 @@ class DataFrameToSchemaSuite extends QueryTest with SharedSparkSe

[PR] [SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming [spark]

2024-03-06 Thread via GitHub
HeartSaVioR opened a new pull request, #45406: URL: https://github.com/apache/spark/pull/45406 ### What changes were proposed in this pull request? This PR proposes to fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming.

Re: [PR] [SPARK-47305][SQL] Fix PruneFilters to tag the isStreaming flag of LocalRelation correctly when the plan has both batch and streaming [spark]

2024-03-06 Thread via GitHub
HeartSaVioR commented on PR #45406: URL: https://github.com/apache/spark/pull/45406#issuecomment-1980847938 cc. @cloud-fan Mind taking a look? The change is straightforward. Thanks in advance!

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
panbingkun commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514473565 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Tabl

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
LuciferYang commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514474616 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Tab

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-06 Thread via GitHub
MaxGekk commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1980897588 How about deduplicate the tests: https://github.com/apache/spark/blob/5089140e2e6a43ffef584b42aed5cd9bc11268b6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/StringUtils

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-06 Thread via GitHub
tgravescs commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1980935581 >Does this PR introduce any user-facing change? > Yes, Users can pass ResourceProfile to mapInPandas/mapInArrow through the connect pyspark client. I think you are adding the

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1514536902 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +84,29 @@ class BasicInMemoryTableCatalog extends Table

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45348: URL: https://github.com/apache/spark/pull/45348#discussion_r1514551447 ## sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala: ## @@ -830,6 +831,24 @@ case class WholeStageCodegenExec(child: SparkPlan)(val

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45348: URL: https://github.com/apache/spark/pull/45348#discussion_r1514559175 ## sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala: ## @@ -899,4 +900,28 @@ class WholeStageCodegenSuite extends QueryTest with S

Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1514563440 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala: ## @@ -218,6 +218,6 @@ class DataTypeAstBuilder extends SqlBaseParserBaseVis

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-06 Thread via GitHub
uros-db commented on code in PR #45382: URL: https://github.com/apache/spark/pull/45382#discussion_r1514612341 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -343,19 +346,33 @@ public boolean contains(final UTF8String substring) { return

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45382: URL: https://github.com/apache/spark/pull/45382#discussion_r1514639021 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -145,6 +149,18 @@ public Collation( } } + /** + * Auxiliar

Re: [PR] [SPARK-47248][SQL][COLLATION] Improved string function support: contains [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on code in PR #45382: URL: https://github.com/apache/spark/pull/45382#discussion_r1514641048 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -145,6 +149,18 @@ public Collation( } } + /** + * Auxiliar

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-06 Thread via GitHub
cloud-fan commented on PR #45350: URL: https://github.com/apache/spark/pull/45350#issuecomment-1981124178 cc @MaxGekk @yaooqinn @dongjoon-hyun

Re: [PR] [SPARK-47302][SQL][Collation] Collate key word as identifier [spark]

2024-03-06 Thread via GitHub
srielau commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1514694683 ## python/pyspark/sql/tests/test_types.py: ## @@ -862,15 +862,13 @@ def test_parse_datatype_string(self): if k != "varchar" and k != "char":

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
MaxGekk commented on PR #45392: URL: https://github.com/apache/spark/pull/45392#issuecomment-1981172448 Merging to master. Thank you, @cloud-fan for review.

Re: [PR] [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites [spark]

2024-03-06 Thread via GitHub
MaxGekk closed pull request #45392: [SPARK-47304][SQL][TESTS] Distribute tests from `DataFrameSuite` to more specific suites URL: https://github.com/apache/spark/pull/45392

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-06 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1514762032 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import org.apache.spark.sql.catalyst.trees.T

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-06 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1514771822 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import org.apache.spark.sql.catalyst.trees.T

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-06 Thread via GitHub
jwang0306 commented on code in PR #45348: URL: https://github.com/apache/spark/pull/45348#discussion_r1514777808 ## sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala: ## @@ -830,6 +831,24 @@ case class WholeStageCodegenExec(child: SparkPlan)(val

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-06 Thread via GitHub
jwang0306 commented on code in PR #45348: URL: https://github.com/apache/spark/pull/45348#discussion_r1514792504 ## sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala: ## @@ -899,4 +900,28 @@ class WholeStageCodegenSuite extends QueryTest with S

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-06 Thread via GitHub
jwang0306 commented on PR #45348: URL: https://github.com/apache/spark/pull/45348#issuecomment-1981289286 @cloud-fan thanks for the review - I have updated the PR as suggested, please take another look, thanks!

[PR] [SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9] [spark]

2024-03-06 Thread via GitHub
stefanbuk-db opened a new pull request, #45407: URL: https://github.com/apache/spark/pull/45407 ### What changes were proposed in this pull request? In the PR, I propose to assign the proper names to the legacy error classes _LEGACY_ERROR_TEMP_325[1-9], and modify tests in testing suites

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-06 Thread via GitHub
viirya commented on code in PR #45350: URL: https://github.com/apache/spark/pull/45350#discussion_r1514897535 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2876,28 +2876,36 @@ class Analyzer(override val catalogManager: CatalogMana

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-06 Thread via GitHub
viirya commented on code in PR #45350: URL: https://github.com/apache/spark/pull/45350#discussion_r1514903490 ## sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala: ## @@ -553,6 +552,32 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSes

Re: [PR] [SPARK-47241][SQL] Fix rule order issues for ExtractGenerator [spark]

2024-03-06 Thread via GitHub
viirya commented on code in PR #45350: URL: https://github.com/apache/spark/pull/45350#discussion_r1514904624 ## sql/core/src/test/scala/org/apache/spark/sql/GeneratorFunctionSuite.scala: ## @@ -553,6 +552,32 @@ class GeneratorFunctionSuite extends QueryTest with SharedSparkSes

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-06 Thread via GitHub
nchammas commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1514934182 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -582,11 +582,7 @@ object SQLConf { val AUTO_BROADCASTJOIN_THRESHOLD = buildConf

[PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
ted-jenks opened a new pull request, #45408: URL: https://github.com/apache/spark/pull/45408 ### What changes were proposed in this pull request? [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder ### Why are the changes needed? In https://github.
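For context on the two RFCs named in this PR, a short Python illustration: RFC 4648 base64 is one unbroken string, while an RFC 2045 (MIME) encoder wraps its output at 76 characters per line. Python's `base64.encodebytes` follows the MIME line length but uses `\n` as the separator (RFC 2045 and Java's `java.util.Base64.getMimeEncoder()` use CRLF). The 100-byte payload is arbitrary sample data, and the sketch makes no claim about how Spark's `base64` function is implemented; it only shows why the two outputs are not interchangeable.

```python
import base64
import binascii

payload = bytes(range(100))  # arbitrary 100-byte sample

rfc4648 = base64.b64encode(payload)    # one line, no separators
rfc2045 = base64.encodebytes(payload)  # MIME style: b"\n" after every 76 chars

print(len(rfc4648))                            # 136
print(rfc2045.count(b"\n"))                    # 2
print(rfc2045.replace(b"\n", b"") == rfc4648)  # True: only the line breaks differ

# A strict RFC 4648 decoder rejects the line breaks in the MIME output.
try:
    base64.b64decode(rfc2045, validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False
print("strict decode of MIME output succeeded:", strict_ok)
```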

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-06 Thread via GitHub
anishshri-db commented on code in PR #45360: URL: https://github.com/apache/spark/pull/45360#discussion_r1514964881 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -246,25 +246,35 @@ class RocksDB( colFamilyNameToHandleMap.cont

[PR] [SPARK-45827] Move data type checks to CreatableRelationProvider [spark]

2024-03-06 Thread via GitHub
cashmand opened a new pull request, #45409: URL: https://github.com/apache/spark/pull/45409 ### What changes were proposed in this pull request? In DataSource.scala, there are checks to prevent writing Variant and Interval types to a `CreatableRelationalProvider`. This PR unif

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
ted-jenks commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1981542843 @dongjoon-hyun please may you take a look. Caused a big data correctness issue for us.

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-06 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1514973964 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/ValueStateSuite.scala: ## @@ -218,3 +173,56 @@ class ValueStateSuite extends SharedSparkS

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-06 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1514974624 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateSuite.scala: ## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-06 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1514975454 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateSuite.scala: ## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-06 Thread via GitHub
anishshri-db commented on PR #45341: URL: https://github.com/apache/spark/pull/45341#issuecomment-1981548558 @jingz-db - test failure seems related ?

Re: [PR] [SPARK-24497][SQL] Support recursive SQL [spark]

2024-03-06 Thread via GitHub
firstim commented on PR #40744: URL: https://github.com/apache/spark/pull/40744#issuecomment-1981566986 try this when this feature is not available yet. https://pypi.org/project/pyspark-connectby/

[PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-06 Thread via GitHub
johnnywalker opened a new pull request, #45410: URL: https://github.com/apache/spark/pull/45410 Registered dialects may differ between driver and executors. Bind dialect to the RDD on creation to use the same dialect regardless of JVM state. ### What changes were proposed in t

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-06 Thread via GitHub
xuanyuanking closed pull request #45266: [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. URL: https://github.com/apache/spark/pull/45266

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-06 Thread via GitHub
xuanyuanking commented on PR #45266: URL: https://github.com/apache/spark/pull/45266#issuecomment-1981661582 Thanks for the contribution @y-wei ! Merged to master.

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-06 Thread via GitHub
ueshin commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1515098224 ## python/pyspark/sql/profiler.py: ## @@ -224,6 +224,54 @@ def dump(id: int) -> None: for id in sorted(code_map.keys()): dump(id) +de

Re: [PR] [SPARK-45278] [YARN] Allow configuring Yarn executor bind address in Yarn [spark]

2024-03-06 Thread via GitHub
gedeh commented on PR #42870: URL: https://github.com/apache/spark/pull/42870#issuecomment-1981727449 Hello, I noticed this PR is closed, this is blocking Spark with Yarn in kubernetes. I dont understand whats left missing for this PR. If anyone in Spark project can shed a light what requir

[PR] [SPARK-47309][SQL][XML] Add schema inference unit tests and fix schema inference issues [spark]

2024-03-06 Thread via GitHub
shujingyang-db opened a new pull request, #45411: URL: https://github.com/apache/spark/pull/45411 ### What changes were proposed in this pull request? As titled. It also fixes schema inference issue 1) when there's an empty tag 2) when merging schema fo

[PR] [SPARK-47070][FOLLOW-UP] Add a flag guarding a subquery in aggregate rewrite [spark]

2024-03-06 Thread via GitHub
anton5798 opened a new pull request, #45412: URL: https://github.com/apache/spark/pull/45412 ### What changes were proposed in this pull request? Add a flag that guards a recently introduced new codepath inside optimizer that wraps `exists` variables into an agg function. See [#45133

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-06 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1981829489 Hi, @ted-jenks . Could you elaborate your correctness situation a little more? It sounds like you have other systems to read Spark's data.

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-06 Thread via GitHub
jingz-db commented on PR #45341: URL: https://github.com/apache/spark/pull/45341#issuecomment-1981833646 > @jingz-db - test failure seems related ? Weirdly is passing locally. Let me resolve your comments and retrigger the CI and see if it still fails. Thanks for the review!

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-06 Thread via GitHub
xinrong-meng commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1515212345 ## python/pyspark/sql/profiler.py: ## @@ -224,6 +224,54 @@ def dump(id: int) -> None: for id in sorted(code_map.keys()): dump(id) +

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-06 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1515240053 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -790,7 +790,8 @@ case class AdaptiveSparkPlanExec(

Re: [PR] [SPARK-47303][CORE][TESTS] Restructure MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1981968073 Thanks. Merged to master.

Re: [PR] [SPARK-47303][CORE][TESTS] Restructure MasterSuite [spark]

2024-03-06 Thread via GitHub
HyukjinKwon closed pull request #45366: [SPARK-47303][CORE][TESTS] Restructure MasterSuite URL: https://github.com/apache/spark/pull/45366
