Re: [PR] [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` [spark]

2024-02-29 Thread via GitHub
MaxGekk commented on code in PR #45244: URL: https://github.com/apache/spark/pull/45244#discussion_r1507171237 ## sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala: ## @@ -261,8 +261,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507181594 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation [spark]

2024-02-29 Thread via GitHub
zhengruifeng commented on PR #45316: URL: https://github.com/apache/spark/pull/45316#issuecomment-197060 @nikolamand-db seems you need to enable the Github Action? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[PR] [MINOR] Update outdated comments for class `o.a.s.s.functions` [spark]

2024-02-29 Thread via GitHub
panbingkun opened a new pull request, #45334: URL: https://github.com/apache/spark/pull/45334 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [MINOR] Update outdated comments for class `o.a.s.s.functions` [spark]

2024-02-29 Thread via GitHub
panbingkun commented on code in PR #45334: URL: https://github.com/apache/spark/pull/45334#discussion_r1507220727 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -41,22 +41,20 @@ import org.apache.spark.sql.types.DataType.parseTypeWithFallback import org

Re: [PR] [MINOR] Update outdated comments for class `o.a.s.s.functions` [spark]

2024-02-29 Thread via GitHub
panbingkun commented on PR #45334: URL: https://github.com/apache/spark/pull/45334#issuecomment-1970681548 cc @HyukjinKwon @zhengruifeng

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507239763 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507242125 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends Exp

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on PR #45285: URL: https://github.com/apache/spark/pull/45285#issuecomment-1970706405 Current SQLConf.COLLATION_ENABLED cannot be used to prevent errors of invalid collation name in TypeContext. How should I proceed with this? In my view, if customer sends a query with

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507243652 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3198,7 +3198,13 @@ class AstBuilder extends DataTypeAstBuilder with SQLC

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507244351 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507245967 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507245924 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -592,3 +593,18 @@ case class QualifyLocationWithWarehouse(catalog: SessionCa

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507247915 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends Exp

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507250569 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507252815 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507269159 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-02-29 Thread via GitHub
mihailom-db commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1507274792 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -67,7 +73,13 @@ object CollateExpressionBuilder extends E

Re: [PR] [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation [spark]

2024-02-29 Thread via GitHub
nikolamand-db commented on PR #45316: URL: https://github.com/apache/spark/pull/45316#issuecomment-1970744506 > @nikolamand-db seems you need to enable the Github Action? I'm having trouble with Github Actions, it's disabled for my account. Already filed a ticket with Github support.

Re: [PR] [SPARK-37932][SQL]Wait to resolve missing attributes before applying DeduplicateRelations [spark]

2024-02-29 Thread via GitHub
martinf-moodys commented on PR #35684: URL: https://github.com/apache/spark/pull/35684#issuecomment-1970792026 Hi @cloud-fan, @chenzhx, Checking for missing references is very costly on my workflow, due to the call to `missingInput`. Do you think it could be optimized, either by doing th

[PR] [CORE][TEST][MINOR] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion [spark]

2024-02-29 Thread via GitHub
cloud-fan opened a new pull request, #45336: URL: https://github.com/apache/spark/pull/45336 ### What changes were proposed in this pull request? I was writing some tests in Spark Core and found this log ``` INFO AccumulatorContext: Attempted to access garbage collect

Re: [PR] [CORE][TEST][MINOR] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on PR #45336: URL: https://github.com/apache/spark/pull/45336#issuecomment-1970855260 cc @jiangxb1987

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.0 [spark]

2024-02-29 Thread via GitHub
panbingkun commented on PR #45292: URL: https://github.com/apache/spark/pull/45292#issuecomment-1970856013 Let me try to submit an issue to the log4j community.

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
AndrejGobeX commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507399815 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507408643 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507412827 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

[PR] [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-02-29 Thread via GitHub
yaooqinn opened a new pull request, #45337: URL: https://github.com/apache/spark/pull/45337 ### What changes were proposed in this pull request? This PR supports TIMESTAMP WITH LOCAL TIME ZONE Datatype > TIMESTAMP WITH LOCAL TIME ZONE is another variant of TIMEST

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.0 [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on PR #45292: URL: https://github.com/apache/spark/pull/45292#issuecomment-1970987580 > Let me try to submit an issue to the log4j community. Thanks @panbingkun

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1507487851 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,18 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv` [spark]

2024-02-29 Thread via GitHub
HyukjinKwon commented on PR #45332: URL: https://github.com/apache/spark/pull/45332#issuecomment-1971076603 Merged to master.

Re: [PR] [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv` [spark]

2024-02-29 Thread via GitHub
HyukjinKwon closed pull request #45332: [SPARK-47224][PS][TESTS] Split `test_split_apply_basic` and `test_split_apply_adv` URL: https://github.com/apache/spark/pull/45332

[PR] [SPARK-47229][CORE][SQL][YARN] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang opened a new pull request, #45338: URL: https://github.com/apache/spark/pull/45338 ### What changes were proposed in this pull request? This PR replaces unchanged `var` with `val`. ### Why are the changes needed? Use `val` instead of `var` when possible.

Re: [PR] [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` [spark]

2024-02-29 Thread via GitHub
itholic commented on code in PR #45244: URL: https://github.com/apache/spark/pull/45244#discussion_r1507640743 ## sql/api/src/main/scala/org/apache/spark/sql/errors/DataTypeErrors.scala: ## @@ -261,8 +261,14 @@ private[sql] object DataTypeErrors extends DataTypeErrorsBase {

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-02-29 Thread via GitHub
stefankandic commented on code in PR #45262: URL: https://github.com/apache/spark/pull/45262#discussion_r1507649566 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala: ## @@ -279,4 +279,26 @@ object DataSourceUtils extends PredicateHelper

Re: [PR] [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table [spark]

2024-02-29 Thread via GitHub
nickstanishadb commented on code in PR #45142: URL: https://github.com/apache/spark/pull/45142#discussion_r1506861839 ## python/pyspark/sql/udtf.py: ## @@ -123,10 +123,17 @@ class SelectedColumn: alias : str, default '' If non-empty, this is the alias for the colum

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
dbatomic commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507732131 ## sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala: ## @@ -173,7 +173,10 @@ abstract class HashMapGenerator( ${has

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
dbatomic commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507736504 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java: ## @@ -95,6 +96,20 @@ public static boolean isMutable(DataType dt) { pdt

Re: [PR] [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-02-29 Thread via GitHub
steveloughran commented on PR #45337: URL: https://github.com/apache/spark/pull/45337#issuecomment-1971356377 oh, no another TZ type. From oracle -there's a surprise.

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
MaxGekk commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1507766991 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -5142,7 +5143,7 @@ }, "_LEGACY_ERROR_TEMP_1183" : { "message" : [ - "Cannot use inter

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
MaxGekk commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1507773660 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -5142,7 +5143,7 @@ }, "_LEGACY_ERROR_TEMP_1183" : { "message" : [ - "Cannot use inter

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
stefankandic commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1507789713 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -5142,7 +5143,7 @@ }, "_LEGACY_ERROR_TEMP_1183" : { "message" : [ - "Cannot use

Re: [PR] [SPARK-47227][DOCS] Improve documentation for Spark Connect [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45335: [SPARK-47227][DOCS] Improve documentation for Spark Connect URL: https://github.com/apache/spark/pull/45335

Re: [PR] [SPARK-47227][DOCS] Improve documentation for Spark Connect [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45335: URL: https://github.com/apache/spark/pull/45335#issuecomment-1971420217 Thank you, @grundprinzip and @hvanhovell . Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45336: [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion URL: https://github.com/apache/spark/pull/45336

Re: [PR] [SPARK-47227][DOCS] Improve documentation for Spark Connect [spark]

2024-02-29 Thread via GitHub
nchammas commented on code in PR #45335: URL: https://github.com/apache/spark/pull/45335#discussion_r1507799452 ## docs/spark-connect-overview.md: ## @@ -56,6 +56,26 @@ client through gRPC as Apache Arrow-encoded row batches. +## What is changing with Spark Connect + +On

Re: [PR] [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45336: URL: https://github.com/apache/spark/pull/45336#issuecomment-1971430228 Thank you, @cloud-fan and @Ngone51 .

Re: [PR] [SPARK-47231][CORE][TESTS] FakeTask should reference its TaskMetrics to avoid TaskMetrics accumulators being GCed before stage completion [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45336: URL: https://github.com/apache/spark/pull/45336#issuecomment-1971430598 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45337: URL: https://github.com/apache/spark/pull/45337#issuecomment-1971437495 What do you mean, @steveloughran ? `TimestampLTZ` itself is used in Apache Spark already. > oh, no another TZ type. From oracle -there's a surprise.

Re: [PR] [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45317: [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 URL: https://github.com/apache/spark/pull/45317

Re: [PR] [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45317: URL: https://github.com/apache/spark/pull/45317#issuecomment-1971443188 Merged to master. Thank you, @steveloughran and @LuciferYang .

Re: [PR] [SPARK-41392][BUILD][TESTS] Add `bouncy-castle` test dependencies to `sql/core` module for Hadoop 3.4.0 [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45317: URL: https://github.com/apache/spark/pull/45317#issuecomment-1971446265 I hope this can help Apache Hadoop 3.4.0's adoption in the ASF communities.

Re: [PR] [SPARK-47227][DOCS] Improve documentation for Spark Connect [spark]

2024-02-29 Thread via GitHub
grundprinzip commented on PR #45335: URL: https://github.com/apache/spark/pull/45335#issuecomment-1971446409 @nchammas thanks for the feedback! Will address in a follow up!

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
MaxGekk commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1507819005 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -5142,7 +5143,7 @@ }, "_LEGACY_ERROR_TEMP_1183" : { "message" : [ - "Cannot use inter

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
MaxGekk commented on PR #45104: URL: https://github.com/apache/spark/pull/45104#issuecomment-1971449799 +1, LGTM. Merging to master. Thank you, @stefankandic and @dbatomic @cloud-fan for review.

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-02-29 Thread via GitHub
grundprinzip commented on code in PR #45339: URL: https://github.com/apache/spark/pull/45339#discussion_r1507823877 ## docs/spark-connect-overview.md: ## @@ -56,7 +56,7 @@ client through gRPC as Apache Arrow-encoded row batches. -## What is changing with Spark Connect +#

Re: [PR] [SPARK-47229][CORE][SQL][YARN] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on PR #45338: URL: https://github.com/apache/spark/pull/45338#issuecomment-1971452751 Let me check, there might be some similar cases in the test code.

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-02-29 Thread via GitHub
MaxGekk closed pull request #45104: [SPARK-47015][SQL] Disable partitioning on collated columns URL: https://github.com/apache/spark/pull/45104

[PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-02-29 Thread via GitHub
grundprinzip opened a new pull request, #45339: URL: https://github.com/apache/spark/pull/45339 ### What changes were proposed in this pull request? Language improvements from https://github.com/apache/spark/pull/45335. ### Why are the changes needed? Readability. ### Does

Re: [PR] [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45330: [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs URL: https://github.com/apache/spark/pull/45330

Re: [PR] [SPARK-47186][DOCKER][FOLLOWUP] Reduce test time for docker ITs [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45330: URL: https://github.com/apache/spark/pull/45330#issuecomment-1971477668 Thank you for keeping monitoring and working on this, @yaooqinn .

[PR] [WIP][SPARK-47227][FOLLOW][DOCS] Building Extensions [spark]

2024-02-29 Thread via GitHub
grundprinzip opened a new pull request, #45340: URL: https://github.com/apache/spark/pull/45340 ### What changes were proposed in this pull request? This PR adds a number of improvements to the documentation about how to use the extension mechanisms for Spark Connect. ### Why are t

Re: [PR] [SPARK-47229][CORE][SQL][YARN] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on PR #45338: URL: https://github.com/apache/spark/pull/45338#issuecomment-1971498698 > I ran an inspection in IntelliJ IDEA, so, it has found at least 43 places: https://private-user-images.githubusercontent.com/1580697/308992597-74982e49-08c2-4dda-9d8b-13ac796b2227.pn

Re: [PR] [SPARK-47229][CORE][SQL][YARN] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on PR #45338: URL: https://github.com/apache/spark/pull/45338#issuecomment-1971506017 ``` Error: /home/runner/work/spark/spark/core/src/test/scala/org/apache/spark/rpc/TestRpcEndpoint.scala:29: values cannot be volatile Error: /home/runner/work/spark/spark/cor

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-02-29 Thread via GitHub
nchammas commented on code in PR #45339: URL: https://github.com/apache/spark/pull/45339#discussion_r1507868747 ## docs/spark-connect-overview.md: ## @@ -56,7 +56,7 @@ client through gRPC as Apache Arrow-encoded row batches. -## What is changing with Spark Connect +## Ho

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1507888706 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,34 @@ class JoinSuite extends QueryTest with SharedSparkSession with Adaptiv

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1971549366 Hi, @viirya . Did you see any thread leaks while you are working on Join (or shuffle)?

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1507892803 ## core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java: ## @@ -82,6 +85,13 @@ public UnsafeSorterSpillReader( Clos

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
neilagupta commented on PR #45331: URL: https://github.com/apache/spark/pull/45331#issuecomment-1971555888 Thank you for your feedback @dongjoon-hyun, I just realized this was the case right after I posted the PR 😂. I will leave this PR alone until we bump the minimum java version later.

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45331: URL: https://github.com/apache/spark/pull/45331#issuecomment-1971561872 We had better close this because we take 10 years to drop Java 8 and 11. Java 17 will be here at least for next 4 years, @neilagupta , - https://issues.apache.org/jira/browse

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45331: [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() URL: https://github.com/apache/spark/pull/45331

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
neilagupta commented on PR #45331: URL: https://github.com/apache/spark/pull/45331#issuecomment-1971566199 Sounds good, thanks @dongjoon-hyun

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45331: URL: https://github.com/apache/spark/pull/45331#issuecomment-1971565157 Let me close first. :)

Re: [PR] [SPARK-47223][SQL][CORE] Update usage of deprecated Thread.getId() to Thread.threadId() [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45331: URL: https://github.com/apache/spark/pull/45331#issuecomment-1971569990 Thank you, @neilagupta .

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507920309 ## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java: ## @@ -30,6 +30,7 @@ import org.apache.spark.SparkUnsupportedOperationExcepti

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507922150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/UnsafeRowUtils.scala: ## @@ -197,4 +197,21 @@ object UnsafeRowUtils { s"rowSizeInBytes: ${row

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507922548 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/UnsafeRowUtils.scala: ## @@ -197,4 +197,21 @@ object UnsafeRowUtils { s"rowSizeInBytes: ${row

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-02-29 Thread via GitHub
cloud-fan commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1507924464 ## sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala: ## @@ -173,7 +173,10 @@ abstract class HashMapGenerator( ${ha

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
viirya commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1971599378 > Hi, @viirya . Did you see any thread leaks while you are working on Join (or shuffle)? Hi, @dongjoon-hyun. No, I've not seen it. Maybe it is because our tests are not long running

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-02-29 Thread via GitHub
sahnib commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1507925323 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,42 @@ class StatefulProcessorHandleImpl( ov

Re: [PR] [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators [spark]

2024-02-29 Thread via GitHub
amaliujia commented on code in PR #45321: URL: https://github.com/apache/spark/pull/45321#discussion_r1507931889 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala: ## @@ -222,15 +222,23 @@ trait AnalysisHelper extends QueryPlan[Logic

Re: [PR] [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table [spark]

2024-02-29 Thread via GitHub
dtenedor commented on code in PR #45142: URL: https://github.com/apache/spark/pull/45142#discussion_r1507971562 ## python/pyspark/sql/udtf.py: ## @@ -123,10 +123,17 @@ class SelectedColumn: alias : str, default '' If non-empty, this is the alias for the column or e

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-02-29 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1507971789 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,42 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
mridulm commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1507995571 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,34 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSpark

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-02-29 Thread via GitHub
mridulm commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1507996728 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,33 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSpark

Re: [PR] [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table [spark]

2024-02-29 Thread via GitHub
dtenedor closed pull request #45142: [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table URL: https://github.com/apache/spark/pull/45142

Re: [PR] [SPARK-47032][Python] Add UDTF API for "analyze" method to identify pass-through columns to output table [spark]

2024-02-29 Thread via GitHub
dtenedor commented on PR #45142: URL: https://github.com/apache/spark/pull/45142#issuecomment-1971717342 We talked offline and the benefit from this is marginal without actually propagating the column values automatically to the output table. Having prototyped that and found no benefit in l

Re: [PR] [SPARK-46077][SQL] Consider the type generated by TimestampNTZConverter in JdbcDialect.compileValue. [spark]

2024-02-29 Thread via GitHub
09306677806 commented on PR #45261: URL: https://github.com/apache/spark/pull/45261#issuecomment-1971728089 Dear groups and respected programmers، I open bookmarks as fast as possible in Linux or Windows America and the assets of servers are hacked whenever possible and I have governmen

Re: [PR] [SPARK-47229][CORE][SQL][SS][YARN][CONNECT][TESTS] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on code in PR #45338: URL: https://github.com/apache/spark/pull/45338#discussion_r1508022591 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -535,7 +535,7 @@ private[spark] class Client( // If preload is enab

Re: [PR] [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun commented on PR #45338: URL: https://github.com/apache/spark/pull/45338#issuecomment-1971736612 Merged to master for Apache Spark 4.0.0. Thank you, @LuciferYang and @MaxGekk .

Re: [PR] [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
dongjoon-hyun closed pull request #45338: [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` URL: https://github.com/apache/spark/pull/45338

Re: [PR] [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on code in PR #45338: URL: https://github.com/apache/spark/pull/45338#discussion_r1508024797 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: ## @@ -535,7 +535,7 @@ private[spark] class Client( // If preload is enable

Re: [PR] [SPARK-47229][CORE][SQL][SS][YARN][CONNECT] Change the never changed `var` to `val` [spark]

2024-02-29 Thread via GitHub
LuciferYang commented on PR #45338: URL: https://github.com/apache/spark/pull/45338#issuecomment-1971739941 Thanks @dongjoon-hyun and @MaxGekk ~

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-02-29 Thread via GitHub
ueshin commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1508099936 ## python/docs/source/development/debugging.rst: ## @@ -341,7 +372,12 @@ Python/Pandas UDF ~ To use this on Python/Pandas UDFs, PySpark provides remo

Re: [PR] [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation [spark]

2024-02-29 Thread via GitHub
xinrong-meng commented on code in PR #45316: URL: https://github.com/apache/spark/pull/45316#discussion_r1508105692 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -3397,6 +3397,14 @@ def test_df_caache(self): self.assert_eq(10, df.count()) sel

Re: [PR] [SPARK-47211][CONNECT][PYTHON] Fix ignored PySpark Connect string collation [spark]

2024-02-29 Thread via GitHub
xinrong-meng commented on code in PR #45316: URL: https://github.com/apache/spark/pull/45316#discussion_r1508107339 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -3397,6 +3397,14 @@ def test_df_caache(self): self.assert_eq(10, df.count()) sel

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-02-29 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1508113784 ## python/docs/source/development/debugging.rst: ## @@ -341,7 +372,12 @@ Python/Pandas UDF ~ To use this on Python/Pandas UDFs, PySpark provide

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-02-29 Thread via GitHub
erenavsarogullari commented on PR #45234: URL: https://github.com/apache/spark/pull/45234#issuecomment-1971998442 Should we also backport this patch to `v3.4.x` and `v3.5.x`?
