Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-01 Thread via GitHub
anishshri-db commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1974249820 @sahnib @HeartSaVioR - PTAL, thx! @HeartSaVioR - let me know if you are OK with the proposed dir layout changes and also if you prefer them in a separate PR. Thanks

Re: [PR] [SPARK-47243][SS] Correct the package name of `StateMetadataSource.scala` [spark]

2024-03-01 Thread via GitHub
LuciferYang commented on PR #45352: URL: https://github.com/apache/spark/pull/45352#issuecomment-1974248899 Thanks @MaxGekk and @dongjoon-hyun ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 [spark]

2024-03-01 Thread via GitHub
parthchandra commented on PR #45351: URL: https://github.com/apache/spark/pull/45351#issuecomment-1974173364 Yes, there is no good way to test this in CI. Let me try it out and make sure.

Re: [PR] [MINOR] Mark `writing` as volatile [spark]

2024-03-01 Thread via GitHub
wForget commented on PR #45353: URL: https://github.com/apache/spark/pull/45353#issuecomment-1974158003 > Thank you for the fix @wForget. Can you re-run CI? The failures do not appear to be related. Thanks for your review, CI has passed.

Re: [PR] [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` [spark]

2024-03-01 Thread via GitHub
itholic commented on PR #45244: URL: https://github.com/apache/spark/pull/45244#issuecomment-1974134659 Thanks for the review, @MaxGekk !

Re: [PR] [MINOR] make param name more clear [spark]

2024-03-01 Thread via GitHub
github-actions[bot] commented on PR #43956: URL: https://github.com/apache/spark/pull/43956#issuecomment-1974121074 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2024-03-01 Thread via GitHub
github-actions[bot] closed pull request #43808: [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans URL: https://github.com/apache/spark/pull/43808

Re: [PR] [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore [spark]

2024-03-01 Thread via GitHub
github-actions[bot] closed pull request #43064: [SPARK-45265][SQL][WIP] Supporting Hive 4.0 metastore URL: https://github.com/apache/spark/pull/43064

Re: [PR] [SPARK-45417][PYTHON] Make InheritableThread inherit the active session [spark]

2024-03-01 Thread via GitHub
github-actions[bot] closed pull request #43231: [SPARK-45417][PYTHON] Make InheritableThread inherit the active session URL: https://github.com/apache/spark/pull/43231

Re: [PR] [SPARK-47151][PYTHON][PS][BUILD] Upgrade to `pandas` 2.2.1 [spark]

2024-03-01 Thread via GitHub
nchammas commented on code in PR #45236: URL: https://github.com/apache/spark/pull/45236#discussion_r1509657863 ## python/pyspark/pandas/supported_api_gen.py: ## @@ -38,7 +38,7 @@ MAX_MISSING_PARAMS_SIZE = 5 COMMON_PARAMETER_SET = {"kwargs", "args", "cls"} MODULE_GROUP_MATCH

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-01 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1509646119 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala: ## @@ -198,10 +221,18 @@ case class ShuffleQueryStageExec(

Re: [PR] [SPARK-47070] Fix invalid aggregation after subquery rewrite [spark]

2024-03-01 Thread via GitHub
anton5798 commented on code in PR #45133: URL: https://github.com/apache/spark/pull/45133#discussion_r1509548922 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -248,24 +248,72 @@ object RewritePredicateSubquery extends

Re: [PR] [SPARK-47070] Fix invalid aggregation after subquery rewrite [spark]

2024-03-01 Thread via GitHub
jchen5 commented on code in PR #45133: URL: https://github.com/apache/spark/pull/45133#discussion_r1509535334 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -248,24 +248,72 @@ object RewritePredicateSubquery extends

Re: [PR] [SPARK-47070] Fix invalid aggregation after subquery rewrite [spark]

2024-03-01 Thread via GitHub
anton5798 commented on PR #45133: URL: https://github.com/apache/spark/pull/45133#issuecomment-1973897479 @cloud-fan

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.0 [spark]

2024-03-01 Thread via GitHub
dongjoon-hyun commented on PR #45292: URL: https://github.com/apache/spark/pull/45292#issuecomment-1973849084 Thank you!

[PR] [SPARK-47251][PYTHON] Block invalid types from the `args` argument for `sql` command [spark]

2024-03-01 Thread via GitHub
ueshin opened a new pull request, #45361: URL: https://github.com/apache/spark/pull/45361 ### What changes were proposed in this pull request? Blocks invalid types from the `args` argument for `sql` command. Additionally, uses a `PySparkValueError` instead of assertions.
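
A minimal sketch of the idea described in this PR summary, not the PR's actual code. It assumes that `args` may be `None`, a `dict`, or a `list`; the helper name `validate_sql_args` is hypothetical, and the built-in `ValueError` stands in for `PySparkValueError` so the snippet stays self-contained.

```python
from typing import Any, Dict, List, Optional, Union


def validate_sql_args(args: Optional[Union[Dict[str, Any], List[Any]]]) -> None:
    """Hypothetical helper: reject unsupported `args` types with an explicit error."""
    # Assumption: None, dict (named parameters) and list (positional
    # parameters) are the accepted shapes for `args`.
    if args is None or isinstance(args, (dict, list)):
        return
    # An explicit exception is still raised under `python -O`,
    # whereas an `assert` statement would be stripped.
    raise ValueError(
        f"Invalid type for `args`: expected dict or list, got {type(args).__name__}"
    )
```

Raising a dedicated error type rather than asserting lets callers catch the failure and gives users a readable message instead of a bare `AssertionError`.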

Re: [PR] [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45244: [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` URL: https://github.com/apache/spark/pull/45244

Re: [PR] [SPARK-47158][SQL] Assign proper name and `sqlState` to `_LEGACY_ERROR_TEMP_(2134|2231)` [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45244: URL: https://github.com/apache/spark/pull/45244#issuecomment-1973827372 +1, LGTM. Merging to master. Thank you, @itholic.

Re: [PR] [SPARK-47237][BUILD] Upgrade xmlschema-core to 2.3.1 [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45347: [SPARK-47237][BUILD] Upgrade xmlschema-core to 2.3.1 URL: https://github.com/apache/spark/pull/45347

Re: [PR] [SPARK-47237][BUILD] Upgrade xmlschema-core to 2.3.1 [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45347: URL: https://github.com/apache/spark/pull/45347#issuecomment-1973821219 +1, LGTM. Merging to master. Thank you, @panbingkun.

Re: [PR] [SPARK-43149][SQL] `CreateDataSourceTableCommand` should create metadata first [spark]

2024-03-01 Thread via GitHub
avikbesu commented on PR #42574: URL: https://github.com/apache/spark/pull/42574#issuecomment-1973820070 In which Spark version will this issue be fixed? The latest 3.5.1 still has the same issue.

Re: [PR] [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 [spark]

2024-03-01 Thread via GitHub
dongjoon-hyun commented on PR #45351: URL: https://github.com/apache/spark/pull/45351#issuecomment-1973807623 @SteNicholas, for this kind of PR, we can simply use the `[BUILD]` tag, as the GitHub Actions bot suggested.

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45339: [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation URL: https://github.com/apache/spark/pull/45339

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45339: URL: https://github.com/apache/spark/pull/45339#issuecomment-1973806289 +1, LGTM. Merging to master. Thank you, @grundprinzip and @xinrong-meng @nchammas for review.

Re: [PR] [SPARK-47216][DOCS] Refine layout of SQL performance tuning page [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45322: [SPARK-47216][DOCS] Refine layout of SQL performance tuning page URL: https://github.com/apache/spark/pull/45322

Re: [PR] [SPARK-47216][DOCS] Refine layout of SQL performance tuning page [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45322: URL: https://github.com/apache/spark/pull/45322#issuecomment-1973801912 +1, LGTM. Merging to master. Thank you, @nchammas and @HyukjinKwon for review.

Re: [PR] [SPARK-47244][CONNECT] `SparkConnectPlanner` make internal functions private [spark]

2024-03-01 Thread via GitHub
xinrong-meng commented on PR #45354: URL: https://github.com/apache/spark/pull/45354#issuecomment-1973796476 Just curious: do we consider such changes "user-facing", given that users may import those methods in their existing code?

Re: [PR] [SPARK-47244][CONNECT] `SparkConnectPlanner` make internal functions private [spark]

2024-03-01 Thread via GitHub
xinrong-meng commented on PR #45354: URL: https://github.com/apache/spark/pull/45354#issuecomment-1973796427 LGTM, thank you!

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-03-01 Thread via GitHub
xinrong-meng commented on PR #45339: URL: https://github.com/apache/spark/pull/45339#issuecomment-1973783953 LGTM, thank you! Looks much clearer.

Re: [PR] [SPARK-47227][FOLLOW][DOCS] Improve Spark Connect Documentation [spark]

2024-03-01 Thread via GitHub
xinrong-meng commented on code in PR #45339: URL: https://github.com/apache/spark/pull/45339#discussion_r1509439684 ## docs/spark-connect-overview.md: ## @@ -67,8 +67,8 @@ that developers need to be aware of when using Spark Connect: the execution environment. In

Re: [PR] [SPARK-47243][SS] Correct the package name of `StateMetadataSource.scala` [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45352: [SPARK-47243][SS] Correct the package name of `StateMetadataSource.scala` URL: https://github.com/apache/spark/pull/45352

Re: [PR] [SPARK-47243][SS] Correct the package name of `StateMetadataSource.scala` [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45352: URL: https://github.com/apache/spark/pull/45352#issuecomment-1973763967 +1, LGTM. Merging to master. Thank you, @LuciferYang.

Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

2024-03-01 Thread via GitHub
caldempsey commented on PR #52: URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1973756934 Also running into this issue

Re: [PR] [SPARK-47070] Fix invalid aggregation after in-subquery rewrite [spark]

2024-03-01 Thread via GitHub
anton5798 commented on PR #45133: URL: https://github.com/apache/spark/pull/45133#issuecomment-1973748999 Changed the approach to wrap into max(). Ideally we'd use any_value(), but it is not working well in Spark.

Re: [PR] [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle [spark]

2024-03-01 Thread via GitHub
ericm-db commented on PR #45359: URL: https://github.com/apache/spark/pull/45359#issuecomment-1973745035 cc @HeartSaVioR @MaxGekk

Re: [PR] [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle [spark]

2024-03-01 Thread via GitHub
anishshri-db commented on code in PR #45359: URL: https://github.com/apache/spark/pull/45359#discussion_r1509393780 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3320,6 +3320,13 @@ ], "sqlState" : "42802" }, +

Re: [PR] [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle [spark]

2024-03-01 Thread via GitHub
anishshri-db commented on code in PR #45359: URL: https://github.com/apache/spark/pull/45359#discussion_r1509389021 ## sql/api/src/main/scala/org/apache/spark/sql/errors/ExecutionErrors.scala: ## @@ -53,6 +53,12 @@ private[sql] trait ExecutionErrors extends DataTypeErrorsBase

[PR] [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle [spark]

2024-03-01 Thread via GitHub
ericm-db opened a new pull request, #45359: URL: https://github.com/apache/spark/pull/45359 ### What changes were proposed in this pull request? Setting the processorHandle as a part of the statefulProcessor, so that the user doesn't have to explicitly keep track of it, and

Re: [PR] [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle [spark]

2024-03-01 Thread via GitHub
ericm-db closed pull request #45002: [SPARK-46961][SS] Using ProcessorContext to store and retrieve handle URL: https://github.com/apache/spark/pull/45002

Re: [PR] [MINOR] Mark `writing` as volatile [spark]

2024-03-01 Thread via GitHub
parthchandra commented on PR #45353: URL: https://github.com/apache/spark/pull/45353#issuecomment-1973626210 Thank you for the fix @wForget. Can you re-run CI? The failures do not appear to be related.

Re: [PR] [WIP][SPARK-47234][BUILD] Upgrade Scala to 2.13.13 [spark]

2024-03-01 Thread via GitHub
dongjoon-hyun commented on PR #45342: URL: https://github.com/apache/spark/pull/45342#issuecomment-1973541527 Thank you for updates for both.

Re: [PR] [SPARK-47249][CONNECT] Fix bug where all connect executions are considered abandoned regardless of their actual status [spark]

2024-03-01 Thread via GitHub
vicennial commented on PR #45358: URL: https://github.com/apache/spark/pull/45358#issuecomment-1973540664 cc @juliuszsompolski

[PR] [SPARK-47249][CONNECT] Fix bug where all connect executions are considered abandoned regardless of their actual status [spark]

2024-03-01 Thread via GitHub
vicennial opened a new pull request, #45358: URL: https://github.com/apache/spark/pull/45358 ### What changes were proposed in this pull request? Adds a guard to check if the execution was abandoned before putting it into the `abandonedTombstones` cache ### Why are the
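
The guard described above can be sketched roughly as follows. This is an illustration only, not the Spark Connect implementation: the `Execution` and `ExecutionManager` classes, the `abandoned` flag, and the `remove` method are hypothetical names, and only the `abandonedTombstones` cache (rendered here as `abandoned_tombstones`) comes from the PR description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Execution:
    """Hypothetical stand-in for a tracked execution."""
    key: str
    abandoned: bool = False


@dataclass
class ExecutionManager:
    """Hypothetical manager holding live executions and abandoned tombstones."""
    executions: Dict[str, Execution] = field(default_factory=dict)
    abandoned_tombstones: Dict[str, Execution] = field(default_factory=dict)

    def remove(self, key: str) -> None:
        execution = self.executions.pop(key, None)
        # Guard: only executions that were actually abandoned get a tombstone.
        # Without this check, every removed execution would look abandoned.
        if execution is not None and execution.abandoned:
            self.abandoned_tombstones[key] = execution
```

With the guard in place, removing a normally completed execution leaves the tombstone cache untouched.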

Re: [PR] [SPARK-43157][SQL] Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects [spark]

2024-03-01 Thread via GitHub
robreeves commented on PR #40812: URL: https://github.com/apache/spark/pull/40812#issuecomment-1973502406 @liuzqt another idea for a narrow explain utils fix is to move the `QueryPlan.OP_ID_TAG` currently stored in the `TreeNode.tags` into a thread local map maintained in explain utils.
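
One way to picture the suggestion in this comment is a per-thread side table that maps plan nodes to operator IDs, instead of storing the ID on the shared node itself. The sketch below is an assumption-laden illustration in Python, not Spark code: `set_op_id`, `get_op_id`, and the use of `id(node)` as the key are all hypothetical.

```python
import threading
from typing import Dict, Optional

# Each thread sees its own attribute namespace on this object.
_local = threading.local()


def _op_ids() -> Dict[int, int]:
    """Return the calling thread's private node-to-operator-ID map."""
    if not hasattr(_local, "op_ids"):
        _local.op_ids = {}
    return _local.op_ids


def set_op_id(node: object, op_id: int) -> None:
    # Keyed by object identity; the caller must keep `node` alive while
    # the entry is in use, since identities can be reused after collection.
    _op_ids()[id(node)] = op_id


def get_op_id(node: object) -> Optional[int]:
    return _op_ids().get(id(node))
```

Because the map lives in thread-local storage, two threads explaining plans that share the same node objects would not overwrite each other's operator IDs.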

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1509146815 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collationExpressions.scala: ## @@ -40,6 +41,11 @@ import org.apache.spark.sql.types._

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-01 Thread via GitHub
stefankandic commented on code in PR #45262: URL: https://github.com/apache/spark/pull/45262#discussion_r1509122961 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -293,4 +293,29 @@ private[spark] object SchemaUtils { * @return The escaped

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1509069300 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1509083011 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1509076371 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1509075313 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -61,6 +61,5 @@ class StringType private(val collationId: Int) extends AtomicType with

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1509069300 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1509062803 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -33,6 +33,14 @@ class StringType private(val collationId: Int) extends AtomicType with

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45262: URL: https://github.com/apache/spark/pull/45262#discussion_r1509062485 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala: ## @@ -279,4 +279,23 @@ object DataSourceUtils extends PredicateHelper {

Re: [PR] [SPARK-47168][SQL] Disable parquet filter pushdown when working with non default collated strings [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45262: URL: https://github.com/apache/spark/pull/45262#discussion_r1509060591 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala: ## @@ -293,4 +293,29 @@ private[spark] object SchemaUtils { * @return The escaped

Re: [PR] [SPARK-47167][SQL] Add concrete class for JDBC anonymous relation [spark]

2024-03-01 Thread via GitHub
cloud-fan closed pull request #45259: [SPARK-47167][SQL] Add concrete class for JDBC anonymous relation URL: https://github.com/apache/spark/pull/45259

Re: [PR] [SPARK-47167][SQL] Add concrete class for JDBC anonymous relation [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on PR #45259: URL: https://github.com/apache/spark/pull/45259#issuecomment-1973264143 thanks, merging to master!

Re: [PR] [SPARK-47081][CONNECT] Support Query Execution Progress [spark]

2024-03-01 Thread via GitHub
grundprinzip commented on PR #45150: URL: https://github.com/apache/spark/pull/45150#issuecomment-1973223167 Any chance to get some more feedback here? @HyukjinKwon or @hvanhovell ?

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508983490 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -497,10 +497,33 @@ case class Lower(child: Expression)

Re: [PR] [SPARK-47247][SQL] Use smaller target size when coalescing partitions with exploding joins [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on PR #45357: URL: https://github.com/apache/spark/pull/45357#issuecomment-1973167129 fine-grained reduce ratio looks nice, but it needs a lot of tuning, and more or less based on experience. I don't think fixed reduce ratios work for all the cases and we probably need

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1508938251 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -33,6 +33,14 @@ class StringType private(val collationId: Int) extends AtomicType with

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1508931880 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -592,3 +593,28 @@ case class QualifyLocationWithWarehouse(catalog:

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1508930731 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -592,3 +593,28 @@ case class QualifyLocationWithWarehouse(catalog:

Re: [PR] [SPARK-47102][SQL][COLLATION] Add COLLATION_ENABLED config flag [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45285: URL: https://github.com/apache/spark/pull/45285#discussion_r1508927759 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala: ## @@ -592,3 +593,18 @@ case class QualifyLocationWithWarehouse(catalog:

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508925352 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508922010 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +183,266 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508920123 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -497,10 +497,33 @@ case class Lower(child: Expression)

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508917781 ## sql/api/src/main/scala/org/apache/spark/sql/types/StringType.scala: ## @@ -61,6 +61,5 @@ class StringType private(val collationId: Int) extends AtomicType with

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-01 Thread via GitHub
JacobZheng0927 commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1508908677 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,33 @@ class JoinSuite extends QueryTest with SharedSparkSession with

Re: [PR] [SPARK-47015][SQL] Disable partitioning on collated columns [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45104: URL: https://github.com/apache/spark/pull/45104#discussion_r1508901070 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -2304,6 +2304,12 @@ }, "sqlState" : "22023" }, + "INVALID_PARTITION_COLUMN_DATA_TYPE"

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-01 Thread via GitHub
mridulm commented on code in PR #45266: URL: https://github.com/apache/spark/pull/45266#discussion_r1508881617 ## core/src/main/scala/org/apache/spark/Dependency.scala: ## @@ -206,6 +206,21 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag]( finalizeTask =

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508881751 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -586,9 +608,21 @@ object ContainsExpressionBuilder extends

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-01 Thread via GitHub
mridulm commented on code in PR #45266: URL: https://github.com/apache/spark/pull/45266#discussion_r1508881617 ## core/src/main/scala/org/apache/spark/Dependency.scala: ## @@ -206,6 +206,21 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag]( finalizeTask =

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508881130 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -341,6 +343,21 @@ public boolean contains(final UTF8String substring) { return

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-01 Thread via GitHub
mridulm commented on code in PR #45327: URL: https://github.com/apache/spark/pull/45327#discussion_r1508880886 ## sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala: ## @@ -1737,3 +1738,33 @@ class JoinSuite extends QueryTest with SharedSparkSession with

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508873927 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -586,9 +608,21 @@ object ContainsExpressionBuilder extends

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508872841 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -355,14 +371,43 @@ public boolean matchAt(final UTF8String s, int pos) {

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508868584 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -355,14 +371,43 @@ public boolean matchAt(final UTF8String s, int pos) {

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508867241 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -341,6 +343,21 @@ public boolean contains(final UTF8String substring) {

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.0 [spark]

2024-03-01 Thread via GitHub
panbingkun commented on PR #45292: URL: https://github.com/apache/spark/pull/45292#issuecomment-1973018722 > Let me try to submit an issue to the log4j community. An issue has been submitted to the log4j2 community: https://github.com/apache/logging-log4j2/issues/2337 The

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-01 Thread via GitHub
dbatomic commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1508848081 ## sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashMapGenerator.scala: ## @@ -173,7 +173,10 @@ abstract class HashMapGenerator(

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.0 [spark]

2024-03-01 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1972974328 Just as a record - log4j2 version 2.22.1

Re: [PR] [SPARK-47247][SQL] Use smaller target size when coalescing partitions with exploding joins [spark]

2024-03-01 Thread via GitHub
cloud-fan commented on PR #45357: URL: https://github.com/apache/spark/pull/45357#issuecomment-1972942423 cc @yaooqinn @ulysses-you

[PR] [SPARK-47247][SQL] Use smaller target size when coalescing partitions with exploding joins [spark]

2024-03-01 Thread via GitHub
cloud-fan opened a new pull request, #45357: URL: https://github.com/apache/spark/pull/45357 ### What changes were proposed in this pull request? This PR changes the target partition size of AQE partition coalescing from `spark.sql.adaptive.advisoryPartitionSizeInBytes`

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508803718 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -585,10 +604,23 @@ object ContainsExpressionBuilder extends

Re: [PR] [WIP] DeduplicateRelations keeps original expressions if possible [spark]

2024-03-01 Thread via GitHub
peter-toth closed pull request #45231: [WIP] DeduplicateRelations keeps original expressions if possible URL: https://github.com/apache/spark/pull/45231

Re: [PR] [WIP][SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-01 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1508757748 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/executeImmediate.scala: ## @@ -122,11 +122,14 @@ class SubstituteExecuteImmediate(val

Re: [PR] [SPARK-47167][SQL] Add concrete class for JDBC anonymous relation [spark]

2024-03-01 Thread via GitHub
urosstan-db commented on code in PR #45259: URL: https://github.com/apache/spark/pull/45259#discussion_r1508739221 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCV1CompatibleRelation.scala: ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache

[PR] [SPARK-47246][SQL] Replace 'InternalRow.fromSeq' with 'new GenericInternalRow' to save collection conversion [spark]

2024-03-01 Thread via GitHub
LuciferYang opened a new pull request, #45356: URL: https://github.com/apache/spark/pull/45356 ### What changes were proposed in this pull request? This PR changes the use of `InternalRow.fromSeq` in the following scenarios to directly construct `GenericInternalRow` to save a collection

Re: [PR] [SPARK-43255][SQL] Replace the error class _LEGACY_ERROR_TEMP_2020 by an internal error [spark]

2024-03-01 Thread via GitHub
MaxGekk closed pull request #45302: [SPARK-43255][SQL] Replace the error class _LEGACY_ERROR_TEMP_2020 by an internal error URL: https://github.com/apache/spark/pull/45302

Re: [PR] [SPARK-43255][SQL] Replace the error class _LEGACY_ERROR_TEMP_2020 by an internal error [spark]

2024-03-01 Thread via GitHub
MaxGekk commented on PR #45302: URL: https://github.com/apache/spark/pull/45302#issuecomment-1972809205 +1, LGTM. Merging to master. Thank you, @JinHelin404 and @eejbyfeldt @amaliujia for review.

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508704687 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -341,6 +342,21 @@ public boolean contains(final UTF8String substring) { return

Re: [PR] [SPARK-43255][SQL] Replace the error class _LEGACY_ERROR_TEMP_2020 by an internal error [spark]

2024-03-01 Thread via GitHub
JinHelin404 commented on PR #45302: URL: https://github.com/apache/spark/pull/45302#issuecomment-1972782237 > @JinHelin404 Do you have an account at OSS JIRA: https://issues.apache.org/jira/browse/SPARK-43255? If so, please, leave a comment in the ticket (I will assign it to you) otherwise

[PR] [SPARK-47245][SQL] Improve error code for INVALID_PARTITION_COLUMN_DATA_TYPE [spark]

2024-03-01 Thread via GitHub
stefankandic opened a new pull request, #45355: URL: https://github.com/apache/spark/pull/45355 ### What changes were proposed in this pull request? Improving the error code for error class `INVALID_PARTITION_COLUMN_DATA_TYPE`. ### Why are the changes needed?

Re: [PR] [SPARK-47131][SQL][COLLATION] String function support: contains, startswith, endswith [spark]

2024-03-01 Thread via GitHub
uros-db commented on code in PR #45216: URL: https://github.com/apache/spark/pull/45216#discussion_r1508674676 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -355,14 +371,43 @@ public boolean matchAt(final UTF8String s, int pos) {
