Re: [PR] [SPARK-47692][SQL] Fix default StringType meaning in implicit casting [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on code in PR #45819: URL: https://github.com/apache/spark/pull/45819#discussion_r1575749860 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -118,17 +121,21 @@ object CollationTypeCasts extends TypeCoercio

Re: [PR] [SPARK-47692][SQL] Fix default StringType meaning in implicit casting [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on code in PR #45819: URL: https://github.com/apache/spark/pull/45819#discussion_r1575751066 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala: ## @@ -118,17 +121,21 @@ object CollationTypeCasts extends TypeCoercio

[PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
jiangzho opened a new pull request, #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8 ### What changes were proposed in this pull request? This PR adds a Java API library for Spark Operator, with the ability to generate a YAML spec. ### Why are the cha

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
jiangzho commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1575756074 ## gradle.properties: ## @@ -15,6 +15,22 @@ # limitations under the License. # +group=org.apache.spark.kubernetes.operator +version=0.1.0 + +fabric8Ve

Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-23 Thread via GitHub
uros-db commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575760634 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuil

Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on code in PR #46165: URL: https://github.com/apache/spark/pull/46165#discussion_r1575775863 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBu

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-23 Thread via GitHub
stefankandic commented on code in PR #46083: URL: https://github.com/apache/spark/pull/46083#discussion_r1575780017 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -268,23 +296,6 @@ private[spark] class HiveExternalCatalog(conf: SparkConf,

[PR] [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server [spark]

2024-04-23 Thread via GitHub
yaooqinn opened a new pull request, #46177: URL: https://github.com/apache/spark/pull/46177 ### What changes were proposed in this pull request? This PR documents the mapping of Spark SQL data types to Microsoft SQL Server. ### Why are the changes needed?

[PR] [MINOR][DOCS] Add `docs/_generated/` to .gitignore [spark]

2024-04-23 Thread via GitHub
yaooqinn opened a new pull request, #46178: URL: https://github.com/apache/spark/pull/46178 ### What changes were proposed in this pull request? This PR adds the auto-generated content `docs/_generated/` to .gitignore ### Why are the changes needed? vcs improv

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-23 Thread via GitHub
HeartSaVioR commented on PR #45991: URL: https://github.com/apache/spark/pull/45991#issuecomment-2071662357 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-23 Thread via GitHub
HeartSaVioR commented on PR #45991: URL: https://github.com/apache/spark/pull/45991#issuecomment-2071662156 > [Run / Build modules: pyspark-sql, pyspark-resource, pyspark-testing](https://github.com/ericm-db/spark/actions/runs/8794474954/job/24137701938#logs) > failed 2 hours ago in 26m 3

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-23 Thread via GitHub
HeartSaVioR closed pull request #45991: [SPARK-47805][SS] Implementing TTL for MapState URL: https://github.com/apache/spark/pull/45991

Re: [PR] [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework [spark]

2024-04-23 Thread via GitHub
panbingkun commented on PR #46130: URL: https://github.com/apache/spark/pull/46130#issuecomment-2071671520 > @panbingkun Thanks. Please resolve the conflicts so that I can merge this one later. Done.

Re: [PR] [SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on PR #46104: URL: https://github.com/apache/spark/pull/46104#issuecomment-2071720271 thanks, merging to master!

Re: [PR] [SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness [spark]

2024-04-23 Thread via GitHub
cloud-fan closed pull request #46104: [SPARK-47352][SQL] Fix Upper, Lower, InitCap collation awareness URL: https://github.com/apache/spark/pull/46104

[PR] [SPARK-46841] Add collation support for ICU locales and collation specifiers [spark]

2024-04-23 Thread via GitHub
nikolamand-db opened a new pull request, #46180: URL: https://github.com/apache/spark/pull/46180 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-47936][BUILD] Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules [spark]

2024-04-23 Thread via GitHub
panbingkun commented on code in PR #46147: URL: https://github.com/apache/spark/pull/46147#discussion_r1575868146 ## dev/checkstyle.xml: ## @@ -196,6 +196,17 @@ + Review Comment: The rule on the `java side` is `slightly different

Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]

2024-04-23 Thread via GitHub
beliefer commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2071740391 It seems you hit a K8s bug or a usage issue.

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-23 Thread via GitHub
vladimirg-db closed pull request #46082: [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE URL: https://github.com/apache/spark/pull/46082

Re: [PR] [SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode [spark]

2024-04-23 Thread via GitHub
bozhang2820 commented on code in PR #45930: URL: https://github.com/apache/spark/pull/45930#discussion_r1575960906 ## core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala: ## @@ -187,6 +187,7 @@ private[spark] class SortShuffleManager(conf: SparkConf) exte

[PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-23 Thread via GitHub
vladimirg-db opened a new pull request, #46181: URL: https://github.com/apache/spark/pull/46181 What changes were proposed in this pull request? Added hand-crafted implementations of unicode-aware lower-case contains, startsWith, endsWith to optimize UTF8_BINARY_LCASE for ASCII-only strin

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-23 Thread via GitHub
vladimirg-db commented on PR #46082: URL: https://github.com/apache/spark/pull/46082#issuecomment-2071908572 Accidentally deleted the branch in my fork, opened a new PR: https://github.com/apache/spark/pull/46181

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase unicode-aware contains, startsWith and endsWith and optimize UTF8_BINARY_LCASE [spark]

2024-04-23 Thread via GitHub
vladimirg-db commented on PR #46181: URL: https://github.com/apache/spark/pull/46181#issuecomment-2071925730 Re-running the benchmarks on GHA for the new implementation proposed by @cloud-fan [here](https://github.com/apache/spark/pull/46082#discussion_r1574845082) and after that I will me

[PR] [SPARK-47952][CORE][CONNECT] Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn [spark]

2024-04-23 Thread via GitHub
TakawaAkirayo opened a new pull request, #46182: URL: https://github.com/apache/spark/pull/46182 ### What changes were proposed in this pull request? 1. Add configuration `spark.connect.grpc.port.maxRetries` (default 0, no retries). [Before this PR]: The SparkConnectServ

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-23 Thread via GitHub
stefankandic commented on PR #46083: URL: https://github.com/apache/spark/pull/46083#issuecomment-2071987485 @cloud-fan all tests passing after scalastyle fix

Re: [PR] [SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeResources to avoid memory leak [spark]

2024-04-23 Thread via GitHub
JacobZheng0927 commented on PR #46131: URL: https://github.com/apache/spark/pull/46131#issuecomment-2072027749 > Like #35613, do you think you can provide a way to validate your PR, @JacobZheng0927? Ok, I'll try to reproduce the problem with a simple script.

Re: [PR] [SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeResources to avoid memory leak [spark]

2024-04-23 Thread via GitHub
JacobZheng0927 commented on code in PR #46131: URL: https://github.com/apache/spark/pull/46131#discussion_r1576074670 ## core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala: ## @@ -174,23 +174,46 @@ private[spark] class DiskBlockObjectWriter( * Should ca

Re: [PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-23 Thread via GitHub
LuciferYang commented on PR #46146: URL: https://github.com/apache/spark/pull/46146#issuecomment-2072066902 Let's test 9.1.1

Re: [PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-23 Thread via GitHub
panbingkun commented on PR #46146: URL: https://github.com/apache/spark/pull/46146#issuecomment-2072080552 > Let's test 9.1.1 Currently, version 9.1.1 is not available in the Maven Central repository, and `rocksdbjni` has been released very frequently of late.

Re: [PR] [WIP] Only test rocksdbjni 9.x [spark]

2024-04-23 Thread via GitHub
LuciferYang commented on PR #46146: URL: https://github.com/apache/spark/pull/46146#issuecomment-2072083335 We can wait for a relatively stable version.

Re: [PR] [SPARK-46841] Add collation support for ICU locales and collation specifiers [spark]

2024-04-23 Thread via GitHub
dbatomic commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1576141539 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -117,76 +118,490 @@ public Collation( } /** - * Construc

Re: [PR] [SPARK-46841] Add collation support for ICU locales and collation specifiers [spark]

2024-04-23 Thread via GitHub
nikolamand-db commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r157611 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -117,76 +118,490 @@ public Collation( } /** - * Con

Re: [PR] [SPARK-46841] Add collation support for ICU locales and collation specifiers [spark]

2024-04-23 Thread via GitHub
dbatomic commented on code in PR #46180: URL: https://github.com/apache/spark/pull/46180#discussion_r1576145404 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -117,76 +118,490 @@ public Collation( } /** - * Construc

[PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-23 Thread via GitHub
pan3793 opened a new pull request, #46184: URL: https://github.com/apache/spark/pull/46184 ### What changes were proposed in this pull request? [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a kind of K8s resource that exposes HTTP and HTTPS routes fr

[PR] [SPARK-47956][SQL][MINOR] Sanity check for unresolved LCA reference [spark]

2024-04-23 Thread via GitHub
cloud-fan opened a new pull request, #46185: URL: https://github.com/apache/spark/pull/46185 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/40558. The sanity check should apply to all plan nodes, not only Projec

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on PR #46083: URL: https://github.com/apache/spark/pull/46083#issuecomment-2072351055 thanks, merging to master!

Re: [PR] [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type [spark]

2024-04-23 Thread via GitHub
cloud-fan closed pull request #46083: [SPARK-47873][SQL] Write collated strings to Hive metastore using the regular string type URL: https://github.com/apache/spark/pull/46083

Re: [PR] [SPARK-47764][CORE][SQL] Cleanup shuffle dependencies based on ShuffleCleanupMode [spark]

2024-04-23 Thread via GitHub
cloud-fan commented on code in PR #45930: URL: https://github.com/apache/spark/pull/45930#discussion_r1576289113 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -869,6 +874,8 @@ case class AdaptiveExecutionContext(session: Sp

[PR] [WIP] Relax the type of MDC#key. [spark]

2024-04-23 Thread via GitHub
LuciferYang opened a new pull request, #46186: URL: https://github.com/apache/spark/pull/46186 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46177: [SPARK-47953][DOCS] MsSQLServer: Document Mapping Spark SQL Data Types to Microsoft SQL Server URL: https://github.com/apache/spark/pull/46177

Re: [PR] [MINOR][DOCS] Add `docs/_generated/` to .gitignore [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46178: [MINOR][DOCS] Add `docs/_generated/` to .gitignore URL: https://github.com/apache/spark/pull/46178

Re: [PR] [MINOR][DOCS] Add `docs/_generated/` to .gitignore [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on PR #46178: URL: https://github.com/apache/spark/pull/46178#issuecomment-2072522656 Merged to master.

Re: [PR] [MINOR][DOCS] Fix type hint of 3 functions [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46179: [MINOR][DOCS] Fix type hint of 3 functions URL: https://github.com/apache/spark/pull/46179

Re: [PR] [SPARK-47949][SQL][DOCKER][TESTS] MsSQLServer: Bump up mssql docker image version to 2022-CU12-GDR1-ubuntu-22.04 [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #46176: URL: https://github.com/apache/spark/pull/46176#discussion_r1576389644 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSQLServerDatabaseOnDocker.scala: ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apac

Re: [PR] [SPARK-47949][SQL][DOCKER][TESTS] MsSQLServer: Bump up mssql docker image version to 2022-CU12-GDR1-ubuntu-22.04 [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46176: [SPARK-47949][SQL][DOCKER][TESTS] MsSQLServer: Bump up mssql docker image version to 2022-CU12-GDR1-ubuntu-22.04 URL: https://github.com/apache/spark/pull/46176

Re: [PR] [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #46175: URL: https://github.com/apache/spark/pull/46175#discussion_r1576394905 ## dev/create-release/spark-rm/Dockerfile: ## @@ -37,7 +37,7 @@ ENV DEBCONF_NONINTERACTIVE_SEEN true # These arguments are just for reuse and not really meant to

Re: [PR] [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46175: [SPARK-47948][PYTHON] Upgrade the minimum `Pandas` version to 2.0.0 URL: https://github.com/apache/spark/pull/46175

Re: [PR] [WIP] Relax the type of `MDC#key` [spark]

2024-04-23 Thread via GitHub
LuciferYang closed pull request #46186: [WIP] Relax the type of `MDC#key` URL: https://github.com/apache/spark/pull/46186

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576423719 ## gradle.properties: ## @@ -15,6 +15,22 @@ # limitations under the License. # +group=org.apache.spark.kubernetes.operator Review Comment: Pl

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576425531 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/BaseResource.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576426220 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/BaseResource.java: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576427263 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/Constants.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576429287 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/Constants.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576429938 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/Constants.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576435733 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -676,6 +678,43 @@ class KeyValueGroupedDataset[K, V] private[sql]( ) } + p

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576437086 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/Constants.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576441326 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/spec/ApplicationTimeoutConfig.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576441814 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/spec/ApplicationTimeoutConfig.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed to

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576444182 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/spec/ApplicationTolerations.java: ## @@ -0,0 +1,52 @@ +/* + * Licensed to th

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576447066 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationAttemptSummary.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576449466 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationAttemptSummary.java: ## @@ -0,0 +1,40 @@ +/* + * Licensed

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576452218 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576447215 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -676,6 +678,43 @@ class KeyValueGroupedDataset[K, V] private[sql]( ) } + p

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576455796 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576456757 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateWatermarkSuite.scala: ## @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576456955 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateWatermarkSuite.scala: ## @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
sahnib commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576458950 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateWatermarkSuite.scala: ## @@ -0,0 +1,193 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576460766 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576467571 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576466018 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SPARK-47954][K8S] Support creating ingress entry for external UI access [spark]

2024-04-23 Thread via GitHub
mridulm commented on PR #46184: URL: https://github.com/apache/spark/pull/46184#issuecomment-2072723680 +CC @zhouyejoe

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576470133 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
grundprinzip commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576471728 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -91,6 +91,12 @@ private[spark] class StreamingPythonRunner( new Buffe

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
grundprinzip commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576472509 ## python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py: ## @@ -62,10 +62,7 @@ def main(infile: IO, outfile: IO) -> None: assert spark_connect_

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576475079 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/ApplicationStateSummary.java: ## @@ -0,0 +1,121 @@ +/* + * Licensed t

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576479411 ## spark-operator-api/src/main/java/org/apache/spark/kubernetes/operator/status/BaseStatus.java: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache S

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
grundprinzip commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576480636 ## python/pyspark/sql/connect/streaming/worker/foreach_batch_worker.py: ## @@ -76,16 +73,21 @@ def process(df_id, batch_id): # type: ignore[no-untyped-def]

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576483060 ## spark-operator-api/src/main/resources/printer-columns.sh: ## @@ -0,0 +1,27 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576485029 ## spark-operator-api/src/main/resources/printer-columns.sh: ## @@ -0,0 +1,27 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF

Re: [PR] [SPARK-47950] Add Java API Module for Spark Operator [spark-kubernetes-operator]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on code in PR #8: URL: https://github.com/apache/spark-kubernetes-operator/pull/8#discussion_r1576486023 ## spark-operator-docs/spark_application.md: ## @@ -0,0 +1,221 @@ +## Spark Application API Review Comment: Please remove all `spark-operator-doc

Re: [PR] [SPARK-47956][SQL] Sanity check for unresolved LCA reference [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun closed pull request #46185: [SPARK-47956][SQL] Sanity check for unresolved LCA reference URL: https://github.com/apache/spark/pull/46185 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47956][SQL] Sanity check for unresolved LCA reference [spark]

2024-04-23 Thread via GitHub
dongjoon-hyun commented on PR #46185: URL: https://github.com/apache/spark/pull/46185#issuecomment-2072773098 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
ericm-db commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576514370 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -91,6 +91,12 @@ private[spark] class StreamingPythonRunner( new BufferedI

[PR] [SPARK-47958][TESTS] Change LocalSchedulerBackend to notify scheduler of executor on start [spark]

2024-04-23 Thread via GitHub
davintjong-db opened a new pull request, #46187: URL: https://github.com/apache/spark/pull/46187 ### What changes were proposed in this pull request? Changing to call `reviveOffers` on start (after the local executor is set up) so that the task scheduler knows about it. This m

[PR] [SPARK-47050][SQL] Collect and publish partition level metrics for V1 [spark]

2024-04-23 Thread via GitHub
snmvaughan opened a new pull request, #46188: URL: https://github.com/apache/spark/pull/46188 We currently capture metrics which include the number of files, bytes and rows for a task along with the updated partitions. This change captures metrics for each updated partition, reporting t

Re: [PR] [SPARK-47050][SQL] Collect and publish partition level metrics for V1 [spark]

2024-04-23 Thread via GitHub
snmvaughan commented on PR #46188: URL: https://github.com/apache/spark/pull/46188#issuecomment-2072876115 cc @cloud-fan

Re: [PR] [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib" [spark]

2024-04-23 Thread via GitHub
xinrong-meng commented on PR #46174: URL: https://github.com/apache/spark/pull/46174#issuecomment-2072955519 LGTM, thanks!

Re: [PR] [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib" [spark]

2024-04-23 Thread via GitHub
xinrong-meng closed pull request #46174: [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib" URL: https://github.com/apache/spark/pull/46174

Re: [PR] [SPARK-47864][FOLLOWUP][PYTHON][DOCS] Fix minor typo: "MLLib" -> "MLlib" [spark]

2024-04-23 Thread via GitHub
xinrong-meng commented on PR #46174: URL: https://github.com/apache/spark/pull/46174#issuecomment-2072958632 Merged to master

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on PR #45376: URL: https://github.com/apache/spark/pull/45376#issuecomment-2072985699 @sahnib - seems like there are still conflicts on the base branch?

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
rangadi commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576629583 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -91,11 +91,27 @@ private[spark] class StreamingPythonRunner( new BufferedI

Re: [PR] [SPARK-47941] [SS] [Connect] Propagate ForeachBatch worker initialization errors to users for PySpark [spark]

2024-04-23 Thread via GitHub
ericm-db commented on code in PR #46125: URL: https://github.com/apache/spark/pull/46125#discussion_r1576656803 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -91,11 +91,27 @@ private[spark] class StreamingPythonRunner( new Buffered

Re: [PR] [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework [spark]

2024-04-23 Thread via GitHub
gengliangwang closed pull request #46130: [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46130

Re: [PR] [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework [spark]

2024-04-23 Thread via GitHub
gengliangwang commented on PR #46130: URL: https://github.com/apache/spark/pull/46130#issuecomment-2073052204 Thanks, merging to master.

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576686293 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -1057,6 +1063,14 @@ }, "sqlState" : "4274K" }, + "EMITTING_ROWS_OLDER_THAN_WATE

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576692106 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -762,6 +890,7 @@ class KeyValueGroupedDataset[K, V] private[sql]( timeMo

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576693458 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -739,6 +741,128 @@ class KeyValueGroupedDataset[K, V] private[sql]( ) }

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576691015 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -762,6 +890,7 @@ class KeyValueGroupedDataset[K, V] private[sql]( timeMo

Re: [PR] [SPARK-47960][SS] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-04-23 Thread via GitHub
anishshri-db commented on code in PR #45376: URL: https://github.com/apache/spark/pull/45376#discussion_r1576698149 ## sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -739,6 +741,128 @@ class KeyValueGroupedDataset[K, V] private[sql]( ) }
