Re: [PR] [SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring [spark]

2024-06-04 Thread via GitHub
HeartSaVioR commented on code in PR #46856: URL: https://github.com/apache/spark/pull/46856#discussion_r1627034975 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala: ## @@ -44,37 +42,37 @@ class StateSchemaCompatibili

Re: [PR] [SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled [spark]

2024-06-04 Thread via GitHub
anishshri-db commented on PR #46875: URL: https://github.com/apache/spark/pull/46875#issuecomment-2148959141 cc - @HeartSaVioR - PTAL, thx ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[PR] [SPARK-48535] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled [spark]

2024-06-04 Thread via GitHub
anishshri-db opened a new pull request, #46875: URL: https://github.com/apache/spark/pull/46875 ### What changes were proposed in this pull request? Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled ###

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-04 Thread via GitHub
yaooqinn commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2148935108 It's irrelevant to your PR. I mean #45710 brought this behavioral change. If a user defines `s3a.connection.establish.timeout=100s` in `hdfs-site.xml`, then he/she actually gets 30s. -

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2148913031 @yaooqinn It still uses `setIfMissing`, or did I miss something?

Re: [PR] [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #46783: URL: https://github.com/apache/spark/pull/46783#issuecomment-2148909524 Thanks @yaooqinn , I reopened SPARK-48505

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-04 Thread via GitHub
yaooqinn commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2148895576 It looks more reasonable to use `setIfMissing` with `hadoop.Configuration`, #45710 seems to force overriding custom value of `s3a.connection.establish.timeout` if it's missing in spark c

Re: [PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on PR #46874: URL: https://github.com/apache/spark/pull/46874#issuecomment-2148880836 cc @dongjoon-hyun

Re: [PR] [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
yaooqinn commented on PR #46783: URL: https://github.com/apache/spark/pull/46783#issuecomment-2148880697 Thank you @rednaxelafx for the detailed inputs. I reverted this with https://github.com/apache/spark/commit/db527ac346f2f6f6dbddefe292a24848d1120172, since it's controversial

[PR] [SPARK-47552][CORE][FOLLOWUP] Set spark.hadoop.fs.s3a.connection.establish.timeout to numeric [spark]

2024-06-04 Thread via GitHub
cloud-fan opened a new pull request, #46874: URL: https://github.com/apache/spark/pull/46874 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/45710 . Some custom `FileSystem` implementations read the `hadoop.fs.s3

Re: [PR] [SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema [spark]

2024-06-04 Thread via GitHub
HyukjinKwon commented on PR #46871: URL: https://github.com/apache/spark/pull/46871#issuecomment-2148868899 Merged to master.

Re: [PR] [SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema [spark]

2024-06-04 Thread via GitHub
HyukjinKwon closed pull request #46871: [SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema URL: https://github.com/apache/spark/pull/46871

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on code in PR #46832: URL: https://github.com/apache/spark/pull/46832#discussion_r1626436377 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4603,6 +4603,14 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-48505][CORE][FOLLOWUP] Further refactor `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on code in PR #46873: URL: https://github.com/apache/spark/pull/46873#discussion_r1626886340 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3058,8 +3059,13 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { -

Re: [PR] [SPARK-48505][FOLLOWUP] Further refactor [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on code in PR #46873: URL: https://github.com/apache/spark/pull/46873#discussion_r1626873843 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3058,8 +3059,13 @@ private[spark] object Utils */ lazy val isG1GC: Boolean = { Try { -

Re: [PR] [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #46783: URL: https://github.com/apache/spark/pull/46783#issuecomment-2148790218 > It was using reflection all the way, but it's also possible to only do an initial reflection probe and then conditionally do direct use of HotSpotDiagnosticMXBean so that it "looks"

[PR] [SPARK-48505][FOLLOWUP] Further refactor [spark]

2024-06-04 Thread via GitHub
LuciferYang opened a new pull request, #46873: URL: https://github.com/apache/spark/pull/46873 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build [spark]

2024-06-04 Thread via GitHub
HyukjinKwon closed pull request #46872: [SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build URL: https://github.com/apache/spark/pull/46872

Re: [PR] [SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build [spark]

2024-06-04 Thread via GitHub
HyukjinKwon commented on PR #46872: URL: https://github.com/apache/spark/pull/46872#issuecomment-2148774084 Merged to master.

Re: [PR] [SPARK-44473][SQL] Overwriting the same partition of a partitioned table multiple times with empty data yields non-idempotent results [spark]

2024-06-04 Thread via GitHub
ychris78 commented on PR #46699: URL: https://github.com/apache/spark/pull/46699#issuecomment-2148764547 ![image](https://github.com/apache/spark/assets/35604105/c0f729b6-2e2e-415b-98d2-96340971ca84) After this pull request is merged, the behavior will be consistent with Hive.

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-06-04 Thread via GitHub
hvanhovell commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1626854039 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -48,6 +48,7 @@ message Expression { CommonInlineUserDefinedFunction common_

Re: [PR] [SPARK-48510][2/2] Support UDAF `toColumn` API in Spark Connect [spark]

2024-06-04 Thread via GitHub
hvanhovell commented on code in PR #46849: URL: https://github.com/apache/spark/pull/46849#discussion_r1626853395 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/Aggregator.scala: ## @@ -52,7 +56,7 @@ import org.apache.spark.sql.{Encoder, TypedCol

Re: [PR] [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #46783: URL: https://github.com/apache/spark/pull/46783#issuecomment-2148757330 > Thanks for the ping early on. I was holding back on making a comment to avoid derailing the thread, but thought I'd share some thoughts still. > > I'm neutral to this change.

[PR] [SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build [spark]

2024-06-04 Thread via GitHub
HyukjinKwon opened a new pull request, #46872: URL: https://github.com/apache/spark/pull/46872 ### What changes were proposed in this pull request? This PR proposes to explicitly set ANSI mode in `test_toArrow_error` test. ### Why are the changes needed? To make non-ANSI

[PR] [SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema [spark]

2024-06-04 Thread via GitHub
zhengruifeng opened a new pull request, #46871: URL: https://github.com/apache/spark/pull/46871 ### What changes were proposed in this pull request? Add test for cached schema, to make Spark Classic's mapInXXX also works within `SparkConnectSQLTestCase`, also add a new `contextmanager` fo

[PR] [SPARK-48532][BUILD] Upgrade maven plugin to latest version [spark]

2024-06-04 Thread via GitHub
panbingkun opened a new pull request, #46870: URL: https://github.com/apache/spark/pull/46870 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-48498][SQL] Always do char padding in predicates [spark]

2024-06-04 Thread via GitHub
beliefer commented on code in PR #46832: URL: https://github.com/apache/spark/pull/46832#discussion_r1626803012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4603,6 +4603,14 @@ object SQLConf { .booleanConf .createWithDefault(true)

Re: [PR] [SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark [spark]

2024-06-04 Thread via GitHub
anishshri-db commented on code in PR #46740: URL: https://github.com/apache/spark/pull/46740#discussion_r1626795999 ## python/pyspark/sql/tests/streaming/test_streaming.py: ## @@ -392,6 +392,30 @@ def test_streaming_with_temporary_view(self): set([Row(value="vie

Re: [PR] [MINOR][DOCS] Fix a typo in core-migration-guide.md [spark]

2024-06-04 Thread via GitHub
HyukjinKwon commented on PR #46864: URL: https://github.com/apache/spark/pull/46864#issuecomment-2148604941 Merged to master.

Re: [PR] [MINOR][DOCS] Fix a typo in core-migration-guide.md [spark]

2024-06-04 Thread via GitHub
HyukjinKwon closed pull request #46864: [MINOR][DOCS] Fix a typo in core-migration-guide.md URL: https://github.com/apache/spark/pull/46864

Re: [PR] [SPARK-48523][DOCS] Add `grpc_max_message_size ` description to `client-connection-string.md` [spark]

2024-06-04 Thread via GitHub
HyukjinKwon closed pull request #46862: [SPARK-48523][DOCS] Add `grpc_max_message_size ` description to `client-connection-string.md` URL: https://github.com/apache/spark/pull/46862

Re: [PR] [SPARK-48523][DOCS] Add `grpc_max_message_size ` description to `client-connection-string.md` [spark]

2024-06-04 Thread via GitHub
HyukjinKwon commented on PR #46862: URL: https://github.com/apache/spark/pull/46862#issuecomment-2148594827 Merged to master.

Re: [PR] [SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries [spark]

2024-06-04 Thread via GitHub
HyukjinKwon closed pull request #46819: [SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries URL: https://github.com/apache/spark/pull/46819

Re: [PR] [SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries [spark]

2024-06-04 Thread via GitHub
HyukjinKwon commented on PR #46819: URL: https://github.com/apache/spark/pull/46819#issuecomment-2148593487 Merged to master.

Re: [PR] [SPARK-48495][SQL][DOCS] Describe shredding scheme for Variant [spark]

2024-06-04 Thread via GitHub
shaeqahmed commented on PR #46831: URL: https://github.com/apache/spark/pull/46831#issuecomment-2148582542 I read through the proposal and some thoughts: It would be really useful to add to this PR a list of ways nested structs (struct-of-structs) and array-of-structs can be represent

Re: [PR] [SPARK-42944][FOLLOWUP][SS][CONNECT] Reenable ApplyInPandasWithState tests [spark]

2024-06-04 Thread via GitHub
WweiL commented on PR #46853: URL: https://github.com/apache/spark/pull/46853#issuecomment-2148581632 @HyukjinKwon This is ready to be merged : )

Re: [PR] [SPARK-48466][SQL] Create dedicated node for EmptyRelation in AQE [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on code in PR #46830: URL: https://github.com/apache/spark/pull/46830#discussion_r1626727389 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEPropagateEmptyRelation.scala: ## @@ -34,11 +35,14 @@ import org.apache.spark.sql.execution.join

Re: [PR] [SPARK-48466][SQL] Create dedicated node for EmptyRelation in AQE [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on code in PR #46830: URL: https://github.com/apache/spark/pull/46830#discussion_r1626726253 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala: ## @@ -58,6 +58,7 @@ private[execution] object SparkPlanInfo { case a: AdaptiveS

Re: [PR] [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on PR #46869: URL: https://github.com/apache/spark/pull/46869#issuecomment-2148529962 cc @viirya

[PR] [SPARK-48307][SQL][FOLLOWUP] Allow outer references in un-referenced CTE relations [spark]

2024-06-04 Thread via GitHub
cloud-fan opened a new pull request, #46869: URL: https://github.com/apache/spark/pull/46869 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/46617 . Subquery expression has a bunch of correlation checks which nee

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626659308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626660746 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC` [spark]

2024-06-04 Thread via GitHub
rednaxelafx commented on PR #46783: URL: https://github.com/apache/spark/pull/46783#issuecomment-2148498853 Thanks for the ping early on. I was holding back on making a comment to avoid derailing the thread, but thought I'd share some thoughts still. I'm neutral to this change. It's a

Re: [PR] [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node [spark]

2024-06-04 Thread via GitHub
cloud-fan closed pull request #46617: [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node URL: https://github.com/apache/spark/pull/46617

Re: [PR] [SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on PR #46617: URL: https://github.com/apache/spark/pull/46617#issuecomment-2148488195 thanks for the review, merging to master!

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626623917 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-42944][FOLLOWUP][3.5][SS][CONNECT] Reenable ApplyInPandasWithState tests [spark]

2024-06-04 Thread via GitHub
WweiL commented on PR #46855: URL: https://github.com/apache/spark/pull/46855#issuecomment-2148362457 passed on master branch: https://github.com/apache/spark/pull/46853 will cherry-pick to this branch after it's merged

Re: [PR] [DRAFT][SQL] Draft string comparison for UTF8_BINARY_UCASE & UTF8_BINARY_TCASE [spark]

2024-06-04 Thread via GitHub
uros-db closed pull request #46719: [DRAFT][SQL] Draft string comparison for UTF8_BINARY_UCASE & UTF8_BINARY_TCASE URL: https://github.com/apache/spark/pull/46719

Re: [PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun closed pull request #14: [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version URL: https://github.com/apache/spark-kubernetes-operator/pull/14

Re: [PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-kubernetes-operator/pull/14#issuecomment-2148210045 Thank you, @viirya !

Re: [PR] [SS][SPARK-48511] Remove TimeMode None from TransformWithState. [spark]

2024-06-04 Thread via GitHub
sahnib commented on PR #46825: URL: https://github.com/apache/spark/pull/46825#issuecomment-2148185451 @HeartSaVioR PTAL, thanks.

Re: [PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
viirya commented on PR #14: URL: https://github.com/apache/spark-kubernetes-operator/pull/14#issuecomment-2148164492 Yea, we forgot to update it.

[PR] [WIP][SQL] Collation support for Char/Varchar [spark]

2024-06-04 Thread via GitHub
uros-db opened a new pull request, #46868: URL: https://github.com/apache/spark/pull/46868 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations [spark]

2024-06-04 Thread via GitHub
cloud-fan closed pull request #46860: [SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations URL: https://github.com/apache/spark/pull/46860

Re: [PR] [SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations [spark]

2024-06-04 Thread via GitHub
cloud-fan commented on PR #46860: URL: https://github.com/apache/spark/pull/46860#issuecomment-2148114420 thanks, merging to master!

Re: [PR] [SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring [spark]

2024-06-04 Thread via GitHub
anishshri-db commented on code in PR #46856: URL: https://github.com/apache/spark/pull/46856#discussion_r1626401368 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityCheckerSuite.scala: ## @@ -313,22 +314,13 @@ class StateSchemaCom

Re: [PR] [SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries [spark]

2024-06-04 Thread via GitHub
WweiL commented on code in PR #46819: URL: https://github.com/apache/spark/pull/46819#discussion_r1626398997 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -179,12 +179,13 @@ case class SessionHolder(userId: String, ses

Re: [PR] [SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries [spark]

2024-06-04 Thread via GitHub
WweiL commented on code in PR #46819: URL: https://github.com/apache/spark/pull/46819#discussion_r1626395460 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -179,12 +179,13 @@ case class SessionHolder(userId: String, ses

Re: [PR] [SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring [spark]

2024-06-04 Thread via GitHub
anishshri-db commented on code in PR #46856: URL: https://github.com/apache/spark/pull/46856#discussion_r1626385375 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala: ## @@ -782,11 +782,12 @@ class StreamingAggregationSuite extends StateS

Re: [PR] [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 [spark]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #46867: URL: https://github.com/apache/spark/pull/46867#issuecomment-2148056415 Python linter passed. Merged to master.

Re: [PR] [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 [spark]

2024-06-04 Thread via GitHub
dongjoon-hyun closed pull request #46867: [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 URL: https://github.com/apache/spark/pull/46867

Re: [PR] [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 [spark]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #46867: URL: https://github.com/apache/spark/pull/46867#issuecomment-2148011726 Thank you, @huaxingao !

Re: [PR] [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 [spark]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #46867: URL: https://github.com/apache/spark/pull/46867#issuecomment-2148007811 Could you review this PR, @huaxingao ?

[PR] [SPARK-48531][INFRA] Fix `Black` target version to Python 3.9 [spark]

2024-06-04 Thread via GitHub
dongjoon-hyun opened a new pull request, #46867: URL: https://github.com/apache/spark/pull/46867 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #14: URL: https://github.com/apache/spark-kubernetes-operator/pull/14#issuecomment-2147981557 Could you review this PR too, @viirya ?

Re: [PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on code in PR #14: URL: https://github.com/apache/spark-kubernetes-operator/pull/14#discussion_r1626325003 ## dev/merge_spark_pr.py: ## @@ -305,7 +305,9 @@ def resolve_jira_issue(merge_branches, comment, default_jira_id=""): versions = [ x

[PR] [SPARK-48528] Refine K8s Operator `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun opened a new pull request, #14: URL: https://github.com/apache/spark-kubernetes-operator/pull/14 ### What changes were proposed in this pull request? This PR aims to refine `merge_spark_pr.py` to use `kubernetes-operator-x.y.z` version like the following. ``` E

Re: [PR] [SPARK-42944][FOLLOWUP][SS][CONNECT] Reenable ApplyInPandasWithState tests [spark]

2024-06-04 Thread via GitHub
WweiL commented on PR #46853: URL: https://github.com/apache/spark/pull/46853#issuecomment-2147955289 on local it passed... let me check

Re: [PR] [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun closed pull request #13: [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 URL: https://github.com/apache/spark-kubernetes-operator/pull/13

Re: [PR] [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #13: URL: https://github.com/apache/spark-kubernetes-operator/pull/13#issuecomment-2147938015 Thank you, @viirya !

Re: [PR] [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
viirya commented on PR #13: URL: https://github.com/apache/spark-kubernetes-operator/pull/13#issuecomment-2147937925 Looks good to me. 👍

Re: [PR] [SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL [spark]

2024-06-04 Thread via GitHub
szehon-ho commented on PR #46707: URL: https://github.com/apache/spark/pull/46707#issuecomment-2147931004 @beliefer I didn't make the initial JIRA title :), let me know if something else makes more sense.

Re: [PR] [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun commented on PR #13: URL: https://github.com/apache/spark-kubernetes-operator/pull/13#issuecomment-2147930156 Could you review this PR, @viirya and @jiangzho ?

[PR] [SPARK-48326] Use the official Apache Spark 4.0.0-preview1 [spark-kubernetes-operator]

2024-06-04 Thread via GitHub
dongjoon-hyun opened a new pull request, #13: URL: https://github.com/apache/spark-kubernetes-operator/pull/13 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[PR] [SPARK-48524] Make Not IsNull semantically equal to IsNotNull and vice versa [spark]

2024-06-04 Thread via GitHub
tom-s-powell opened a new pull request, #46865: URL: https://github.com/apache/spark/pull/46865 ### What changes were proposed in this pull request? `Not(IsNull(x))` is made semantically equivalent to `IsNotNull(x)` and vice versa. ### Why are the changes needed? Th

Re: [PR] [SPARK-48360] [WIP] Simplify conditionals with predicate branches [spark]

2024-06-04 Thread via GitHub
tom-s-powell closed pull request #46671: [SPARK-48360] [WIP] Simplify conditionals with predicate branches URL: https://github.com/apache/spark/pull/46671

Re: [PR] [SPARK-48513][SS] Add error class for state schema compatibility and minor refactoring [spark]

2024-06-04 Thread via GitHub
sahnib commented on code in PR #46856: URL: https://github.com/apache/spark/pull/46856#discussion_r1626183498 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -3730,6 +3730,12 @@ ], "sqlState" : "42K06" }, + "STATE_STORE_KEY_SCHEMA_NOT_COMPATIB

Re: [PR] [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` [spark]

2024-06-04 Thread via GitHub
wayneguow commented on code in PR #39819: URL: https://github.com/apache/spark/pull/39819#discussion_r1626097857 ## docs/core-migration-guide.md: ## @@ -48,7 +48,9 @@ license: | - Since Spark 4.0, the MDC (Mapped Diagnostic Context) key for Spark task names in Spark logs has

[PR] [MINOR][DOCS] Fix a typo in core-migration-guide.md [spark]

2024-06-04 Thread via GitHub
wayneguow opened a new pull request, #46864: URL: https://github.com/apache/spark/pull/46864 ### What changes were proposed in this pull request? Fix a typo in core-migration-guide.md: - agressively -> aggressively ### Why are the changes needed? Fix mista

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626078580 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes: Seq

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626069626 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes: Seq

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626058908 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,25 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [SPARK-48518][CORE] Make LZF compression be able to run in parallel [spark]

2024-06-04 Thread via GitHub
mridulm commented on code in PR #46858: URL: https://github.com/apache/spark/pull/46858#discussion_r1626058226 ## core/src/main/scala/org/apache/spark/io/CompressionCodec.scala: ## @@ -170,9 +171,14 @@ class LZ4CompressionCodec(conf: SparkConf) extends CompressionCodec { */

Re: [PR] [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on code in PR #39819: URL: https://github.com/apache/spark/pull/39819#discussion_r1626045862 ## docs/core-migration-guide.md: ## @@ -48,7 +48,9 @@ license: | - Since Spark 4.0, the MDC (Mapped Diagnostic Context) key for Spark task names in Spark logs h

Re: [PR] [SPARK-42252][CORE] Add `spark.shuffle.localDisk.file.output.buffer` and deprecate `spark.shuffle.unsafe.file.output.buffer` [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #39819: URL: https://github.com/apache/spark/pull/39819#issuecomment-2147572796 friendly ping @yaooqinn Do you have time to help take a look?

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626025583 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes: Seq[

Re: [PR] [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE [spark]

2024-06-04 Thread via GitHub
LuciferYang commented on PR #46861: URL: https://github.com/apache/spark/pull/46861#issuecomment-2147551130 Merged into master for Spark 4.0. Thanks @yaooqinn

Re: [PR] [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE [spark]

2024-06-04 Thread via GitHub
LuciferYang closed pull request #46861: [SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE URL: https://github.com/apache/spark/pull/46861

Re: [PR] [SPARK-47353][SQL] Enable collation support for the Mode expression [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1626015888 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes: Seq

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1626005024 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -948,6 +952,210 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1626000759 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -948,6 +952,210 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-04 Thread via GitHub
dbatomic commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1625996529 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -948,6 +952,210 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPlan

Re: [PR] [SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-06-04 Thread via GitHub
stefankandic commented on code in PR #46801: URL: https://github.com/apache/spark/pull/46801#discussion_r1625986861 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -948,6 +952,210 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSpark
