Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562045410 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562042451 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47814][DSTREAM] Move `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` from `main` to `test` [spark]

2024-04-11 Thread via GitHub
panbingkun commented on PR #46000: URL: https://github.com/apache/spark/pull/46000#issuecomment-2051022992 > `WriteInputFormatTestDataGenerator` is called by `test_readwrite.py`. Even though the CI has passed, I still want to confirm that putting it into test.jar will not have a negative

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562041299 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47828][CONNECT][PYTHON] `DataFrameWriterV2.overwrite` fails with invalid plan [spark]

2024-04-11 Thread via GitHub
zhengruifeng commented on PR #46023: URL: https://github.com/apache/spark/pull/46023#issuecomment-2051011882 it will need separate PRs for 3.4/3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #45957: URL: https://github.com/apache/spark/pull/45957#discussion_r1562034403 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +105,10 @@ trait Logging { val context = new java.util.HashMap[String,

[PR] [SPARK-47828][CONNECT][PYTHON] `DataFrameWriterV2.overwrite` fails with invalid plan [spark]

2024-04-11 Thread via GitHub
zhengruifeng opened a new pull request, #46023: URL: https://github.com/apache/spark/pull/46023 ### What changes were proposed in this pull request? `DataFrameWriterV2.overwrite` fails with invalid plan ### Why are the changes needed? bug fix ### Does this PR

Re: [PR] [SPARK-47828][CONNECT][PYTHON] `DataFrameWriterV2.overwrite` fails with invalid plan [spark]

2024-04-11 Thread via GitHub
zhengruifeng commented on code in PR #46023: URL: https://github.com/apache/spark/pull/46023#discussion_r1562034412 ## python/pyspark/sql/tests/test_readwriter.py: ## @@ -252,6 +252,11 @@ def test_create_without_provider(self): ):

Re: [PR] [SPARK-47318][CORE][3.5] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices [spark]

2024-04-11 Thread via GitHub
mridulm commented on PR #46014: URL: https://github.com/apache/spark/pull/46014#issuecomment-2051007024 My concern with adding `3.4.3` was that it would typically mean it is available in `3.5.x` - but it won't be, except for specific versions of `3.5`. Should we document it as such?

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
gengliangwang commented on code in PR #45957: URL: https://github.com/apache/spark/pull/45957#discussion_r1562024837 ## common/utils/src/main/scala/org/apache/spark/internal/Logging.scala: ## @@ -105,9 +105,10 @@ trait Logging { val context = new
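The structured-logging migration discussed in these `Logging.scala` review threads attaches named variables (an MDC-style context map) to each log message rather than interpolating them into plain strings. A rough Python analogue using the standard `logging` module's `extra` mechanism (illustrative only; Spark's Scala `Logging` trait and `LogKey` enumeration work differently):

```python
import logging

class ContextFormatter(logging.Formatter):
    # Render any MDC-style key/value pairs attached to the record via `extra`.
    def format(self, record):
        base = super().format(record)
        context = getattr(record, "context", {})
        if context:
            pairs = " ".join(f"{k}={v}" for k, v in sorted(context.items()))
            return f"{base} [{pairs}]"
        return base

logger = logging.getLogger("structured")
handler = logging.StreamHandler()
handler.setFormatter(ContextFormatter("%(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

# The variables travel as structured fields, not string interpolation.
logger.warning("Executor lost",
               extra={"context": {"executor_id": 7, "host": "node-3"}})
# prints: WARNING Executor lost [executor_id=7 host=node-3]
```

Because the fields stay structured until formatting, a downstream sink can index on keys like `executor_id` instead of regex-parsing message strings.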

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-11 Thread via GitHub
gengliangwang closed pull request #45975: [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` URL: https://github.com/apache/spark/pull/45975

Re: [PR] [SPARK-47792][CORE] Make the value of MDC can support `null` & cannot be `MessageWithContext` [spark]

2024-04-11 Thread via GitHub
gengliangwang commented on PR #45975: URL: https://github.com/apache/spark/pull/45975#issuecomment-2050993550 Thanks, merging to master

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562015569 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562014004 ## python/pyspark/cloudpickle/cloudpickle.py: ## @@ -1461,7 +1461,7 @@ def dump(obj, file, protocol=None, buffer_callback=None): Pickler(file,
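The cloudpickle changes in this thread are about letting a session object survive pickling into the ForEachBatch worker. A minimal, purely illustrative sketch of the `__reduce__` pattern that lets a resource-holding object be rebuilt from an identifier on unpickling (the `Session` class and `connect` helper below are hypothetical, not PySpark's actual implementation):

```python
import pickle

# Hypothetical registry standing in for "reconnect to the server".
_SESSIONS = {}

def connect(url):
    # Return the cached session for this URL, creating it on first use.
    if url not in _SESSIONS:
        _SESSIONS[url] = Session(url)
    return _SESSIONS[url]

class Session:
    def __init__(self, url):
        self.url = url
        self._channel = object()  # stand-in for an unpicklable network channel

    def __reduce__(self):
        # Serialize only the URL; unpickling calls connect(url) to rebuild,
        # so the raw channel object never has to be pickled.
        return (connect, (self.url,))

s = connect("sc://localhost:15002")
restored = pickle.loads(pickle.dumps(s))
print(restored is s)  # True: unpickling went through connect()
```

The same idea applies whether the pickler is the stdlib one or cloudpickle: the unpicklable state is replaced by a recipe for reacquiring it.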

Re: [PR] [SPARK-47795][K8S][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-11 Thread via GitHub
beliefer commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2050974804 @yaooqinn @dongjoon-hyun @bjornjorgensen Thank you!

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562012354 ## python/pyspark/cloudpickle/cloudpickle.py: ## @@ -1461,7 +1461,7 @@ def dump(obj, file, protocol=None, buffer_callback=None): Pickler(file,

Re: [PR] [SPARK-47784][SS] Merge TTLMode and TimeoutMode into a single TimeMode. [spark]

2024-04-11 Thread via GitHub
HeartSaVioR commented on PR #45960: URL: https://github.com/apache/spark/pull/45960#issuecomment-2050965926 Thanks! Merging to master.

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1562007347 ## core/src/main/scala/org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala: ## @@ -104,6 +105,7 @@ private[python] class

Re: [PR] [SPARK-47784][SS] Merge TTLMode and TimeoutMode into a single TimeMode. [spark]

2024-04-11 Thread via GitHub
HeartSaVioR commented on PR #45960: URL: https://github.com/apache/spark/pull/45960#issuecomment-2050965820 The CI failure happened in known flakiness - SparkSessionE2ESuite.

Re: [PR] [SPARK-47784][SS] Merge TTLMode and TimeoutMode into a single TimeMode. [spark]

2024-04-11 Thread via GitHub
HeartSaVioR closed pull request #45960: [SPARK-47784][SS] Merge TTLMode and TimeoutMode into a single TimeMode. URL: https://github.com/apache/spark/pull/45960

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1562007477 ## core/src/main/scala/org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala: ## @@ -104,6 +105,7 @@ private[python] class

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562004935 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562004935 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562005836 ## python/pyspark/cloudpickle/cloudpickle.py: ## @@ -1461,7 +1461,7 @@ def dump(obj, file, protocol=None, buffer_callback=None): Pickler(file,

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562005257 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562002375 ## python/pyspark/cloudpickle/cloudpickle.py: ## @@ -1461,7 +1461,7 @@ def dump(obj, file, protocol=None, buffer_callback=None): Pickler(file,

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562001832 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -30,33 +30,73 @@ def

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562000962 ## python/pyspark/sql/tests/connect/streaming/test_parity_foreach_batch.py: ## @@ -30,33 +30,73 @@ def

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1562000621 ## python/pyspark/cloudpickle/cloudpickle.py: ## @@ -1461,7 +1461,7 @@ def dump(obj, file, protocol=None, buffer_callback=None): Pickler(file,

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050953196 > Many of the workloads will just fail in the parser/analyzer and the failures are not related to data quality issues. Also, there is no easy workaround like "try_*" functions.
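The comment above notes that parser/analyzer failures under ANSI mode have no easy workaround like the `try_*` functions do for runtime errors. For context, the semantic split those `try_*` functions provide can be sketched in plain Python (hypothetical helpers, not Spark's implementation):

```python
def ansi_cast_int(value):
    # ANSI-style cast: malformed input is a runtime error.
    return int(value)

def try_cast_int(value):
    # try_*-style cast: malformed input yields None instead of failing.
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

print(ansi_cast_int("42"))   # 42
print(try_cast_int("abc"))   # None
```

The escape hatch only exists where the failure happens during evaluation; a query rejected by the parser or analyzer never reaches the point where a `try_*` wrapper could apply.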

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
grundprinzip commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1561992506 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
gengliangwang commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050941358 > And what about the default values of the ANSI-related sub-configurations, such as spark.sql.ansi.enforceReservedKeywords spark.sql.ansi.relationPrecedence

Re: [PR] [SPARK-47812][CONNECT] Support Serialization of SparkSession for ForEachBatch worker [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46002: URL: https://github.com/apache/spark/pull/46002#discussion_r1561991509 ## python/pyspark/sql/connect/session.py: ## @@ -1034,6 +1034,20 @@ def profile(self) -> Profile: profile.__doc__ = PySparkSession.profile.__doc__ +

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050928965 https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html should be updated too. And what about the default values of the ANSI-related sub-configurations, such as -

Re: [PR] [SPARK-47814][DSTREAM] Move `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` from `main` to `test` [spark]

2024-04-11 Thread via GitHub
LuciferYang commented on PR #46000: URL: https://github.com/apache/spark/pull/46000#issuecomment-2050911774 `WriteInputFormatTestDataGenerator` is called by `test_readwrite.py`. Even though the CI has passed, I still want to confirm that putting it into test.jar will not have a negative

Re: [PR] [SPARK-47799][BUILD] Add `-g` to javac compile parameters when using SBT package jar [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #45983: URL: https://github.com/apache/spark/pull/45983#issuecomment-2050905018 Thank you for the info. :)

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46019: URL: https://github.com/apache/spark/pull/46019#issuecomment-2050904083 Thank you for the swift update.

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561973227 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging *

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050903327 Yes, I want to have AS-IS implementation and to define clear boundary.

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561971976 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging *

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561971654 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561971448 ## core/src/main/scala/org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala: ## @@ -56,33 +57,39 @@ case class TestWritable(var str: String, var int:

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
cloud-fan commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050899887 Yea there are still some gaps between Spark ANSI mode and the SQL standard. But mostly ANSI is still a better default than no ANSI at all.

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
LuciferYang commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561970178 ## core/src/main/scala/org/apache/spark/api/python/WriteInputFormatTestDataGenerator.scala: ## @@ -56,33 +57,39 @@ case class TestWritable(var str: String, var

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561970075 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging *

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561969234 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
LuciferYang commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561969284 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561968756 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561968756 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
cloud-fan commented on code in PR #46019: URL: https://github.com/apache/spark/pull/46019#discussion_r1561966672 ## connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala: ## @@ -40,6 +40,7 @@ import org.apache.spark.internal.Logging *

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
panbingkun commented on PR #46019: URL: https://github.com/apache/spark/pull/46019#issuecomment-2050891561 > BTW, to @panbingkun , we need to wait for this kind of change because we need to collect more opinions than a normal PR. It will take at least 72 hours in general. Okay,

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-11 Thread via GitHub
cloud-fan commented on code in PR #45946: URL: https://github.com/apache/spark/pull/45946#discussion_r1561965680 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -202,6 +202,22 @@ public static StringSearch getStringSearch(

[PR] [WIP][SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun opened a new pull request, #46022: URL: https://github.com/apache/spark/pull/46022 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was

Re: [PR] [SPARK-47799][BUILD] Add `-g` to javac compile parameters when using SBT package jar [spark]

2024-04-11 Thread via GitHub
cxzl25 commented on PR #45983: URL: https://github.com/apache/spark/pull/45983#issuecomment-2050882315 Thanks dongjoon. > which decompiler did you use for your screenshot I use jd-gui. https://github.com/java-decompiler/jd-gui

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050882987 Thank you @dongjoon-hyun for raising that thread.

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46019: URL: https://github.com/apache/spark/pull/46019#issuecomment-2050880237 BTW, to @panbingkun , we need to wait for this kind of change because we need to collect more opinions than a normal PR. It will take at least 72 hours in general.

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050875592 Thank you for the feedback, @yaooqinn . Yes, I totally understand the situation. The reason why I raise this issue is to make a decision (go/no-go) for this item.

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050867779 I have added the above link and issues to SPARK-4. I didn't continue to audit ANSI compatibility as SPARK-46374 didn't get much attention.

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-04-11 Thread via GitHub
TakawaAkirayo commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1561947997 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -1014,6 +1014,15 @@ package object config { .timeConf(TimeUnit.NANOSECONDS)
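The PR title here describes letting `LiveEventBus` stop without completely draining its event queue. The stop-with-optional-drain pattern can be sketched in a few lines of Python (illustrative only; Spark's `LiveListenerBus` is Scala and considerably more involved):

```python
import queue
import threading

class EventBus:
    """Single-consumer event bus whose stop() can skip draining the queue."""

    def __init__(self):
        self._queue = queue.Queue()
        self._stopped = threading.Event()
        self.processed = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def post(self, event):
        self._queue.put(event)

    def _run(self):
        while not self._stopped.is_set():
            try:
                event = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            self.processed.append(event)
            self._queue.task_done()

    def stop(self, drain=True):
        if drain:
            self._queue.join()  # block until every posted event is handled
        self._stopped.set()     # otherwise stop now; leftover events are dropped
        self._worker.join()

bus = EventBus()
for i in range(3):
    bus.post(i)
bus.stop(drain=True)
print(bus.processed)  # [0, 1, 2]
```

The trade-off mirrors the one under review: draining guarantees no event is lost on shutdown, while skipping the drain bounds shutdown time when the queue has a large backlog.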

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-11 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1561935953 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -187,57 +187,57 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-47463][SQL] Use V2Predicate to wrap expression with return type of boolean [spark]

2024-04-11 Thread via GitHub
wForget commented on code in PR #45589: URL: https://github.com/apache/spark/pull/45589#discussion_r1561934026 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -187,8 +187,9 @@ class V2ExpressionBuilder(e: Expression, isPredicate:

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050839565 Ya, could you add them all to the PR description and SPARK-4, @yaooqinn?

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050839295 https://dev.mysql.com/doc/refman/8.3/en/sql-mode.html It seems that ANSI mode stays OFF in MySQL 8

Re: [PR] [SPARK-44444][SQL] Enabled ANSI mode by default [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #46013: URL: https://github.com/apache/spark/pull/46013#issuecomment-2050833640 There are still many behaviors incompatible w/ ANSI standard, such as SPARK-46374

Re: [PR] [SPARK-47795][K8S][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-11 Thread via GitHub
yaooqinn commented on PR #45982: URL: https://github.com/apache/spark/pull/45982#issuecomment-2050828484 Merged to master. Thank you @beliefer @dongjoon-hyun @bjornjorgensen

Re: [PR] [SPARK-47795][K8S][DOCS] Supplement the doc of job schedule for K8S [spark]

2024-04-11 Thread via GitHub
yaooqinn closed pull request #45982: [SPARK-47795][K8S][DOCS] Supplement the doc of job schedule for K8S URL: https://github.com/apache/spark/pull/45982

Re: [PR] [SPARK-47813][SQL] Replace getArrayDimension with updateExtraColumnMeta [spark]

2024-04-11 Thread via GitHub
yaooqinn closed pull request #46006: [SPARK-47813][SQL] Replace getArrayDimension with updateExtraColumnMeta URL: https://github.com/apache/spark/pull/46006

Re: [PR] [SPARK-47827][PYTHON] Missing warnings for deprecated features [spark]

2024-04-11 Thread via GitHub
itholic commented on PR #46021: URL: https://github.com/apache/spark/pull/46021#issuecomment-2050822266 cc @HyukjinKwon FYI

[PR] [SPARK-47827][PYTHON] Missing warnings for deprecated features [spark]

2024-04-11 Thread via GitHub
itholic opened a new pull request, #46021: URL: https://github.com/apache/spark/pull/46021 ### What changes were proposed in this pull request? This PR proposes to add missing warnings for deprecated features ### Why are the changes needed? Some APIs will be
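The PR above proposes adding missing warnings for deprecated features. A common Python pattern for this is a small decorator that emits a `FutureWarning` on each call (the `deprecated` decorator and `old_sum` function below are hypothetical examples, not the actual PySpark changes):

```python
import functools
import warnings

def deprecated(message):
    # Decorator that emits a FutureWarning whenever the wrapped API is called.
    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            # stacklevel=2 points the warning at the caller, not this wrapper.
            warnings.warn(message, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)
        return inner
    return decorator

@deprecated("old_sum is deprecated and will be removed; use sum() instead.")
def old_sum(values):
    return sum(values)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_sum([1, 2, 3])

print(result)  # 6
print(caught[0].category is FutureWarning)  # True
```

Centralizing the warning in a decorator keeps the deprecation message, category, and stack level consistent across every API being retired.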

Re: [PR] [SPARK-47733][SS] Add custom metrics for transformWithState operator part of query progress [spark]

2024-04-11 Thread via GitHub
HeartSaVioR commented on code in PR #45937: URL: https://github.com/apache/spark/pull/45937#discussion_r1561908874 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithListStateSuite.scala: ## @@ -307,7 +307,10 @@ class TransformWithListStateSuite extends

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun commented on PR #45957: URL: https://github.com/apache/spark/pull/45957#issuecomment-2050798569 cc @gengliangwang

Re: [PR] [SPARK-47318][CORE][3.5] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46014: URL: https://github.com/apache/spark/pull/46014#discussion_r1561825080 ## docs/security.md: ## @@ -169,6 +175,12 @@ The following table describes the different options available for configuring th 2.2.0 + +

Re: [PR] [SPARK-47318][CORE][3.5] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on code in PR #46014: URL: https://github.com/apache/spark/pull/46014#discussion_r1561803406 ## docs/security.md: ## @@ -169,6 +175,12 @@ The following table describes the different options available for configuring th 2.2.0 + +

Re: [PR] [SPARK-47318][CORE][3.4] Adds HKDF round to AuthEngine key derivation to follow standard KEX practices [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46015: URL: https://github.com/apache/spark/pull/46015#issuecomment-2050796202 To @mridulm , I prefer to pass CI. :)

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on PR #46019: URL: https://github.com/apache/spark/pull/46019#issuecomment-2050795814 cc @HeartSaVioR too

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #45957: URL: https://github.com/apache/spark/pull/45957#discussion_r1561900138 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -110,7 +121,8 @@ object LogKey extends Enumeration { val REDUCE_ID = Value val

Re: [PR] [SPARK-47825][DSTREAMS][3.5] Make `KinesisTestUtils` & `WriteInputFormatTestDataGenerator` deprecated [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46019: URL: https://github.com/apache/spark/pull/46019#issuecomment-2050793224 cc @cloud-fan , @HyukjinKwon , @LuciferYang , @zhengruifeng, @yaooqinn , too

Re: [PR] [SPARK-47541][SQL][FOLLOWUP] Fix `AnsiTypeCoercion` to handle ArrayType [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun commented on PR #46016: URL: https://github.com/apache/spark/pull/46016#issuecomment-2050791776 Merged to master for Apache Spark 4.0.0.

Re: [PR] [SPARK-47541][SQL][FOLLOWUP] Fix `AnsiTypeCoercion` to handle ArrayType [spark]

2024-04-11 Thread via GitHub
dongjoon-hyun closed pull request #46016: [SPARK-47541][SQL][FOLLOWUP] Fix `AnsiTypeCoercion` to handle ArrayType URL: https://github.com/apache/spark/pull/46016

Re: [PR] [SPARK-47174][CONNECT][SS][1/2] Server side SparkConnectListenerBusListener for Client side streaming query listener [spark]

2024-04-11 Thread via GitHub
HyukjinKwon closed pull request #45988: [SPARK-47174][CONNECT][SS][1/2] Server side SparkConnectListenerBusListener for Client side streaming query listener URL: https://github.com/apache/spark/pull/45988

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #45957: URL: https://github.com/apache/spark/pull/45957#discussion_r1561897003 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala: ## @@ -857,7 +858,7 @@ private[spark] class ApplicationMaster(

Re: [PR] [SPARK-47174][CONNECT][SS][1/2] Server side SparkConnectListenerBusListener for Client side streaming query listener [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on PR #45988: URL: https://github.com/apache/spark/pull/45988#issuecomment-2050790536 Merged to master.

Re: [PR] [SPARK-47603][KUBERNETES][YARN] Resource managers: Migrate logWarn with variables to structured logging framework [spark]

2024-04-11 Thread via GitHub
panbingkun commented on code in PR #45957: URL: https://github.com/apache/spark/pull/45957#discussion_r1561894980 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala: ## @@ -24,7 +24,8 @@ import

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-11 Thread via GitHub
harshmotw-db commented on code in PR #46017: URL: https://github.com/apache/spark/pull/46017#discussion_r1561892813 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala: ## @@ -747,6 +748,18 @@ class ExpressionTypeCheckingSuite

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-11 Thread via GitHub
harshmotw-db commented on code in PR #46011: URL: https://github.com/apache/spark/pull/46011#discussion_r1561890832 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -822,6 +822,7 @@ object FunctionRegistry { // Variant

Re: [PR] [SPARK-47821][SQL] Implement is_variant_null expression [spark]

2024-04-11 Thread via GitHub
harshmotw-db commented on code in PR #46011: URL: https://github.com/apache/spark/pull/46011#discussion_r1561889662 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -822,6 +822,7 @@ object FunctionRegistry { // Variant

Re: [PR] [SPARK-47822][SQL] Prohibit Hash Expressions from hashing the Variant Data Type [spark]

2024-04-11 Thread via GitHub
harshmotw-db commented on code in PR #46017: URL: https://github.com/apache/spark/pull/46017#discussion_r1561887921 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -914,6 +914,11 @@ "The must be between (current value = )." ] },

Re: [PR] [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof [spark]

2024-04-11 Thread via GitHub
HyukjinKwon closed pull request #46018: [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof URL: https://github.com/apache/spark/pull/46018

Re: [PR] [SPARK-47824][PS] Fix nondeterminism in pyspark.pandas.series.asof [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on PR #46018: URL: https://github.com/apache/spark/pull/46018#issuecomment-2050768062 Merged to master.

Re: [PR] [SPARK-47816][CONNECT][DOCS] Document the lazy evaluation of views in `spark.{sql, table}` [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46007: URL: https://github.com/apache/spark/pull/46007#discussion_r1561880330 ## python/pyspark/sql/session.py: ## @@ -1756,6 +1763,13 @@ def table(self, tableName: str) -> DataFrame: --- :class:`DataFrame` +

Re: [PR] [SPARK-47816][CONNECT][DOCS] Document the lazy evaluation of views in `spark.{sql, table}` [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #46007: URL: https://github.com/apache/spark/pull/46007#discussion_r1561880283 ## python/pyspark/sql/session.py: ## @@ -1630,6 +1630,13 @@ def sql( --- :class:`DataFrame` +Notes +- +In

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-11 Thread via GitHub
gene-db commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2050758504 > Yeah, would be great if we can file a JIRA. then I will edit the title, and link the Pr. Here is the new jira: https://issues.apache.org/jira/browse/SPARK-47826 Thanks!

Re: [PR] [SPARK-47366][SQL][PYTHON] Add VariantVal for PySpark [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on PR #45826: URL: https://github.com/apache/spark/pull/45826#issuecomment-2050753028 Yeah, would be great if we can file a JIRA. then I will edit the title, and link the Pr.

Re: [PR] [SPARK-39800][SQL][WIP] DataSourceV2: View Support [spark]

2024-04-11 Thread via GitHub
github-actions[bot] commented on PR #44197: URL: https://github.com/apache/spark/pull/44197#issuecomment-2050753239 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46477][SQL] Add bucket info to SD in toHivePartition [spark]

2024-04-11 Thread via GitHub
github-actions[bot] commented on PR #1: URL: https://github.com/apache/spark/pull/1#issuecomment-2050753214 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1561875139 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,188 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): +def
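
For context, SPARK-47793 adds a simplified streaming reader for Python data sources. The actual interface is defined in the PR itself, but the general offset-tracking contract it simplifies can be sketched in plain Python. The class and method names below are illustrative assumptions, not the real `pyspark.sql.datasource` API: each read returns the records that arrived since the last committed offset together with the new end offset, with no partition planning step.

```python
class ToyStreamReader:
    """Illustrative offset-based stream reader (NOT the actual PySpark
    interface): read() returns the records added since `start_offset`
    along with the new end offset to commit."""

    def __init__(self, source):
        # A growing list stands in for an external streaming source.
        self.source = source

    def initial_offset(self):
        # Start reading from the beginning of the source.
        return 0

    def read(self, start_offset):
        # Snapshot the current end of the source, slice out the new
        # records, and hand back the offset for the next read.
        end_offset = len(self.source)
        batch = self.source[start_offset:end_offset]
        return batch, end_offset


source = [1, 2, 3]
reader = ToyStreamReader(source)
offset = reader.initial_offset()
batch1, offset = reader.read(offset)   # picks up [1, 2, 3]
source.extend([4, 5])                  # new data arrives
batch2, offset = reader.read(offset)   # picks up only [4, 5]
print(batch1, batch2)
```

Calling `read` again with an unchanged source returns an empty batch and the same offset, which is the property that makes offset-based replay deterministic.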

Re: [PR] [MINOR][PS] Use expression instead of a string column in Series.asof [spark]

2024-04-11 Thread via GitHub
HyukjinKwon closed pull request #46020: [MINOR][PS] Use expression instead of a string column in Series.asof URL: https://github.com/apache/spark/pull/46020

Re: [PR] [MINOR][PS] Use expression instead of a string column in Series.asof [spark]

2024-04-11 Thread via GitHub
HyukjinKwon commented on PR #46020: URL: https://github.com/apache/spark/pull/46020#issuecomment-2050749076 Oh, it's a duplicate of https://github.com/apache/spark/pull/46018. Let me merge that instead

[PR] [MINOR][PS] Use expression instead of a string column in Series.asof [spark]

2024-04-11 Thread via GitHub
HyukjinKwon opened a new pull request, #46020: URL: https://github.com/apache/spark/pull/46020

### What changes were proposed in this pull request?

This PR proposes to use expression instead of a string column in Series.asof

### Why are the changes needed?

It's better to
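
For context, `Series.asof` returns, for each query label, the last non-null value whose index label is at or before that label. The PR above only changes how the internal Spark query is built (a `Column` expression instead of a string column name); the semantics it must preserve can be sketched in pure Python, assuming an ascending index:

```python
def asof(index, values, where):
    """For each label in `where`, return the last non-None value in
    `values` whose index label is <= that label, or None if there is
    no such value. Assumes `index` is sorted ascending."""
    result = []
    for label in where:
        candidate = None
        for idx, val in zip(index, values):
            if idx > label:
                break          # index is sorted: nothing later can match
            if val is not None:
                candidate = val  # remember the most recent non-null value
        result.append(candidate)
    return result


print(asof([10, 20, 30, 40], [1.0, 2.0, None, 4.0], [15, 30, 35]))
# At label 30 the stored value is None, so the lookup falls back to 2.0.
```

The null-skipping fallback is the subtle part: a naive "last row at or before the label" lookup would wrongly return `None` for labels 30 and 35 here.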

Re: [PR] [SPARK-47811][PYTHON][CONNECT][TESTS] Run ML tests for pyspark-connect package [spark]

2024-04-11 Thread via GitHub
HyukjinKwon closed pull request #45941: [SPARK-47811][PYTHON][CONNECT][TESTS] Run ML tests for pyspark-connect package URL: https://github.com/apache/spark/pull/45941
