Re: [PR] [SPARK-50278][BUILD] Upgrade `netty-tcnative` to 2.0.69.Final [spark]

2024-11-10 Thread via GitHub
panbingkun commented on PR #48810: URL: https://github.com/apache/spark/pull/48810#issuecomment-248217 ```shell ./build/sbt -Phadoop-3 -Pkubernetes -Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pyarn -Phadoop-cloud -Pspark-ganglia-lgpl -Phive -Pjvm-profiler clean pack

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-241993 Merged to master.

[PR] [SPARK-50278][BUILD] Upgrade `netty-tcnative` to 2.0.69.Final [spark]

2024-11-10 Thread via GitHub
panbingkun opened a new pull request, #48810: URL: https://github.com/apache/spark/pull/48810 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon closed pull request #48807: [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream URL: https://github.com/apache/spark/pull/48807

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-241968 Let me merge this. I think the test failure isn't related to this PR.

[PR] [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg [spark]

2024-11-10 Thread via GitHub
HyukjinKwon opened a new pull request, #48809: URL: https://github.com/apache/spark/pull/48809 ### What changes were proposed in this pull request? This PR proposes to fix the grouped aggregate pandas UDF example in `df.groupby.agg` by using type hints. ### Why are the changes
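For reference, the corrected pattern is a grouped-aggregate pandas UDF declared with type hints. A minimal sketch of that pattern (illustrative only, not the PR's exact diff; assumes an existing `spark` session):

```python
# Sketch of a grouped-aggregate pandas UDF with type hints (illustrative).
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def mean_udf(v: pd.Series) -> float:
    # A grouped-aggregate UDF reduces a pd.Series to one scalar per group.
    return float(v.mean())

df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
df.groupby("id").agg(mean_udf(df["v"])).show()
```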

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48770: URL: https://github.com/apache/spark/pull/48770#issuecomment-2466659766 Seems mostly fine to me

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1835636736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -169,6 +148,23 @@ case class PythonUDAF( override protected def

Re: [PR] [SPARK-49058][SQL] Display more primitive name for the `::` operator [spark]

2024-11-09 Thread via GitHub
github-actions[bot] closed pull request #47535: [SPARK-49058][SQL] Display more primitive name for the `::` operator URL: https://github.com/apache/spark/pull/47535

Re: [PR] [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs [spark]

2024-11-09 Thread via GitHub
HyukjinKwon closed pull request #48797: [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs URL: https://github.com/apache/spark/pull/48797

Re: [PR] [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs [spark]

2024-11-09 Thread via GitHub
HyukjinKwon commented on PR #48797: URL: https://github.com/apache/spark/pull/48797#issuecomment-2466524313 Merged to master.

Re: [PR] [SPARK-50258][SQL] Keep the output order after AQE optimization [spark]

2024-11-09 Thread via GitHub
wangyum commented on code in PR #48789: URL: https://github.com/apache/spark/pull/48789#discussion_r1835547499 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -773,7 +773,15 @@ case class AdaptiveSparkPlanExec( case

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-09 Thread via GitHub
ankurdave commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835516840 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -63,15 +65,29 @@ private[spark] class DirectByteBufferOutputStream(capacity: I

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835484120 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -63,15 +65,29 @@ private[spark] class DirectByteBufferOutputStream(capacity: Int

Re: [PR] [SPARK-42838][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2000 [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48332: URL: https://github.com/apache/spark/pull/48332#discussion_r1835483214 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -436,9 +436,8 @@ class DateExpressionsSuite extends SparkFunS

Re: [PR] [SPARK-42838][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2000 [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48332: URL: https://github.com/apache/spark/pull/48332#discussion_r1835482299 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -277,21 +277,22 @@ private[sql] object QueryExecutionErrors extends QueryE

Re: [PR] [SPARK-50083][SQL] Integrate `_LEGACY_ERROR_TEMP_1231` into `PARTITIONS_NOT_FOUND` [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on PR #48614: URL: https://github.com/apache/spark/pull/48614#issuecomment-2466302791 The failed test is related to your changes, it seems. Please fix it: ``` [info] - TRUNCATE TABLE using V1 catalog V1 command: truncate a partition of non partitioned table *** FAIL

Re: [PR] [SPARK-49670][SQL] Enable trim collation for all passthrough expressions [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48739: URL: https://github.com/apache/spark/pull/48739#discussion_r1835472216 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1720,9 +1981,13 @@ class CollationSQLExpressionsSuite case class DateFor

Re: [PR] [SPARK-50245][SQL][TESTS] Extended CollationSuite and added tests where SortMergeJoin is forced [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48774: URL: https://github.com/apache/spark/pull/48774#discussion_r1835463819 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1795,43 +1825,40 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPla

Re: [PR] [SPARK-50250][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2075`: `UNSUPPORTED_FEATURE.WRITE_FOR_BINARY_SOURCE` [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48780: URL: https://github.com/apache/spark/pull/48780#discussion_r1835447328 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -5315,6 +5315,11 @@ "message" : [ "Update column nullability for MySQL and MS

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR closed pull request #48808: [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas URL: https://github.com/apache/spark/pull/48808

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2466115481 Thanks! Merging to master.

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2466115453 CI failed only in the docker integration test. https://github.com/bogao007/spark/runs/32741105303

Re: [PR] [SPARK-50275][SS][PYTHON] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HyukjinKwon commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835311867 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -109,6 +109,7 @@ def _test_transform_with_state_in_pandas_basic( input_path =

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-2466011812 @JoshRosen Thanks! I changed to `private[this]` and updated the PR description to mention it.

Re: [PR] [SPARK-50275][SS][PYTHON] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835213388 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -109,6 +109,7 @@ def _test_transform_with_state_in_pandas_basic( input_path =

Re: [PR] [SPARK-50275][PYTHON][SS] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835210545 ## dev/sparktestsupport/modules.py: ## @@ -526,6 +526,7 @@ def __hash__(self): "pyspark.sql.tests.pandas.test_pandas_grouped_map", "pyspark.sql.te

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
HeartSaVioR closed pull request #48806: [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases URL: https://github.com/apache/spark/pull/48806

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on PR #48806: URL: https://github.com/apache/spark/pull/48806#issuecomment-2465985323 Thanks! Merging to master.

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
JoshRosen commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835187704 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -80,6 +96,7 @@ private[spark] class DirectByteBufferOutputStream(capacity: Int

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835184033 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -563,13 +684,113 @@ class RangeKeyScanStateEncoder(

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835184154 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -374,6 +479,22 @@ class RangeKeyScanStateEncoder( U

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835181294 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -563,13 +684,113 @@ class RangeKeyScanStateEncoder(

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835179392 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/package.scala: ## @@ -89,6 +89,49 @@ package object state { extraOptions,

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835178324 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -303,6 +303,56 @@ object StreamingSymmetricHashJo

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835178017 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -259,6 +259,19 @@ class IncrementalExecution( } } +

Re: [PR] [SPARK-50275][PYTHON][SS] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
bogao007 commented on PR #48805: URL: https://github.com/apache/spark/pull/48805#issuecomment-2465942441 @HeartSaVioR Could you help review this change? Thanks!

Re: [PR] [SPARK-50270][PYTHON][SS] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-08 Thread via GitHub
bogao007 commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2465942249 @HeartSaVioR Could you help review this change? Thanks!

Re: [PR] [SPARK-50268][SQL][TESTS] Upgrade oracle jdbc driver to `ojdbc17:23.6.0.24.10` [spark]

2024-11-08 Thread via GitHub
panbingkun commented on PR #48798: URL: https://github.com/apache/spark/pull/48798#issuecomment-2465930108 > @milastdbx Could you review this PR, please. Thanks @MaxGekk !

Re: [PR] [SPARK-48210][DOC] Modify the description of whether dynamic partition… [spark]

2024-11-08 Thread via GitHub
github-actions[bot] closed pull request #46496: [SPARK-48210][DOC] Modify the description of whether dynamic partition… URL: https://github.com/apache/spark/pull/46496

Re: [PR] [SPARK-49037][SQL][TESTS] Replace `schema()` with `columns()` [spark]

2024-11-08 Thread via GitHub
github-actions[bot] closed pull request #47515: [SPARK-49037][SQL][TESTS] Replace `schema()` with `columns()` URL: https://github.com/apache/spark/pull/47515

Re: [PR] [SPARK-49058][SQL] Display more primitive name for the `::` operator [spark]

2024-11-08 Thread via GitHub
github-actions[bot] commented on PR #47535: URL: https://github.com/apache/spark/pull/47535#issuecomment-2465934354 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50110][SQL] Fix `from_csv`: parse fails when data contains spaces before and after [spark]

2024-11-08 Thread via GitHub
panbingkun commented on PR #48653: URL: https://github.com/apache/spark/pull/48653#issuecomment-2465929498 > @panbingkun There are CSV options: > > * ignoreLeadingWhiteSpace > * ignoreTrailingWhiteSpace > > They are off in read by default, but when you set them on, do they solve your issue?

[PR] [SPARK-50270][PYTHON][SS] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-08 Thread via GitHub
bogao007 opened a new pull request, #48808: URL: https://github.com/apache/spark/pull/48808 ### What changes were proposed in this pull request? - Added custom state metrics for TransformWithStateInPandas. - Clean up TTL properly. ### Why are the changes needed?

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on PR #48788: URL: https://github.com/apache/spark/pull/48788#issuecomment-2465908616 Merged to master~

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on code in PR #48788: URL: https://github.com/apache/spark/pull/48788#discussion_r1835160420 ## python/pyspark/sql/session.py: ## @@ -543,9 +543,12 @@ def getOrCreate(self) -> "SparkSession": session = SparkSession._instantiatedSession
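Based only on the PR title (the diff preview above is truncated), the change concerns honoring `spark.submit.appName` when `getOrCreate` builds a session. A hypothetical usage sketch, with the precedence semantics assumed rather than verified against the diff:

```python
# Hypothetical usage; assumes spark.submit.appName feeds the session's app
# name as the PR title suggests (not verified against the actual diff).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.submit.appName", "my-pyspark-job")
    .getOrCreate()
)
print(spark.sparkContext.appName)
```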

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun closed pull request #48788: [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark URL: https://github.com/apache/spark/pull/48788

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on PR #48806: URL: https://github.com/apache/spark/pull/48806#issuecomment-2465899935 cc - @HeartSaVioR @liviazhu-db - PTAL, thx !

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-2465903933 cc @JoshRosen

[PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave opened a new pull request, #48807: URL: https://github.com/apache/spark/pull/48807 ### What changes were proposed in this pull request? `DirectByteBufferOutputStream#close()` calls `StorageUtils.dispose()` to free its direct byte buffer. This puts the object into
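The fix described here is a guard against writing to a buffer that has already been freed. As a generic illustration of the pattern only (hypothetical Python; the actual class is Scala and disposes a direct ByteBuffer via `StorageUtils.dispose()`):

```python
# Hypothetical illustration of a use-after-close guard; not the Scala code.
class GuardedBufferStream:
    def __init__(self) -> None:
        self._buf = bytearray()
        self._closed = False

    def _ensure_open(self) -> None:
        # Fail fast instead of touching memory that was already released.
        if self._closed:
            raise ValueError("write after close")

    def write(self, data: bytes) -> int:
        self._ensure_open()
        self._buf += data
        return len(data)

    def close(self) -> None:
        if not self._closed:
            self._closed = True
            self._buf = bytearray()  # stand-in for freeing the direct buffer
```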

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1835071345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalMathUtils.scala: ## @@ -31,16 +31,17 @@ object IntervalMathUtils { def subtractExact(

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1835001042 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala: ## @@ -785,10 +785,15 @@ object IntervalUtils extends SparkIntervalUtils {

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834996914 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -636,22 +636,26 @@ private[sql] object QueryExecutionErrors extends QueryE

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834995566 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "." +

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834992804 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "." +

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-08 Thread via GitHub
ueshin commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1834889583 ## python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py: ## @@ -752,46 +752,86 @@ def check_vectorized_udf_return_scalar(self): def test_udf_with_variant_inp

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-08 Thread via GitHub
harshmotw-db commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1834847538 ## python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py: ## @@ -752,46 +752,86 @@ def check_vectorized_udf_return_scalar(self): def test_udf_with_varia

Re: [PR] [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation [spark]

2024-11-08 Thread via GitHub
dtenedor closed pull request #48754: [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation URL: https://github.com/apache/spark/pull/48754

Re: [PR] [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation [spark]

2024-11-08 Thread via GitHub
dtenedor commented on PR #48754: URL: https://github.com/apache/spark/pull/48754#issuecomment-2465418016 Looking at this more, the proposed semantics would not be consistent with how aggregation works with pipe operators, where the GROUP BY expressions arrive out of the operator followed by

Re: [PR] [SPARK-48898][SQL] Add Variant shredding functions [spark]

2024-11-08 Thread via GitHub
cashmand commented on PR #48779: URL: https://github.com/apache/spark/pull/48779#issuecomment-2465413855 > So if I understand correctly, the shredding write chain may be like this: Get the expected shredded schema (DataType) through some ways (sampling or just user defined?) -> parquet w

[PR] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
bogao007 opened a new pull request, #48805: URL: https://github.com/apache/spark/pull/48805 ### What changes were proposed in this pull request? Enable test_pandas_transform_with_state unit test ### Why are the changes needed? Improve python test coverage #

Re: [PR] [SPARK-50110][SQL] Fix `from_csv`: parse fails when data contains spaces before and after [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on PR #48653: URL: https://github.com/apache/spark/pull/48653#issuecomment-2465244299 @panbingkun There are CSV options: - ignoreLeadingWhiteSpace - ignoreTrailingWhiteSpace They are off by default, but when you set them on, do they solve your issue?
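Both options are real CSV data source options and can also be passed to `from_csv`; a quick sketch of doing so (made-up input, assumes an existing `spark` session):

```python
# Illustrative: passing the whitespace-trimming options to from_csv.
from pyspark.sql.functions import from_csv

df = spark.createDataFrame([("  1 ,  abc  ",)], ["value"])
df.select(
    from_csv(
        df["value"],
        "a INT, b STRING",
        {"ignoreLeadingWhiteSpace": "true", "ignoreTrailingWhiteSpace": "true"},
    ).alias("row")
).show(truncate=False)
```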

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834695708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalMathUtils.scala: ## @@ -31,16 +31,17 @@ object IntervalMathUtils { def subtractExac

Re: [PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48804: URL: https://github.com/apache/spark/pull/48804#discussion_r1834614134 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -509,6 +509,10 @@ private CollationSpecUTF8( private static int

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834598344 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834596748 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
stevomitric commented on PR #48804: URL: https://github.com/apache/spark/pull/48804#issuecomment-2465047392 > @stevomitric Could you regenerate results of the benchmark CollationBenchmark, please. We should expect better numbers after your changes, right? They are running, I will pos

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834587169 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834586363 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
viirya commented on code in PR #48788: URL: https://github.com/apache/spark/pull/48788#discussion_r1834526551 ## python/pyspark/sql/session.py: ## @@ -543,9 +543,12 @@ def getOrCreate(self) -> "SparkSession": session = SparkSession._instantiatedSession

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on PR #48788: URL: https://github.com/apache/spark/pull/48788#issuecomment-2464903788 Could you review this PR, @viirya ?

[PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
stevomitric opened a new pull request, #48804: URL: https://github.com/apache/spark/pull/48804 ### What changes were proposed in this pull request? This PR addresses the UTF8_BINARY performance regression that was first identified in https://github.com/apache/spark/pull/48721. The r

[PR] [SPARK-50262][SQL] Forbid specification of complex types during altering collation [spark]

2024-11-08 Thread via GitHub
Alexvsalexvsalex opened a new pull request, #48803: URL: https://github.com/apache/spark/pull/48803 ### What changes were proposed in this pull request? [SPARK-48413](https://issues.apache.org/jira/browse/SPARK-48413) has brought the ability to change the collation on a table. So I s

[PR] expose configure_logging [spark]

2024-11-08 Thread via GitHub
nija-at opened a new pull request, #48802: URL: https://github.com/apache/spark/pull/48802 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-50272][SQL] Merge options of table and relation in FallBackFileSourceV2 [spark]

2024-11-08 Thread via GitHub
Zouxxyy opened a new pull request, #48801: URL: https://github.com/apache/spark/pull/48801 ### What changes were proposed in this pull request? Merge options of table and relation in FallBackFileSourceV2 ### Why are the changes needed? SPARK-49519

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
miland-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834376456 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834371951 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "."

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834364947 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2020,11 +2020,23 @@ ], "sqlState" : "XX000" }, - "INTERVAL_ARITHMETIC_OVERFLO

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834181627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala: ## @@ -561,17 +562,28 @@ case class MakeYMInterval(years: Expr

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834360587 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala: ## @@ -1488,12 +1488,14 @@ class DataFrameAggregateSuite extends QueryTest val

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834333970 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834332113 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala: ## @@ -124,6 +124,22 @@ case class SqlScriptingInterpreter() { va

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834330570 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() }

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
miland-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834265053 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,87 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834247770 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834234445 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834167674 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala: ## @@ -124,6 +124,22 @@ case class SqlScriptingInterpreter() { val b

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464401231 You can remove [WIP] and Draft as well.

Re: [I] Add support for Config() API [spark-connect-go]

2024-11-08 Thread via GitHub
grundprinzip closed issue #48: Add support for Config() API URL: https://github.com/apache/spark-connect-go/issues/48

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834131331 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -19,12 +19,15 @@ package org.apache.spark.sql.scripting import org

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464399881 @srielau Does this look good to you?

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464285445 CI fails, but `testOnly org.apache.spark.sql.SparkSessionE2ESuite` locally works fine (I've checked it 5 times, always green). Could it be useful just to restart CI (if yes – how to
