Re: [PR] [SPARK-50278][BUILD] Upgrade `netty-tcnative` to 2.0.69.Final [spark]

2024-11-10 Thread via GitHub
panbingkun commented on PR #48810: URL: https://github.com/apache/spark/pull/48810#issuecomment-248217 ```shell ./build/sbt -Phadoop-3 -Pkubernetes -Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pyarn -Phadoop-cloud -Pspark-ganglia-lgpl -Phive -Pjvm-profiler clean pack

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-241993 Merged to master.

[PR] [SPARK-50278][BUILD] Upgrade `netty-tcnative` to 2.0.69.Final [spark]

2024-11-10 Thread via GitHub
panbingkun opened a new pull request, #48810: URL: https://github.com/apache/spark/pull/48810 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon closed pull request #48807: [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream URL: https://github.com/apache/spark/pull/48807

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-241968 Let me merge this. I think the test failure isn't related to this PR.

[PR] [MINOR][DOCS][PYTHON] Fix grouped aggregate pandas UDF example in df.groupby.agg [spark]

2024-11-10 Thread via GitHub
HyukjinKwon opened a new pull request, #48809: URL: https://github.com/apache/spark/pull/48809 ### What changes were proposed in this pull request? This PR proposes to fix the grouped aggregate pandas UDF example in `df.groupby.agg` by using type hints. ### Why are the changes
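For reference, the corrected pattern is a grouped-aggregate pandas UDF declared with type hints. A minimal sketch of that pattern (illustrative only, not the PR's exact diff; assumes an existing `spark` session):

```python
# Sketch of a grouped-aggregate pandas UDF with type hints (illustrative).
import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("double")
def mean_udf(v: pd.Series) -> float:
    # A grouped-aggregate UDF reduces a pd.Series to one scalar per group.
    return float(v.mean())

df = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ("id", "v"))
df.groupby("id").agg(mean_udf(df["v"])).show()
```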

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on PR #48770: URL: https://github.com/apache/spark/pull/48770#issuecomment-2466659766 Seems mostly fine to me

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-10 Thread via GitHub
HyukjinKwon commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1835636736 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala: ## @@ -169,6 +148,23 @@ case class PythonUDAF( override protected def

Re: [PR] [SPARK-49058][SQL] Display more primitive name for the `::` operator [spark]

2024-11-09 Thread via GitHub
github-actions[bot] closed pull request #47535: [SPARK-49058][SQL] Display more primitive name for the `::` operator URL: https://github.com/apache/spark/pull/47535

Re: [PR] [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs [spark]

2024-11-09 Thread via GitHub
HyukjinKwon closed pull request #48797: [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs URL: https://github.com/apache/spark/pull/48797

Re: [PR] [SPARK-50267][ML] Improve `TargetEncoder.fit` with DataFrame APIs [spark]

2024-11-09 Thread via GitHub
HyukjinKwon commented on PR #48797: URL: https://github.com/apache/spark/pull/48797#issuecomment-2466524313 Merged to master.

Re: [PR] [SPARK-50258][SQL] Keep the output order after AQE optimization [spark]

2024-11-09 Thread via GitHub
wangyum commented on code in PR #48789: URL: https://github.com/apache/spark/pull/48789#discussion_r1835547499 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -773,7 +773,15 @@ case class AdaptiveSparkPlanExec( case

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-09 Thread via GitHub
ankurdave commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835516840 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -63,15 +65,29 @@ private[spark] class DirectByteBufferOutputStream(capacity: I

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835484120 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -63,15 +65,29 @@ private[spark] class DirectByteBufferOutputStream(capacity: Int

Re: [PR] [SPARK-42838][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2000 [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48332: URL: https://github.com/apache/spark/pull/48332#discussion_r1835483214 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala: ## @@ -436,9 +436,8 @@ class DateExpressionsSuite extends SparkFunS

Re: [PR] [SPARK-42838][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_2000 [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48332: URL: https://github.com/apache/spark/pull/48332#discussion_r1835482299 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -277,21 +277,22 @@ private[sql] object QueryExecutionErrors extends QueryE

Re: [PR] [SPARK-50083][SQL] Integrate `_LEGACY_ERROR_TEMP_1231` into `PARTITIONS_NOT_FOUND` [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on PR #48614: URL: https://github.com/apache/spark/pull/48614#issuecomment-2466302791 The failed test is related to your changes, it seems. Please fix it: ``` [info] - TRUNCATE TABLE using V1 catalog V1 command: truncate a partition of non partitioned table *** FAIL

Re: [PR] [SPARK-49670][SQL] Enable trim collation for all passthrough expressions [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48739: URL: https://github.com/apache/spark/pull/48739#discussion_r1835472216 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala: ## @@ -1720,9 +1981,13 @@ class CollationSQLExpressionsSuite case class DateFor

Re: [PR] [SPARK-50245][SQL][TESTS] Extended CollationSuite and added tests where SortMergeJoin is forced [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48774: URL: https://github.com/apache/spark/pull/48774#discussion_r1835463819 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1795,43 +1825,40 @@ class CollationSuite extends DatasourceV2SQLBase with AdaptiveSparkPla

Re: [PR] [SPARK-50250][SQL] Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2075`: `UNSUPPORTED_FEATURE.WRITE_FOR_BINARY_SOURCE` [spark]

2024-11-09 Thread via GitHub
MaxGekk commented on code in PR #48780: URL: https://github.com/apache/spark/pull/48780#discussion_r1835447328 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -5315,6 +5315,11 @@ "message" : [ "Update column nullability for MySQL and MS

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR closed pull request #48808: [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas URL: https://github.com/apache/spark/pull/48808

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2466115481 Thanks! Merging to master.

Re: [PR] [SPARK-50270][SS][PYTHON] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-09 Thread via GitHub
HeartSaVioR commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2466115453 CI failed only in the docker integration test. https://github.com/bogao007/spark/runs/32741105303

Re: [PR] [SPARK-50275][SS][PYTHON] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HyukjinKwon commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835311867 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -109,6 +109,7 @@ def _test_transform_with_state_in_pandas_basic( input_path =

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-2466011812 @JoshRosen Thanks! I changed to `private[this]` and updated the PR description to mention it.

Re: [PR] [SPARK-50275][SS][PYTHON] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835213388 ## python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py: ## @@ -109,6 +109,7 @@ def _test_transform_with_state_in_pandas_basic( input_path =

Re: [PR] [SPARK-50275][PYTHON][SS] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on code in PR #48805: URL: https://github.com/apache/spark/pull/48805#discussion_r1835210545 ## dev/sparktestsupport/modules.py: ## @@ -526,6 +526,7 @@ def __hash__(self): "pyspark.sql.tests.pandas.test_pandas_grouped_map", "pyspark.sql.te

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
HeartSaVioR closed pull request #48806: [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases URL: https://github.com/apache/spark/pull/48806

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
HeartSaVioR commented on PR #48806: URL: https://github.com/apache/spark/pull/48806#issuecomment-2465985323 Thanks! Merging to master.

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
JoshRosen commented on code in PR #48807: URL: https://github.com/apache/spark/pull/48807#discussion_r1835187704 ## core/src/main/scala/org/apache/spark/util/DirectByteBufferOutputStream.scala: ## @@ -80,6 +96,7 @@ private[spark] class DirectByteBufferOutputStream(capacity: Int

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835184033 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -563,13 +684,113 @@ class RangeKeyScanStateEncoder(

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835184154 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -374,6 +479,22 @@ class RangeKeyScanStateEncoder( U

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835181294 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -563,13 +684,113 @@ class RangeKeyScanStateEncoder(

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835179392 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/package.scala: ## @@ -89,6 +89,49 @@ package object state { extraOptions,

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835178324 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -303,6 +303,56 @@ object StreamingSymmetricHashJo

Re: [PR] [SPARK-50017] Support Avro encoding for TransformWithState operator [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on code in PR #48401: URL: https://github.com/apache/spark/pull/48401#discussion_r1835178017 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala: ## @@ -259,6 +259,19 @@ class IncrementalExecution( } } +

Re: [PR] [SPARK-50275][PYTHON][SS] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
bogao007 commented on PR #48805: URL: https://github.com/apache/spark/pull/48805#issuecomment-2465942441 @HeartSaVioR Could you help review this change? Thanks!

Re: [PR] [SPARK-50270][PYTHON][SS] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-08 Thread via GitHub
bogao007 commented on PR #48808: URL: https://github.com/apache/spark/pull/48808#issuecomment-2465942249 @HeartSaVioR Could you help review this change? Thanks!

Re: [PR] [SPARK-50268][SQL][TESTS] Upgrade oracle jdbc driver to `ojdbc17:23.6.0.24.10` [spark]

2024-11-08 Thread via GitHub
panbingkun commented on PR #48798: URL: https://github.com/apache/spark/pull/48798#issuecomment-2465930108 > @milastdbx Could you review this PR, please. Thanks @MaxGekk !

Re: [PR] [SPARK-48210][DOC] Modify the description of whether dynamic partition… [spark]

2024-11-08 Thread via GitHub
github-actions[bot] closed pull request #46496: [SPARK-48210][DOC] Modify the description of whether dynamic partition… URL: https://github.com/apache/spark/pull/46496

Re: [PR] [SPARK-49037][SQL][TESTS] Replace `schema()` with `columns()` [spark]

2024-11-08 Thread via GitHub
github-actions[bot] closed pull request #47515: [SPARK-49037][SQL][TESTS] Replace `schema()` with `columns()` URL: https://github.com/apache/spark/pull/47515

Re: [PR] [SPARK-49058][SQL] Display more primitive name for the `::` operator [spark]

2024-11-08 Thread via GitHub
github-actions[bot] commented on PR #47535: URL: https://github.com/apache/spark/pull/47535#issuecomment-2465934354 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-50110][SQL] Fix `from_csv`: parse fails when data contains spaces before and after [spark]

2024-11-08 Thread via GitHub
panbingkun commented on PR #48653: URL: https://github.com/apache/spark/pull/48653#issuecomment-2465929498 > @panbingkun There are CSV options: > > * ignoreLeadingWhiteSpace > * ignoreTrailingWhiteSpace > > They are off in read by default, but when you set them on, do they solve your issue?

[PR] [SPARK-50270][PYTHON][SS] Added custom state metrics for TransformWithStateInPandas [spark]

2024-11-08 Thread via GitHub
bogao007 opened a new pull request, #48808: URL: https://github.com/apache/spark/pull/48808 ### What changes were proposed in this pull request? - Added custom state metrics for TransformWithStateInPandas. - Clean up TTL properly. ### Why are the changes needed?

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on PR #48788: URL: https://github.com/apache/spark/pull/48788#issuecomment-2465908616 Merged to master~

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on code in PR #48788: URL: https://github.com/apache/spark/pull/48788#discussion_r1835160420 ## python/pyspark/sql/session.py: ## @@ -543,9 +543,12 @@ def getOrCreate(self) -> "SparkSession": session = SparkSession._instantiatedSession
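Based only on the PR title (the diff preview above is truncated), the change concerns honoring `spark.submit.appName` when `getOrCreate` builds a session. A hypothetical usage sketch, with the precedence semantics assumed rather than verified against the diff:

```python
# Hypothetical usage; assumes spark.submit.appName feeds the session's app
# name as the PR title suggests (not verified against the actual diff).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.submit.appName", "my-pyspark-job")
    .getOrCreate()
)
print(spark.sparkContext.appName)
```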

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun closed pull request #48788: [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark URL: https://github.com/apache/spark/pull/48788

Re: [PR] [SPARK-50273][SS] Improve logging for RocksDB lock acquire/release cases [spark]

2024-11-08 Thread via GitHub
anishshri-db commented on PR #48806: URL: https://github.com/apache/spark/pull/48806#issuecomment-2465899935 cc - @HeartSaVioR @liviazhu-db - PTAL, thx !

Re: [PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave commented on PR #48807: URL: https://github.com/apache/spark/pull/48807#issuecomment-2465903933 cc @JoshRosen

[PR] [SPARK-50274][CORE] Guard against use-after-close in DirectByteBufferOutputStream [spark]

2024-11-08 Thread via GitHub
ankurdave opened a new pull request, #48807: URL: https://github.com/apache/spark/pull/48807 ### What changes were proposed in this pull request? `DirectByteBufferOutputStream#close()` calls `StorageUtils.dispose()` to free its direct byte buffer. This puts the object into
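The fix described here is a guard against writing to a buffer that has already been freed. As a generic illustration of the pattern only (hypothetical Python; the actual class is Scala and disposes a direct ByteBuffer via `StorageUtils.dispose()`):

```python
# Hypothetical illustration of a use-after-close guard; not the Scala code.
class GuardedBufferStream:
    def __init__(self) -> None:
        self._buf = bytearray()
        self._closed = False

    def _ensure_open(self) -> None:
        # Fail fast instead of touching memory that was already released.
        if self._closed:
            raise ValueError("write after close")

    def write(self, data: bytes) -> int:
        self._ensure_open()
        self._buf += data
        return len(data)

    def close(self) -> None:
        if not self._closed:
            self._closed = True
            self._buf = bytearray()  # stand-in for freeing the direct buffer
```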

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1835071345 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalMathUtils.scala: ## @@ -31,16 +31,17 @@ object IntervalMathUtils { def subtractExact(

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1835001042 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala: ## @@ -785,10 +785,15 @@ object IntervalUtils extends SparkIntervalUtils {

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834996914 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -636,22 +636,26 @@ private[sql] object QueryExecutionErrors extends QueryE

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834995566 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "." +

Re: [PR] [SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834992804 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "." +

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-08 Thread via GitHub
ueshin commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1834889583 ## python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py: ## @@ -752,46 +752,86 @@ def check_vectorized_udf_return_scalar(self): def test_udf_with_variant_inp

Re: [PR] [SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs [spark]

2024-11-08 Thread via GitHub
harshmotw-db commented on code in PR #48770: URL: https://github.com/apache/spark/pull/48770#discussion_r1834847538 ## python/pyspark/sql/tests/pandas/test_pandas_udf_scalar.py: ## @@ -752,46 +752,86 @@ def check_vectorized_udf_return_scalar(self): def test_udf_with_varia

Re: [PR] [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation [spark]

2024-11-08 Thread via GitHub
dtenedor closed pull request #48754: [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation URL: https://github.com/apache/spark/pull/48754

Re: [PR] [WIP][SPARK-50221][SQL] GROUP BY ALL support for SQL pipe aggregation [spark]

2024-11-08 Thread via GitHub
dtenedor commented on PR #48754: URL: https://github.com/apache/spark/pull/48754#issuecomment-2465418016 Looking at this more, the proposed semantics would not be consistent with how aggregation works with pipe operators, where the GROUP BY expressions arrive out of the operator followed by

Re: [PR] [SPARK-48898][SQL] Add Variant shredding functions [spark]

2024-11-08 Thread via GitHub
cashmand commented on PR #48779: URL: https://github.com/apache/spark/pull/48779#issuecomment-2465413855 > So if I understand correctly, the shredding write chain may be like this: Get the expected shredded schema (DataType) through some ways (sampling or just user defined?) -> parquet w

[PR] Enable test_pandas_transform_with_state unit test [spark]

2024-11-08 Thread via GitHub
bogao007 opened a new pull request, #48805: URL: https://github.com/apache/spark/pull/48805 ### What changes were proposed in this pull request? Enable test_pandas_transform_with_state unit test ### Why are the changes needed? Improve python test coverage #

Re: [PR] [SPARK-50110][SQL] Fix `from_csv`: parse fails when data contains spaces before and after [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on PR #48653: URL: https://github.com/apache/spark/pull/48653#issuecomment-2465244299 @panbingkun There are CSV options: - ignoreLeadingWhiteSpace - ignoreTrailingWhiteSpace They are off by default, but when you set them on, do they solve your issue?
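Both options are real CSV data source options and can also be passed to `from_csv`; a quick sketch of doing so (made-up input, assumes an existing `spark` session):

```python
# Illustrative: passing the whitespace-trimming options to from_csv.
from pyspark.sql.functions import from_csv

df = spark.createDataFrame([("  1 ,  abc  ",)], ["value"])
df.select(
    from_csv(
        df["value"],
        "a INT, b STRING",
        {"ignoreLeadingWhiteSpace": "true", "ignoreTrailingWhiteSpace": "true"},
    ).alias("row")
).show(truncate=False)
```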

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834695708 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalMathUtils.scala: ## @@ -31,16 +31,17 @@ object IntervalMathUtils { def subtractExac

Re: [PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
MaxGekk commented on code in PR #48804: URL: https://github.com/apache/spark/pull/48804#discussion_r1834614134 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -509,6 +509,10 @@ private CollationSpecUTF8( private static int

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834598344 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834596748 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
stevomitric commented on PR #48804: URL: https://github.com/apache/spark/pull/48804#issuecomment-2465047392 > @stevomitric Could you regenerate results of the benchmark CollationBenchmark, please. We should expect better numbers after your changes, right? They are running, I will pos

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834587169 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834586363 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
viirya commented on code in PR #48788: URL: https://github.com/apache/spark/pull/48788#discussion_r1834526551 ## python/pyspark/sql/session.py: ## @@ -543,9 +543,12 @@ def getOrCreate(self) -> "SparkSession": session = SparkSession._instantiatedSession

Re: [PR] [SPARK-50222][PYTHON][FOLLOWUP] Support `spark.submit.appName` in PySpark [spark]

2024-11-08 Thread via GitHub
dongjoon-hyun commented on PR #48788: URL: https://github.com/apache/spark/pull/48788#issuecomment-2464903788 Could you review this PR, @viirya ?

[PR] [SPARK-50216][SQL] Address UTF8_BINARY performance regression [spark]

2024-11-08 Thread via GitHub
stevomitric opened a new pull request, #48804: URL: https://github.com/apache/spark/pull/48804 ### What changes were proposed in this pull request? This PR addresses the UTF8_BINARY performance regression that was first identified in https://github.com/apache/spark/pull/48721. The r

[PR] [SPARK-50262][SQL] Forbid specification of complex types during altering collation [spark]

2024-11-08 Thread via GitHub
Alexvsalexvsalex opened a new pull request, #48803: URL: https://github.com/apache/spark/pull/48803 ### What changes were proposed in this pull request? [SPARK-48413](https://issues.apache.org/jira/browse/SPARK-48413) has brought the ability to change the collation on a table. So I s

[PR] expose configure_logging [spark]

2024-11-08 Thread via GitHub
nija-at opened a new pull request, #48802: URL: https://github.com/apache/spark/pull/48802 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[PR] [SPARK-50272][SQL] Merge options of table and relation in FallBackFileSourceV2 [spark]

2024-11-08 Thread via GitHub
Zouxxyy opened a new pull request, #48801: URL: https://github.com/apache/spark/pull/48801 ### What changes were proposed in this pull request? Merge options of table and relation in FallBackFileSourceV2 ### Why are the changes needed? SPARK-49519

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
miland-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834376456 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,70 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834371951 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2022,8 +2022,20 @@ }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ - "."

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834364947 ## common/utils/src/main/resources/error/error-conditions.json: ## @@ -2020,11 +2020,23 @@ ], "sqlState" : "XX000" }, - "INTERVAL_ARITHMETIC_OVERFLO

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834181627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/intervalExpressions.scala: ## @@ -561,17 +562,28 @@ case class MakeYMInterval(years: Expr

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on code in PR #48773: URL: https://github.com/apache/spark/pull/48773#discussion_r1834360587 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala: ## @@ -1488,12 +1488,14 @@ class DataFrameAggregateSuite extends QueryTest val

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834333970 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834332113 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala: ## @@ -124,6 +124,22 @@ case class SqlScriptingInterpreter() { va

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
dusantism-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834330570 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() }

Re: [PR] [SPARK-49913][SQL] Add check for unique label names in nested labeled scopes [spark]

2024-11-08 Thread via GitHub
miland-db commented on code in PR #48795: URL: https://github.com/apache/spark/pull/48795#discussion_r1834265053 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala: ## @@ -134,3 +139,87 @@ object ParserUtils extends SparkParserUtils { sb.t

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834247770 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834234445 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -649,3 +652,145 @@ class LoopStatementExec( body.reset() } }

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834167674 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingInterpreter.scala: ## @@ -124,6 +124,22 @@ case class SqlScriptingInterpreter() { val b

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464401231 You can remove [WIP] and Draft as well.

Re: [I] Add support for Config() API [spark-connect-go]

2024-11-08 Thread via GitHub
grundprinzip closed issue #48: Add support for Config() API URL: https://github.com/apache/spark-connect-go/issues/48

Re: [PR] [WIP][SPARK-48356][SQL] Support for FOR statement [spark]

2024-11-08 Thread via GitHub
davidm-db commented on code in PR #48794: URL: https://github.com/apache/spark/pull/48794#discussion_r1834131331 ## sql/core/src/main/scala/org/apache/spark/sql/scripting/SqlScriptingExecutionNode.scala: ## @@ -19,12 +19,15 @@ package org.apache.spark.sql.scripting import org

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
mihailom-db commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464399881 @srielau Does this look good to you?

Re: [PR] [WIP][SPARK-50226][SQL] Correct MakeDTInterval and MakeYMInterval to catch Java exceptions [spark]

2024-11-08 Thread via GitHub
gotocoding-DB commented on PR #48773: URL: https://github.com/apache/spark/pull/48773#issuecomment-2464285445 CI fails, but `testOnly org.apache.spark.sql.SparkSessionE2ESuite` locally works fine (I've checked it 5 times, always green). Could it be useful just to restart CI (if yes – how to
