Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1512285216 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryRowLevelOperationTableCatalog.scala: ## @@ -31,13 +31,23 @@ class

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1512284803 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryPartitionTableCatalog.scala: ## @@ -31,12 +31,22 @@ class InMemoryPartitionTableCatalog

[PR] [SPARK-47248][SQL][COLLATION] Extended string function support: contains [spark]

2024-03-04 Thread via GitHub
uros-db opened a new pull request, #45382: URL: https://github.com/apache/spark/pull/45382 ### What changes were proposed in this pull request? Extend built-in string functions to support non-binary, non-lowercase collation for: contains. ### Why are the changes needed?

Re: [PR] Add Support for Scala 2.13 in Spark 3.4.1 [spark-docker]

2024-03-04 Thread via GitHub
databius commented on PR #52: URL: https://github.com/apache/spark-docker/pull/52#issuecomment-1978112303 It would be great if we could support old versions instead of only Spark 3.5+. I need an image that supports Scala 2.13 and Spark 3.4.2. Currently, I am building my own image

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/HyukjinKwon/spark/actions/runs/8152472664/job/22282001033 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47033][SQL] Fix EXECUTE IMMEDIATE USING does not recognize session variable names [spark]

2024-03-04 Thread via GitHub
andrej-db commented on code in PR #45293: URL: https://github.com/apache/spark/pull/45293#discussion_r1511301326 ## sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala: ## @@ -336,6 +336,19 @@ class QueryExecutionSuite extends SharedSparkSession {

Re: [PR] [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators [spark]

2024-03-04 Thread via GitHub
cloud-fan closed pull request #45321: [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators URL: https://github.com/apache/spark/pull/45321

Re: [PR] [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on PR #45321: URL: https://github.com/apache/spark/pull/45321#issuecomment-1978055064 thanks, merging to master!

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1512183617 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,31 @@ abstract class Optimizer(catalogManager:

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1512181287 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,31 @@ abstract class Optimizer(catalogManager:

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-03-04 Thread via GitHub
rangadi commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1512166292 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -198,31 +198,50 @@ object

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-04 Thread via GitHub
panbingkun commented on PR #45368: URL: https://github.com/apache/spark/pull/45368#issuecomment-1978003238 cc @cloud-fan

Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

2024-03-04 Thread via GitHub
LuciferYang commented on code in PR #45372: URL: https://github.com/apache/spark/pull/45372#discussion_r1512146209 ## pom.xml: ## @@ -199,14 +197,14 @@ 2.12.0 4.1.17 -14.0.1 +33.0.0-jre Review Comment: @pan3793 If we upgrade the version of Guava, and

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-04 Thread via GitHub
wForget commented on code in PR #45373: URL: https://github.com/apache/spark/pull/45373#discussion_r1512148037 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -655,8 +655,17 @@ class Dataset[T] private[sql]( * @group basic * @since 2.4.0 */ -

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-04 Thread via GitHub
mridulm commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1977986879 @JacobZheng0927, might be a good idea to backport this to 3.5 as well - will you be able to create a backport PR? (I ran into some issues locally when trying to merge to branch-3.5 and

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-04 Thread via GitHub
mridulm commented on PR #45327: URL: https://github.com/apache/spark/pull/45327#issuecomment-1977985641 Merged to master. Thanks for fixing this @JacobZheng0927 !

Re: [PR] [SPARK-47146][CORE] Possible thread leak when doing sort merge join [spark]

2024-03-04 Thread via GitHub
mridulm closed pull request #45327: [SPARK-47146][CORE] Possible thread leak when doing sort merge join URL: https://github.com/apache/spark/pull/45327

Re: [PR] [DO-NOT-MERGE] Test Hive pre-2.3.10 [spark]

2024-03-04 Thread via GitHub
LuciferYang commented on PR #45372: URL: https://github.com/apache/spark/pull/45372#issuecomment-1977981587 happy to see this hive upgrade, thanks to @pan3793 and @sunchao

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45373: URL: https://github.com/apache/spark/pull/45373#discussion_r1512136164 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -655,8 +655,17 @@ class Dataset[T] private[sql]( * @group basic * @since 2.4.0 */ -

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1977962956 https://github.com/HyukjinKwon/spark/actions/runs/8151190050

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on code in PR #45373: URL: https://github.com/apache/spark/pull/45373#discussion_r1512129374 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -655,8 +655,17 @@ class Dataset[T] private[sql]( * @group basic * @since 2.4.0 */

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-03-04 Thread via GitHub
TakawaAkirayo commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1512107482 ## core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala: ## @@ -142,9 +142,11 @@ private class AsyncEventQueue(

Re: [PR] [SPARK-47218] [SQL] XML: Changed schemOfXml to fail on DROPMALFORMED mode [spark]

2024-03-04 Thread via GitHub
sandip-db commented on PR #45379: URL: https://github.com/apache/spark/pull/45379#issuecomment-1977944645 Why are the changes needed? DROPMALFORMED parse mode implies silently dropping the malformed record. But SchemaOfXml is expected to return a schema and may not have a valid schema
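sandip-db's rationale above distinguishes row-level parse modes from schema inference: dropping a malformed record makes sense when parsing rows, but schema inference has no row to drop. A plain-Python sketch of that distinction (illustrative only, not Spark code; names are hypothetical):

```python
from xml.etree import ElementTree

def parse_rows(records, mode="PERMISSIVE"):
    """Row-level parsing: a malformed record can be nulled or dropped."""
    out = []
    for rec in records:
        try:
            out.append(ElementTree.fromstring(rec).tag)
        except ElementTree.ParseError:
            if mode == "FAILFAST":
                raise
            if mode == "PERMISSIVE":
                out.append(None)  # keep a placeholder for the bad record
            # DROPMALFORMED: silently skip the record
    return out

def infer_schema(record):
    """Schema inference has no row to drop: even under DROPMALFORMED
    there is nothing sensible to return for invalid input, so raise."""
    try:
        return ElementTree.fromstring(record).tag
    except ElementTree.ParseError:
        raise ValueError("cannot infer a schema from malformed XML")
```

Here `parse_rows(["<a/>", "<bad", "<b/>"], mode="DROPMALFORMED")` yields `["a", "b"]`, while `infer_schema("<bad")` raises, mirroring the behavior the PR argues `schema_of_xml` should have.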

Re: [PR] [SPARK-47218] [SQL] XML: Changed schemOfXml to fail on DROPMALFORMED mode [spark]

2024-03-04 Thread via GitHub
sandip-db commented on PR #45379: URL: https://github.com/apache/spark/pull/45379#issuecomment-1977938882 nit in title: schemOfXml --> SchemaOfXml

Re: [PR] [SPARK-47270][SQL] Dataset.isEmpty projects CommandResults locally [spark]

2024-03-04 Thread via GitHub
wForget commented on PR #45373: URL: https://github.com/apache/spark/pull/45373#issuecomment-1977928924 @peter-toth @HyukjinKwon @cloud-fan could you please take a look?

Re: [PR] [SPARK-47253][CORE] Allow LiveEventBus to stop without the completely draining of event queue [spark]

2024-03-04 Thread via GitHub
beliefer commented on code in PR #45367: URL: https://github.com/apache/spark/pull/45367#discussion_r1512093558 ## core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala: ## @@ -142,9 +142,11 @@ private class AsyncEventQueue( eventCount.incrementAndGet()

Re: [PR] [SPARK-46989][SQL][CONNECT] Improve concurrency performance for SparkSession [spark]

2024-03-04 Thread via GitHub
beliefer commented on code in PR #45046: URL: https://github.com/apache/spark/pull/45046#discussion_r1482658250 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -854,7 +855,7 @@ object SparkSession extends Logging { // the

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-04 Thread via GitHub
LuciferYang commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1977889225 Let's run another two or three rounds of tests

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-04 Thread via GitHub
beliefer commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1512072983 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -582,11 +582,7 @@ object SQLConf { val AUTO_BROADCASTJOIN_THRESHOLD =

Re: [PR] [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators [spark]

2024-03-04 Thread via GitHub
amaliujia commented on code in PR #45321: URL: https://github.com/apache/spark/pull/45321#discussion_r1512059410 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala: ## @@ -229,6 +229,14 @@ trait AnalysisHelper extends

Re: [PR] [DO-NOT-MERGE] Restructuring MasterSuite [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1977840217 https://github.com/HyukjinKwon/spark/actions/runs/8150215353/job/22276105311

Re: [PR] [SPARK-47177][SQL][3.4] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45381: URL: https://github.com/apache/spark/pull/45381#issuecomment-1977838516 Thank you, @ulysses-you .

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45282: URL: https://github.com/apache/spark/pull/45282#issuecomment-1977838135 Thank you! That's better and safe.

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
ulysses-you commented on PR #45282: URL: https://github.com/apache/spark/pull/45282#issuecomment-1977837540 @dongjoon-hyun there are some conflicts, I created a new pr https://github.com/apache/spark/pull/45381 for branch-3.4

[PR] [SPARK-47177][SQL][3.4] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
ulysses-you opened a new pull request, #45381: URL: https://github.com/apache/spark/pull/45381 This pr backport https://github.com/apache/spark/pull/45282 to branch-3.4 ### What changes were proposed in this pull request? This pr adds lock for ExplainUtils.processPlan
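The backport described above adds a lock around ExplainUtils.processPlan so that concurrent explain calls don't race on shared operator-ID state. A minimal Python sketch of that pattern (hypothetical names; not the actual Spark implementation, which is Scala):

```python
import threading

_explain_lock = threading.Lock()
_operator_ids = {}  # shared mutable state: plan node -> operator ID

def process_plan(nodes):
    """Assign operator IDs to a plan's nodes. The lock serializes
    concurrent explain() calls that would otherwise interleave while
    mutating the shared ID table, producing a corrupted explain string."""
    with _explain_lock:
        _operator_ids.clear()
        for i, node in enumerate(nodes):
            _operator_ids[node] = i
        return dict(_operator_ids)  # snapshot taken while still holding the lock
```

The key design point is that both the mutation and the snapshot happen under the same lock, so each caller sees a consistent ID assignment.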

Re: [PR] [SPARK-47176][SQL][FOLLOW-UP] resolveExpressions should have three versions which is the same as resolveOperators [spark]

2024-03-04 Thread via GitHub
cloud-fan commented on code in PR #45321: URL: https://github.com/apache/spark/pull/45321#discussion_r1512050470 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala: ## @@ -229,6 +229,14 @@ trait AnalysisHelper extends

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45282: URL: https://github.com/apache/spark/pull/45282#issuecomment-1977829088 BTW, #40812 landed in Apache Spark 3.4.1, didn't it? If so, it seems we need to backport this to branch-3.4, @ulysses-you.

Re: [PR] [SPARK-47277] PySpark util function assertDataFrameEqual should not support streaming DF [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45380: URL: https://github.com/apache/spark/pull/45380#issuecomment-1977823298 cc @itholic

Re: [PR] [SPARK-47277] PySpark util function assertDataFrameEqual should not support streaming DF [spark]

2024-03-04 Thread via GitHub
WweiL commented on PR #45380: URL: https://github.com/apache/spark/pull/45380#issuecomment-1977822310 I'm having some local build issues; since this is a small change, I want to defer the tests to the remote CI

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
ulysses-you closed pull request #45282: [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string URL: https://github.com/apache/spark/pull/45282

[PR] [SPARK-47277] PySpark util function assertDataFrameEqual should not support streaming DF [spark]

2024-03-04 Thread via GitHub
WweiL opened a new pull request, #45380: URL: https://github.com/apache/spark/pull/45380 ### What changes were proposed in this pull request? The handy util function should not support streaming DataFrames; currently, if you call it on streaming queries, it throws a
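The guard this PR proposes amounts to an early type check before any comparison work. A hedged sketch of that pattern (hypothetical helper name; the real API is PySpark's `assertDataFrameEqual` in `pyspark.testing`):

```python
def assert_dataframe_equal(actual, expected):
    """Reject streaming inputs up front instead of failing later with a
    confusing error (sketch of the PR's intent, not PySpark's code)."""
    for df in (actual, expected):
        if getattr(df, "isStreaming", False):
            raise ValueError(
                "assert_dataframe_equal does not support streaming DataFrames"
            )
    # ... the actual row-by-row comparison would follow here ...
    return True
```

Failing fast with an explicit message is preferable here because a streaming DataFrame has no finite set of rows to collect and compare.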

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
ulysses-you commented on PR #45282: URL: https://github.com/apache/spark/pull/45282#issuecomment-1977821766 thanks for review, merging to master/branch-3.5

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
liuzqt commented on code in PR #45282: URL: https://github.com/apache/spark/pull/45282#discussion_r1512035940 ## sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryRelationSuite.scala: ## @@ -18,27 +18,42 @@ package org.apache.spark.sql.execution.columnar

Re: [PR] [SPARK-47247][SQL] Use smaller target size when coalescing partitions with exploding joins [spark]

2024-03-04 Thread via GitHub
yaooqinn commented on code in PR #45357: URL: https://github.com/apache/spark/pull/45357#discussion_r1512030230 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala: ## @@ -126,9 +126,12 @@ case class

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-04 Thread via GitHub
nchammas commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1512030136 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -582,11 +582,7 @@ object SQLConf { val AUTO_BROADCASTJOIN_THRESHOLD =

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-04 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1512013529 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -29,6 +29,14 @@ sealed trait RocksDBKeyStateEncoder {

Re: [PR] [SPARK-44746][Python] Add more Python UDTF documentation for functions that accept input tables [spark]

2024-03-04 Thread via GitHub
dtenedor commented on code in PR #45375: URL: https://github.com/apache/spark/pull/45375#discussion_r1511980443 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -63,6 +63,7 @@ To implement a Python UDTF, you first need to define a class implementing the me

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-04 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1512002599 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,114 @@ +/* + * Licensed to the Apache Software Foundation

[PR] [SPARK-47218] [SQL] XML: Changed schemOfXml to fail on DROPMALFORMED mode [spark]

2024-03-04 Thread via GitHub
yhosny opened a new pull request, #45379: URL: https://github.com/apache/spark/pull/45379 ### What changes were proposed in this pull request? Changed schema_of_xml to fail with an error even in DROPMALFORMED mode, to avoid creating schemas out of invalid XML.

Re: [PR] [WIP][SPARK-47274][PYTHON][CONNECT] Provide more useful context for PySpark DataFrame API errors [spark]

2024-03-04 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-191911 On second thought, we need to keep the previous transformation stacktrace to provide more accurate context. Will push more commits to update it.

[PR] [WIP] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-04 Thread via GitHub
xinrong-meng opened a new pull request, #45378: URL: https://github.com/apache/spark/pull/45378 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

Re: [PR] [SPARK-47177][SQL] Cached SQL plan do not display final AQE plan in explain string [spark]

2024-03-04 Thread via GitHub
ulysses-you commented on code in PR #45282: URL: https://github.com/apache/spark/pull/45282#discussion_r1511998525 ## sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryRelationSuite.scala: ## @@ -18,27 +18,42 @@ package

Re: [PR] [SPARK-47155][PYTHON] Fix Error Class Issue [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on code in PR #45306: URL: https://github.com/apache/spark/pull/45306#discussion_r1511990741 ## python/pyspark/sql/worker/create_data_source.py: ## @@ -150,8 +150,8 @@ def main(infile: IO, outfile: IO) -> None: is_ddl_string = True

Re: [PR] [SPARK-47252][DOCS] Clarify that pivot may trigger an eager computation [spark]

2024-03-04 Thread via GitHub
HyukjinKwon closed pull request #45363: [SPARK-47252][DOCS] Clarify that pivot may trigger an eager computation URL: https://github.com/apache/spark/pull/45363

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1511987787 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -582,11 +582,7 @@ object SQLConf { val AUTO_BROADCASTJOIN_THRESHOLD =

Re: [PR] [SPARK-47252][DOCS] Clarify that pivot may trigger an eager computation [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45363: URL: https://github.com/apache/spark/pull/45363#issuecomment-1977745985 Merged to master.

Re: [PR] [WIP][SPARK-47274][PYTHON][CONNECT] Provide more useful context for PySpark DataFrame API errors [spark]

2024-03-04 Thread via GitHub
itholic commented on code in PR #45377: URL: https://github.com/apache/spark/pull/45377#discussion_r1511983720 ## python/pyspark/errors/utils.py: ## @@ -119,3 +127,73 @@ def get_message_template(self, error_class: str) -> str: message_template =

Re: [PR] [DO-NOT-MERGE] Avoid OOM in MasterSuite with Mac OS [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on code in PR #45366: URL: https://github.com/apache/spark/pull/45366#discussion_r1511982665 ## core/src/test/scala/org/apache/spark/deploy/master/WorkerSelectionSuite.scala: ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [WIP][SPARK-47274][PYTHON][CONNECT] Provide more useful context for PySpark DataFrame API errors [spark]

2024-03-04 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-1977737988 cc @HyukjinKwon FYI, I'm still working on Spark Connect support and unit tests but the basic structure is ready for review. FYI, also cc @MaxGekk as you made a similar contribution

Re: [PR] [WIP][SPARK-47274][PYTHON][CONNECT] Provide more useful context for PySpark DataFrame API errors [spark]

2024-03-04 Thread via GitHub
itholic commented on PR #45377: URL: https://github.com/apache/spark/pull/45377#issuecomment-1977735804 I'm still working on Spark Connect support and unit tests, but the basic structure is ready for review.

Re: [PR] [DO-NOT-MERGE] Avoid OOM in MasterSuite with Mac OS [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45366: URL: https://github.com/apache/spark/pull/45366#issuecomment-1977735310 test: https://github.com/HyukjinKwon/spark/actions/runs/8149143761/job/22273296949

[PR] [WIP][SPARK-47274][PYTHON][CONNECT] Provide more useful context for PySpark DataFrame API errors [spark]

2024-03-04 Thread via GitHub
itholic opened a new pull request, #45377: URL: https://github.com/apache/spark/pull/45377 ### What changes were proposed in this pull request? This PR introduces an enhancement to the error messages generated by PySpark's DataFrame API, adding detailed context about the location
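One common way to attach call-site context to API errors, as this PR sets out to do, is to capture the user's stack frame when an exception escapes the API boundary. A rough Python sketch of the idea (illustrative only, not PySpark's actual mechanism; the decorator name is hypothetical):

```python
import functools
import traceback

def with_call_site(func):
    """Wrap an API entry point so errors report where the user called it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # extract_stack()[-1] is this wrapper; [-2] is the frame that
            # called the wrapped function, i.e. the user's call site.
            site = traceback.extract_stack()[-2]
            raise type(exc)(
                f"{exc} (called from {site.filename}:{site.lineno})"
            ) from exc
    return wrapper
```

Surfacing the user's file and line, rather than a frame deep inside the library, is what makes DataFrame API errors actionable.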

Re: [PR] [SPARK-47202][PYTHON] Fix typo breaking datetimes with tzinfo [spark]

2024-03-04 Thread via GitHub
arzavj commented on PR #45301: URL: https://github.com/apache/spark/pull/45301#issuecomment-1977729895 @HyukjinKwon do you know when I can expect 3.5.2 to be released to be able to take advantage of this bug fix?

Re: [PR] [SPARK-47155][PYTHON] Fix Error Class Issue [spark]

2024-03-04 Thread via GitHub
HyukjinKwon commented on PR #45306: URL: https://github.com/apache/spark/pull/45306#issuecomment-1977726281 I think the actions should be enabled at https://github.com/sunan135/spark/settings/actions by `Allow all actions and reusable workflows`

Re: [PR] [SPARK-45954][SQL] Remove redundant shuffles [spark]

2024-03-04 Thread via GitHub
github-actions[bot] commented on PR #43841: URL: https://github.com/apache/spark/pull/43841#issuecomment-1977715036 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] Allow chaining other stateful operators after transformWithState operator. [spark]

2024-03-04 Thread via GitHub
sahnib opened a new pull request, #45376: URL: https://github.com/apache/spark/pull/45376 ### What changes were proposed in this pull request? This PR adds support to define event time column in the output dataset of `TransformWithState` operator. The new event time column

Re: [PR] [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too [spark]

2024-03-04 Thread via GitHub
helenweng-stripe commented on PR #33934: URL: https://github.com/apache/spark/pull/33934#issuecomment-1977631285 Wonder if we can reconsider merging this PR in? We've had to make a similar patch internally to support PySpark users.

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-04 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1511914150 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,104 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [WIP] Test rocksdbjni 8.11.3 [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1977616307 Thank you, @LuciferYang . Is it ready?

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-04 Thread via GitHub
ueshin commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1511898364 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -49,6 +49,7 @@ See also :class:`SparkSession`. SparkSession.createDataFrame

Re: [PR] [SPARK-44746][Python] Add more Python UDTF documentation for functions that accept input tables [spark]

2024-03-04 Thread via GitHub
allisonwang-db commented on code in PR #45375: URL: https://github.com/apache/spark/pull/45375#discussion_r1511851865 ## python/docs/source/user_guide/sql/python_udtf.rst: ## @@ -63,6 +63,7 @@ To implement a Python UDTF, you first need to define a class implementing the me

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-04 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1511840961 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -49,6 +49,7 @@ See also :class:`SparkSession`. SparkSession.createDataFrame

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
agubichev commented on PR #45125: URL: https://github.com/apache/spark/pull/45125#issuecomment-1977518691 > Thanks for the fix, looks good overall. > > Let's add a gating flag for this change just in case of any issues. added a flag

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
agubichev commented on PR #45125: URL: https://github.com/apache/spark/pull/45125#issuecomment-1977518871 @cloud-fan

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1511831412 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,30 @@ abstract class Optimizer(catalogManager:

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
agubichev commented on PR #45125: URL: https://github.com/apache/spark/pull/45125#issuecomment-1977503018 > What about if there's another node above the aggregate in the subquery, such as a filter after the aggregate (having clause)? added a test, but any non-trivial node above the

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-04 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1511830190 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -328,6 +328,30 @@ abstract class Optimizer(catalogManager:
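For context on the "count bug" this PR addresses: a correlated scalar COUNT subquery must return 0 (not NULL) for outer rows with no matches, and a naive decorrelation into an outer join plus pre-aggregation loses exactly that. A self-contained SQLite demonstration of the two behaviors (not Spark code):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1(x INT);
    INSERT INTO t1 VALUES (1), (2);
    CREATE TABLE t2(y INT);
    INSERT INTO t2 VALUES (1);
""")

# Correct semantics: COUNT(*) over an empty correlated group is 0.
correct = con.execute("""
    SELECT x, (SELECT COUNT(*) FROM t2 WHERE t2.y = t1.x)
    FROM t1 ORDER BY x
""").fetchall()
# correct -> [(1, 1), (2, 0)]

# A naive decorrelation into an outer join over a pre-aggregated
# subquery turns that 0 into NULL for unmatched rows -- the count bug.
naive = con.execute("""
    SELECT x, cnt
    FROM t1 LEFT JOIN (SELECT y, COUNT(*) AS cnt FROM t2 GROUP BY y) s
      ON s.y = t1.x
    ORDER BY x
""").fetchall()
# naive -> [(1, 1), (2, None)]
```

An optimizer that rewrites plans (e.g. via constant folding) must therefore preserve the COALESCE-to-zero step the correct decorrelation inserts, which is the invariant the PR's tests exercise.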

Re: [PR] [SPARK-42627][SPARK-26494][SQL] Support Oracle TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-03-04 Thread via GitHub
steveloughran commented on PR #45337: URL: https://github.com/apache/spark/pull/45337#issuecomment-1977449543 @dongjoon-hyun I'm just thinking of all the timestamps in ORC and parquet and when they are local vs UTC...

Re: [PR] [SPARK-44746][Python] Add more Python UDTF documentation for functions that accept input tables [spark]

2024-03-04 Thread via GitHub
dtenedor commented on PR #45375: URL: https://github.com/apache/spark/pull/45375#issuecomment-1977448390 cc @allisonwang-db @ueshin

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-04 Thread via GitHub
y-wei commented on code in PR #45266: URL: https://github.com/apache/spark/pull/45266#discussion_r1511783477 ## core/src/main/scala/org/apache/spark/Dependency.scala: ## @@ -206,6 +206,21 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag]( finalizeTask =
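The warning proposed in SPARK-39771 is about the product of map and reduce partition counts: a shuffle materializes one block per (map partition, reduce partition) pair, so the count can explode quietly. A hedged sketch of the arithmetic — the threshold name and value here are hypothetical, not Spark's actual configuration:

```python
import logging

SHUFFLE_BLOCKS_WARN_THRESHOLD = 1_000_000  # hypothetical threshold

def check_shuffle_blocks(num_map_partitions: int, num_reduce_partitions: int) -> int:
    """Count the shuffle blocks a stage would create and warn when the
    map x reduce product gets very large."""
    num_blocks = num_map_partitions * num_reduce_partitions
    if num_blocks > SHUFFLE_BLOCKS_WARN_THRESHOLD:
        logging.warning(
            "Shuffle will create %d blocks (%d maps x %d reducers); "
            "consider reducing partition counts",
            num_blocks, num_map_partitions, num_reduce_partitions)
    return num_blocks

print(check_shuffle_blocks(2000, 1000))  # 2000000, logs a warning
```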

[PR] [SPARK-44746][Python] Add more Python UDTF documentation for functions that accept input tables [spark]

2024-03-04 Thread via GitHub
dtenedor opened a new pull request, #45375: URL: https://github.com/apache/spark/pull/45375 ### What changes were proposed in this pull request? This PR adds more Python UDTF documentation for functions that accept input tables. ### Why are the changes needed? This
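The documentation this PR extends covers Python UDTFs that take a `TABLE(...)` argument, where Spark passes each input row to `eval`. The sketch below shows that shape with a plain Python class so the `eval` logic runs standalone; the registration and SQL in the comments are a hedged outline of the PySpark 3.5+ API, and the names `FilterLarge` / `filter_large` are made up for illustration:

```python
# In Spark this class would be wrapped and registered, roughly:
#   from pyspark.sql.functions import udtf
#   spark.udtf.register("filter_large", udtf(FilterLarge, returnType="id: int"))
#   spark.sql("SELECT * FROM filter_large(TABLE(v))")
# Here it stays plain Python so eval can be exercised directly.

class FilterLarge:
    """Emits one output row per input row whose 'id' exceeds a cutoff."""
    def eval(self, row: dict):
        # With a TABLE argument, eval is called once per input row and
        # yields zero or more output tuples.
        if row["id"] > 10:
            yield (row["id"],)

rows = [{"id": 5}, {"id": 15}, {"id": 25}]
out = [r for row in rows for r in FilterLarge().eval(row)]
print(out)  # [(15,), (25,)]
```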

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-04 Thread via GitHub
sadikovi commented on PR #45266: URL: https://github.com/apache/spark/pull/45266#issuecomment-1977437455 cc @y-wei to address the remaining comments and retrigger the tests.

Re: [PR] [SPARK-39771][CORE] Add a warning msg in `Dependency` when a too large number of shuffle blocks is to be created. [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45266: URL: https://github.com/apache/spark/pull/45266#issuecomment-1977373125 How about the AS-IS status, @mridulm ?

Re: [PR] [SPARK-46350][SS] Fix state removal for stream-stream join with one watermark and one time-interval condition [spark]

2024-03-04 Thread via GitHub
neilramaswamy commented on code in PR #44323: URL: https://github.com/apache/spark/pull/44323#discussion_r1511719668 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinHelper.scala: ## @@ -198,31 +198,52 @@ object
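The fix discussed in SPARK-46350 concerns when a stream-stream join may evict buffered rows if the condition mixes a watermark with a time-interval predicate: a buffered row is only removable once no future input (which always arrives at or above the watermark) can still satisfy the interval condition against it. A hedged pure-Python sketch of that eviction rule, not Spark's `StreamingSymmetricHashJoinHelper` logic:

```python
def evict_state(buffered, watermark, max_interval):
    """buffered: list of buffered event times. A row at time t can still
    match inputs in [t, t + max_interval], so it is only evictable once
    t + max_interval < watermark."""
    kept, evicted = [], []
    for t in buffered:
        (evicted if t + max_interval < watermark else kept).append(t)
    return kept, evicted

kept, evicted = evict_state([10, 50, 90], watermark=60, max_interval=15)
print(kept, evicted)  # [50, 90] [10]
```

Evicting on the watermark alone, ignoring the interval, would drop the row at t=50 even though an input at t=60 could still join with it — which is the shape of bug the PR addresses.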

Re: [PR] [SPARK-47271][DOCS] Explain importance of statistics on SQL performance tuning page [spark]

2024-03-04 Thread via GitHub
nchammas commented on code in PR #45374: URL: https://github.com/apache/spark/pull/45374#discussion_r1511715423 ## docs/sql-performance-tuning.md: ## @@ -157,6 +157,18 @@ SELECT /*+ REBALANCE(3, c) */ * FROM t; For more details please refer to the documentation of
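The point the tuning-page change makes — accurate size statistics let the planner pick a broadcast join instead of a shuffle — reduces to a simple comparison. A hedged sketch of that decision; the function, threshold default, and table sizes are illustrative, not Spark's planner code:

```python
def choose_join_strategy(stats, left, right, broadcast_threshold=10 * 1024 * 1024):
    """With sizeInBytes statistics available, broadcast the smaller
    relation when it fits under the threshold; otherwise shuffle both."""
    small, large = sorted((left, right), key=lambda t: stats[t])
    if stats[small] <= broadcast_threshold:
        return ("broadcast", small)
    return ("shuffle", None)

stats = {"orders": 5 * 1024**3, "dim_region": 2 * 1024**2}
print(choose_join_strategy(stats, "orders", "dim_region"))  # ('broadcast', 'dim_region')
```

Without statistics the planner must fall back to conservative size estimates, which is why the page recommends running `ANALYZE TABLE ... COMPUTE STATISTICS` before relying on cost-based choices.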

Re: [PR] [SS] Add MapState implementation for State API v2. [spark]

2024-03-04 Thread via GitHub
jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1511697862 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateEncoder.scala: ## @@ -29,6 +29,14 @@ sealed trait RocksDBKeyStateEncoder {

Re: [PR] [SS] Add MapState implementation for State API v2. [spark]

2024-03-04 Thread via GitHub
jingz-db commented on PR #45341: URL: https://github.com/apache/spark/pull/45341#issuecomment-1977336528 Thanks Eric for reviews on my old PR. I've resolved them and incorporated them into this one already.

Re: [PR] [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun commented on PR #45351: URL: https://github.com/apache/spark/pull/45351#issuecomment-1977336300 I added you to the Apache Spark contributor group, @SteNicholas, and assigned SPARK-47242 to you. Welcome to the Apache Spark community!

Re: [PR] [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 [spark]

2024-03-04 Thread via GitHub
dongjoon-hyun closed pull request #45351: [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 URL: https://github.com/apache/spark/pull/45351

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-04 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1511707998 ## python/docs/source/development/debugging.rst: ## @@ -341,7 +372,12 @@ Python/Pandas UDF ~ To use this on Python/Pandas UDFs, PySpark

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-04 Thread via GitHub
xinrong-meng commented on code in PR #45269: URL: https://github.com/apache/spark/pull/45269#discussion_r1511705327 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -49,6 +49,7 @@ See also :class:`SparkSession`. SparkSession.createDataFrame

Re: [PR] [SPARK-47242][BUILD] Bump ap-loader 3.0(v8) to support for async-profiler 3.0 [spark]

2024-03-04 Thread via GitHub
parthchandra commented on PR #45351: URL: https://github.com/apache/spark/pull/45351#issuecomment-1977330468 > @parthchandra, thank you for trying it out. Have you found anything wrong? I was able to try it out locally (non-production) and the jfr files written were fine. I didn't see much

Re: [PR] [SS] Add MapState implementation for State API v2. [spark]

2024-03-04 Thread via GitHub
jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1511693823 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala: ## @@ -60,13 +60,25 @@ trait ReadStateStore { /** Version of the data
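The MapState surface this PR adds to State API v2 gives each grouping key its own map of user keys to values, with lookups, updates, and per-key removal backed by the RocksDB state encoder. A toy model of that surface — the method names below approximate the API shape for illustration and are not Spark's implementation:

```python
class MapState:
    """Toy model of per-grouping-key map state (State API v2 shape):
    each grouping key owns an isolated user-key -> value map."""
    def __init__(self):
        self._store = {}

    def get_value(self, grouping_key, user_key):
        return self._store.get(grouping_key, {}).get(user_key)

    def update_value(self, grouping_key, user_key, value):
        self._store.setdefault(grouping_key, {})[user_key] = value

    def remove_key(self, grouping_key, user_key):
        self._store.get(grouping_key, {}).pop(user_key, None)

    def keys(self, grouping_key):
        return list(self._store.get(grouping_key, {}))

state = MapState()
state.update_value("user-1", "clicks", 3)
state.update_value("user-1", "views", 7)
state.remove_key("user-1", "views")
print(state.keys("user-1"))  # ['clicks']
```

In the real store, the encoder review threads above concern how the composite (grouping key, user key) pair is serialized into a single RocksDB key.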
