Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-16 Thread via GitHub
uros-db commented on PR #46076: URL: https://github.com/apache/spark/pull/46076#issuecomment-2060508366 if this PR is no longer related to https://issues.apache.org/jira/browse/SPARK-47416, please delete the tag in the PR title -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-16 Thread via GitHub
zhengruifeng commented on code in PR #46095: URL: https://github.com/apache/spark/pull/46095#discussion_r1568311255 ## core/src/main/scala/org/apache/spark/util/collection/Utils.scala: ## @@ -42,6 +42,23 @@ private[spark] object Utils extends SparkCollectionUtils { ordering

[PR] [SPARK-47883][SQL] Make `CollectTailExec.doExecute` lazy [spark]

2024-04-16 Thread via GitHub
zhengruifeng opened a new pull request, #46095: URL: https://github.com/apache/spark/pull/46095 ### What changes were proposed in this pull request? Make CollectTailExec execute lazily ### Why are the changes needed? In Spark Connect, `dataframe.tail` is based on `Tail(...).c
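The laziness this PR describes can be sketched outside Spark. Below is a minimal plain-Python illustration (a hypothetical `LazyTail` stand-in, not Spark's actual `CollectTailExec`) of why deferring execution matters: the expensive collection runs only if and when the result is actually requested, instead of at plan construction time.

```python
class LazyTail:
    """Toy stand-in for a lazily executed tail operator (not Spark's API)."""

    def __init__(self, data, n):
        self._data = data
        self._n = n
        self._result = None
        self.executed = False

    def execute(self):
        # Lazy: the work happens only on the first call, then is cached.
        if not self.executed:
            self._result = self._data[-self._n:]
            self.executed = True
        return self._result


node = LazyTail(list(range(1_000_000)), 3)
before = node.executed   # nothing computed yet
tail = node.execute()    # collection happens here, on demand
after = node.executed
```

An eager operator would materialize `data[-n:]` in its constructor; making it lazy means a plan that is built but never consumed pays no collection cost.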

[PR] [SPARK-43025][SQL] Eliminate Union if filters have the same child plan [spark]

2024-04-16 Thread via GitHub
beliefer opened a new pull request, #40661: URL: https://github.com/apache/spark/pull/40661 ### What changes were proposed in this pull request? There are a lot of SQL with union multiple subquery with filter in user scenarios. Take an example, **q1** ``` SELECT ss_item_sk, ss_ti
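The rewrite this PR proposes can be shown with a toy model: under UNION's set (distinct) semantics, a union of several filters over the same child plan is equivalent to a single filter whose predicate is the disjunction, which saves repeated scans of the child. A plain-Python sketch (hypothetical predicates, not the Spark optimizer rule itself):

```python
rows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p1 = lambda r: r % 2 == 0   # hypothetical predicate of subquery 1
p2 = lambda r: r > 7        # hypothetical predicate of subquery 2

# Before the rewrite: two passes over the same child, then a distinct union.
union_of_filters = {r for r in rows if p1(r)} | {r for r in rows if p2(r)}

# After the rewrite: one pass with the predicates OR-ed together.
single_filter = {r for r in rows if p1(r) or p2(r)}

assert union_of_filters == single_filter
```

Note the equivalence relies on distinct semantics: under UNION ALL, a row matching both predicates appears twice on the left but once on the right, so the rule would not apply unchanged.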

Re: [PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46094: [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging URL: https://github.com/apache/spark/pull/46094

Re: [PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46094: URL: https://github.com/apache/spark/pull/46094#issuecomment-2060471004 Merged to master.

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568270967 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala: ## @@ -179,7 +180,7 @@ class TextSocketContinuousStr

Re: [PR] [SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on PR #46034: URL: https://github.com/apache/spark/pull/46034#issuecomment-2060437664 The test fails: `org.apache.spark.sql.connect.ProtoToParsedPlanTestSuite`

Re: [PR] [SPARK-47839][SQL] Fix aggregate bug in RewriteWithExpression [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on code in PR #46034: URL: https://github.com/apache/spark/pull/46034#discussion_r1568259745 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -21,36 +21,68 @@ import scala.collection.mutable import o

[PR] [SPARK-46812][CONNECT][PYTHON][FOLLOW-UP] Add pyspark.pyspark.sql.connect.resource into PyPi packaging [spark]

2024-04-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46094: URL: https://github.com/apache/spark/pull/46094 ### What changes were proposed in this pull request? This PR proposes to add `pyspark.pyspark.sql.connect.resource` into PyPi packaging. ### Why are the changes needed? In orde

Re: [PR] [SPARK-47591][SQL] Hive-thriftserver: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #45926: URL: https://github.com/apache/spark/pull/45926#issuecomment-2060387686 @itholic Please resolve the conflict so that I can merge this one. Thanks.

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #46086: [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46086

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46086: URL: https://github.com/apache/spark/pull/46086#issuecomment-2060385871 @dongjoon-hyun @HyukjinKwon Thanks for the review. Merging to master.

Re: [PR] [WIP][SPARK-47584][SQL] SQL core: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
panbingkun commented on code in PR #46057: URL: https://github.com/apache/spark/pull/46057#discussion_r1568232939 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -172,12 +228,16 @@ object LogKey extends Enumeration { val TOPIC_PARTITION = Value

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
bluzy commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060354963 @dongjoon-hyun @mridulm I think the incorrect inprogress file would be deleted on the cleaner's schedule, wouldn't it? I am concerned that many Spark streaming applications can live forever until nee

[PR] [SPARK-47882][SQL] createTableColumnTypes need to be mapped to database types instead of using directly [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46093: URL: https://github.com/apache/spark/pull/46093 … ### What changes were proposed in this pull request? createTableColumnTypes contains Spark SQL data type definitions. The underlying database might not recognize them, boolean

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn closed pull request #46092: [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle URL: https://github.com/apache/spark/pull/46092

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46091: URL: https://github.com/apache/spark/pull/46091#issuecomment-2060344927 Merged to master

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46092: URL: https://github.com/apache/spark/pull/46092#issuecomment-2060346342 Merged to master, Thank you @dongjoon-hyun

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn closed pull request #46091: [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping URL: https://github.com/apache/spark/pull/46091

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568210917 ## python/pyspark/errors/utils.py: ## @@ -16,9 +16,14 @@ # import re -from typing import Dict, Match - +import functools +import inspect +from typing import Any, Ca

Re: [PR] [SPARK-47850][SQL] Support `spark.sql.hive.convertInsertingUnpartitionedTable` [spark]

2024-04-16 Thread via GitHub
pan3793 commented on PR #46052: URL: https://github.com/apache/spark/pull/46052#issuecomment-2060319567 cc @ulysses-you, who refactored this part

Re: [PR] [SPARK-47880][SQL][DOCS] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568183291 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below table

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060295412 Yes, Mridul's comment is correct. I believe the AS-IS behavior is robust, safe, and intended, rather than a bug. WDYT, @bluzy?

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568177905 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568168654 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below table

Re: [PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on code in PR #46092: URL: https://github.com/apache/spark/pull/46092#discussion_r1568169069 ## docs/sql-data-sources-jdbc.md: ## @@ -1335,3 +1335,109 @@ as the activated JDBC Driver. + +### Mapping Spark SQL Data Types to Oracle + +The below table

Re: [PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46091: URL: https://github.com/apache/spark/pull/46091#issuecomment-2060277921 Thank you @dongjoon-hyun

[PR] [SPARK-47880][SQL] Oracle: Document Mapping Spark SQL Data Types to Oracle [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46092: URL: https://github.com/apache/spark/pull/46092 ### What changes were proposed in this pull request? Documents Mapping Spark SQL Data Types to Oracle ### Why are the changes needed? documentation improvement

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568166135 ## python/pyspark/errors/utils.py: ## @@ -16,9 +16,14 @@ # import re -from typing import Dict, Match - +import functools +import inspect +from typing import Any

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568165854 ## python/pyspark/errors/utils.py: ## @@ -119,3 +124,68 @@ def get_message_template(self, error_class: str) -> str: message_template = main_message_tem

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568162931 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2060267298 Merged to master for Apache Spark 4.0.0. Thank YOU for the contribution, @neilramaswamy.

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun closed pull request #46065: [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 URL: https://github.com/apache/spark/pull/46065

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568157339 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47765][SQL] Add SET COLLATION to parser rules [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on PR #45946: URL: https://github.com/apache/spark/pull/45946#issuecomment-2060252622 Shall we fail this command if the string collation feature flag is turned off?

[PR] [SPARK-47879][SQL] Oracle: Use VARCHAR2 instead of VARCHAR for VarcharType mapping [spark]

2024-04-16 Thread via GitHub
yaooqinn opened a new pull request, #46091: URL: https://github.com/apache/spark/pull/46091 ### What changes were proposed in this pull request? Use VARCHAR2 instead of VARCHAR for VarcharType mapping on the write-side. VARCHAR is a synonym of VARCHAR2 but it's uns

Re: [PR] [SPARK-47871][SQL] Oracle: Map TimestampType to TIMESTAMP WITH LOCAL TIME ZONE [spark]

2024-04-16 Thread via GitHub
yaooqinn commented on PR #46080: URL: https://github.com/apache/spark/pull/46080#issuecomment-2060233886 Thank you very much as always @dongjoon-hyun

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
ueshin commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568131652 ## python/pyspark/sql/tests/test_dataframe.py: ## @@ -1011,36 +1011,6 @@ def test_dataframe_error_context(self): pyspark_fragment="eqNullSafe",

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
neilramaswamy commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2060226204 @dongjoon-hyun, should be ready to merge now. Appreciate your feedback!

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568119580 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment: Stri

Re: [PR] [SPARK-47870][SQL] Optimize predicate after push extra predicate through join [spark]

2024-04-16 Thread via GitHub
zml1206 commented on PR #46085: URL: https://github.com/apache/spark/pull/46085#issuecomment-2060202987 cc @cloud-fan

Re: [PR] [SPARK-47870][SQL] Optimize predicate after push extra predicate through join [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #46085: URL: https://github.com/apache/spark/pull/46085#discussion_r1568115344 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala: ## @@ -46,15 +46,30 @@ class FilterPushdownSuite extends PlanTest {

Re: [PR] [SPARK-43861][CORE] Do not delete inprogress log [spark]

2024-04-16 Thread via GitHub
mridulm commented on PR #46025: URL: https://github.com/apache/spark/pull/46025#issuecomment-2060196960 Note that when driver crashes, the event file remains with `.inprogress` suffix. Not deleting these files would result in filling up the event directory - and eventually fail all jobs

Re: [PR] [SPARK-47858][SPARK-47852][PYTHON][SQL] Refactoring the structure for DataFrame error context [spark]

2024-04-16 Thread via GitHub
itholic commented on code in PR #46063: URL: https://github.com/apache/spark/pull/46063#discussion_r1568104641 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/trees/QueryContexts.scala: ## @@ -160,6 +160,8 @@ case class DataFrameQueryContext( val pysparkFragment: Stri

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
cloud-fan commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568103738 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Founda
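The rule under review here, `OptimizeJoinCondition`, rewrites the pattern `a = b OR (a IS NULL AND b IS NULL)` in a join condition into null-safe equality `a <=> b`. A plain-Python model of SQL's three-valued `=` versus `<=>` shows why the two forms agree when used as a join predicate (where NULL filters like false); this is only an illustration, not the optimizer rule itself:

```python
def sql_eq(a, b):
    """SQL '=': three-valued; NULL compared with anything yields NULL (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """SQL '<=>': NULL <=> NULL is true, NULL <=> non-NULL is false."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def original_condition(a, b):
    """a = b OR (a IS NULL AND b IS NULL), with NULL treated as false,
    as a join predicate treats it."""
    eq = sql_eq(a, b)
    both_null = a is None and b is None
    return bool(eq) or both_null


pairs = [(1, 1), (1, 2), (None, 1), (1, None), (None, None)]
assert all(original_condition(a, b) == null_safe_eq(a, b) for a, b in pairs)
```

Beyond being shorter, the `<=>` form matters for planning: Spark can use a null-safe equality as an equi-join key, whereas the OR-of-conjunctions form generally cannot be pushed into a hash join condition.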

Re: [PR] [SPARK-47767][SQL] Show offset value in TakeOrderedAndProjectExec [spark]

2024-04-16 Thread via GitHub
guixiaowen commented on PR #45931: URL: https://github.com/apache/spark/pull/45931#issuecomment-2060168379 > Could you add one test case like `EXPLAIN ... LIMIT ... OFFSET ... ORDER BY ...` at https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/explain.s

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568084128 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundati

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng closed pull request #46088: [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow URL: https://github.com/apache/spark/pull/46088

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060150462 Thank you all, merged to master!

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
zml1206 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568083598 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala: ## @@ -621,4 +621,14 @@ class DataFrameJoinSuite extends QueryTest checkAnswer(joined, Ro

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46089: URL: https://github.com/apache/spark/pull/46089#discussion_r1568048732 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -0,0 +1,139 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contr

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46089: URL: https://github.com/apache/spark/pull/46089#issuecomment-2060146837 Merged to master.

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060148294 Good idea! I'll file a separate PR, thanks @zhengruifeng! Thanks @allisonwang-db, I'll create tickets under the umbrella.

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46089: [SPARK-46375][DOCS] Add user guide for Python data source API URL: https://github.com/apache/spark/pull/46089

[PR] [WIP][SPARK-47763][CONNECT][TESTS] Enable local-cluster tests with pyspark-connect package [spark]

2024-04-16 Thread via GitHub
HyukjinKwon opened a new pull request, #46090: URL: https://github.com/apache/spark/pull/46090 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ### Ho

Re: [PR] [SPARK-47846][SQL] Add support for Variant type in from_json expression [spark]

2024-04-16 Thread via GitHub
harshmotw-db commented on PR #46046: URL: https://github.com/apache/spark/pull/46046#issuecomment-2060102585 @chenhao-db can you please look at this whenever you're free?

Re: [PR] [SPARK-47418][SQL] Add hand-crafted implementations for lowercase uni… [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46082: URL: https://github.com/apache/spark/pull/46082#issuecomment-2060101653 Mind making the PR title complete? It's truncated.

Re: [PR] [SPARK-47416][SQL] Add new functions to CollationBenchmark [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46078: URL: https://github.com/apache/spark/pull/46078#discussion_r1568055547 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CollationBenchmark.scala: ## @@ -100,6 +100,90 @@ abstract class CollationBenchmarkBase extends

Re: [PR] [SPARK-47845][SQL][PYTHON][CONNECT] Support Column type in split function for scala and python [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46045: URL: https://github.com/apache/spark/pull/46045#issuecomment-2060090501 I am fine with this change

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46089: URL: https://github.com/apache/spark/pull/46089#discussion_r1568048989 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -0,0 +1,139 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contr

Re: [PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on code in PR #46089: URL: https://github.com/apache/spark/pull/46089#discussion_r1568048732 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -0,0 +1,139 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contr

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46072: [SPARK-47877][SS][CONNECT] Speed up test_parity_listener URL: https://github.com/apache/spark/pull/46072

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46072: URL: https://github.com/apache/spark/pull/46072#issuecomment-2060083261 Merged to master.

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun closed pull request #46087: [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` URL: https://github.com/apache/spark/pull/46087

Re: [PR] [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests [spark]

2024-04-16 Thread via GitHub
HyukjinKwon closed pull request #46055: [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests URL: https://github.com/apache/spark/pull/46055

Re: [PR] [SPARK-47760][SPARK-47763][CONNECT][TESTS] Reeanble Avro and Protobuf function doctests [spark]

2024-04-16 Thread via GitHub
HyukjinKwon commented on PR #46055: URL: https://github.com/apache/spark/pull/46055#issuecomment-2060081075 Merged to master.

Re: [PR] [SPARK-47816][CONNECT][DOCS] Document the lazy evaluation of views in `spark.{sql, table}` [spark]

2024-04-16 Thread via GitHub
allisonwang-db commented on code in PR #46007: URL: https://github.com/apache/spark/pull/46007#discussion_r1568042050 ## python/pyspark/sql/session.py: ## @@ -1630,6 +1630,13 @@ def sql( --- :class:`DataFrame` +Notes +- +In Spa

Re: [PR] [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession [spark]

2024-04-16 Thread via GitHub
zhengruifeng closed pull request #46075: [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession URL: https://github.com/apache/spark/pull/46075

Re: [PR] [SPARK-47868][CONNECT] Fix recursion limit error in SparkConnectPlanner and SparkSession [spark]

2024-04-16 Thread via GitHub
zhengruifeng commented on PR #46075: URL: https://github.com/apache/spark/pull/46075#issuecomment-2060071663 merged to master

[PR] [SPARK-46375][DOCS] Add user guide for Python data source API [spark]

2024-04-16 Thread via GitHub
allisonwang-db opened a new pull request, #46089: URL: https://github.com/apache/spark/pull/46089 ### What changes were proposed in this pull request? This PR adds a new user guide for the Python data source API with a simple example. More examples (including streaming) will b

Re: [PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
zhengruifeng commented on PR #46088: URL: https://github.com/apache/spark/pull/46088#issuecomment-2060067749 the doc of `mapInArrow` is similar to `mapInPandas`, shall we refine the latter too?

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2060044971 Thank you for your feedback, @steveloughran . Ya, as you mentioned, this is blocked by exactly those two configurations. ``` spark.sql.parquet.output.committer.class=org.apach

Re: [PR] [SPARK-47877][SS][CONNECT] Speed up test_parity_listener [spark]

2024-04-16 Thread via GitHub
WweiL commented on PR #46072: URL: https://github.com/apache/spark/pull/46072#issuecomment-2060036826 @HyukjinKwon Can you take a look? Thank you!

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568012332 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala: ## @@ -621,4 +621,14 @@ class DataFrameJoinSuite extends QueryTest checkAnswer(joined,

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568011225 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] [SPARK-47810][SQL] Replace equivalent expression to <=> in join condition [spark]

2024-04-16 Thread via GitHub
anton5798 commented on code in PR #45999: URL: https://github.com/apache/spark/pull/45999#discussion_r1568005755 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJoinCondition.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060008470 I removed the missed SPARK-46205 test case.

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567997561 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -20,18 +20,18 @@ package org.apache.spark.sql.hive.client import ja

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060007634 Yes, there are other commits about `compression` code and some neutral changes. I believe it will be okay and the final goal is to bring it back again.

[PR] [SPARK-47876][PYTHON][DOCS] Improve docstring of mapInArrow [spark]

2024-04-16 Thread via GitHub
xinrong-meng opened a new pull request, #46088: URL: https://github.com/apache/spark/pull/46088 ### What changes were proposed in this pull request? Improve docstring of mapInArrow: - "using a Python native function that takes and outputs a PyArrow's RecordBatch" is confusing cause
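The docstring confusion the PR targets is that `mapInArrow` takes a function over an *iterator* of `pyarrow.RecordBatch` objects and must return an iterator of batches, not a single batch. A plain-Python sketch of that iterator-in/iterator-out contract (plain lists stand in for RecordBatches so the example runs without PyArrow or Spark; the names are illustrative):

```python
from typing import Iterator, List

def map_in_arrow_style(func, batches: Iterator[List[int]]) -> Iterator[List[int]]:
    """Mimics the mapInArrow contract: the user function receives an
    iterator of batches and must yield batches back."""
    yield from func(batches)

def double_each(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    # User function: consume batches lazily, emit transformed batches.
    for batch in batches:
        yield [x * 2 for x in batch]

result = list(map_in_arrow_style(double_each, iter([[1, 2], [3]])))
assert result == [[2, 4], [6]]
```

Because both sides are iterators, a batch can be processed and discarded before the next one is read, which is what keeps memory bounded for large partitions.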

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060004457 Thank you so much for the swift help. I'll make sure that all CIs pass.

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567994515 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala: ## @@ -20,18 +20,18 @@ package org.apache.spark.sql.hive.client import ja

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
viirya commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2060002347 Pending CI. Thanks @dongjoon-hyun

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567992633 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -31,10 +32,10 @@ import org.apache.hadoop.hive.metastore.api.hive_metastore

Re: [PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on PR #46087: URL: https://github.com/apache/spark/pull/46087#issuecomment-2059998699 Sorry, but could you review this reverting PR, @viirya ? While I was running this, I found my mistake.

[PR] [SPARK-47875][CORE] Remove `spark.deploy.recoverySerializer` [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun opened a new pull request, #46087: URL: https://github.com/apache/spark/pull/46087 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
dongjoon-hyun commented on code in PR #46086: URL: https://github.com/apache/spark/pull/46086#discussion_r1567987357 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala: ## @@ -31,10 +32,10 @@ import org.apache.hadoop.hive.metastore.api.hive_metastore

Re: [PR] [SPARK-47838][BUILD] Upgrade `rocksdbjni` to 8.11.4 [spark]

2024-04-16 Thread via GitHub
neilramaswamy commented on PR #46065: URL: https://github.com/apache/spark/pull/46065#issuecomment-2059969162 @dongjoon-hyun numbers are still approximately the same (I just updated with the latest results), a few are better. Seems safe to merge when CI passes. Thanks!

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-16 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1567940592 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation
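For readers following the review: a TTL-backed map state pairs each value with an expiration timestamp and treats expired entries as absent on read. A minimal Python sketch of that idea, with an injectable clock so expiry is deterministic (a conceptual illustration only, not the `MapStateImplWithTTL` internals):

```python
import time

class TtlMapState:
    """Map state where each entry expires ttl_ms after it was written."""

    def __init__(self, ttl_ms: int, clock=lambda: time.time() * 1000):
        self._ttl_ms = ttl_ms
        self._clock = clock          # injectable for testing
        self._store = {}             # key -> (value, expiration_ms)

    def put(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl_ms)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expiration_ms = entry
        if self._clock() >= expiration_ms:
            del self._store[key]     # lazily evict expired entry on read
            return None
        return value

# Usage with a fake clock:
now = [0]
state = TtlMapState(ttl_ms=100, clock=lambda: now[0])
state.put("k", "v")
assert state.get("k") == "v"
now[0] = 150                         # advance past the TTL
assert state.get("k") is None
```

Real state stores typically add background cleanup of expired entries as well, so that keys that are never read again do not leak; the lazy-eviction-on-read above shows only the read-path semantics.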

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #45923: URL: https://github.com/apache/spark/pull/45923#issuecomment-2059908947 Thanks, merging to master

Re: [PR] [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #45923: [SPARK-47590][SQL] Hive-thriftserver: Migrate logWarn with variables to structured logging framework URL: https://github.com/apache/spark/pull/45923

Re: [PR] [SHUFFLE] [WIP] Prototype: store shuffle file on external storage like S3 [spark]

2024-04-16 Thread via GitHub
steveloughran commented on PR #34864: URL: https://github.com/apache/spark/pull/34864#issuecomment-2059907838 @michaelbilow hadoop s3a is on the v2 SDK; the com.amazonaws classes are not on the classpath and Amazon is slowly stopping support. you cannot for example use the lower latency S3 express st

Re: [PR] [SPARK-47618][CORE] Use `Magic Committer` for all S3 buckets by default [spark]

2024-04-16 Thread via GitHub
steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2059899891 I have no problems with the PR; we have made it the default in our releases. This could be a good time to revisit "why there's some separate PathOutputCommitter" stuff; origin
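For reference, enabling the magic committer explicitly (as users had to before a default like the one this PR proposes) typically involves the S3A and Spark cloud-committer settings below; the exact keys should be verified against the Hadoop S3A and Spark cloud-integration docs for your versions:

```properties
spark.hadoop.fs.s3a.committer.magic.enabled  true
spark.hadoop.fs.s3a.committer.name           magic
spark.sql.sources.commitProtocolClass        org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
spark.sql.parquet.output.committer.class     org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
```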

Re: [PR] [SPARK-47627][SQL] Add SQL MERGE syntax to enable schema evolution [spark]

2024-04-16 Thread via GitHub
xupefei commented on PR #45748: URL: https://github.com/apache/spark/pull/45748#issuecomment-2059899197 > @xupefei could you provide more details in the PR description? For example, what is the difference with/without `WITH SCHEMA EVOLUTION` Hi @gengliangwang, I added to the PR descri

Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]

2024-04-16 Thread via GitHub
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1567908966 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,265 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46086: URL: https://github.com/apache/spark/pull/46086#issuecomment-2059874634 cc @panbingkun @itholic

[PR] [SPARK-47588][CORE] Hive module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang opened a new pull request, #46086: URL: https://github.com/apache/spark/pull/46086 ### What changes were proposed in this pull request? Migrate logInfo in Hive module with variables to structured logging framework. ### Why are the changes needed?

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang closed pull request #46022: [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46022

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-04-16 Thread via GitHub
gengliangwang commented on PR #46022: URL: https://github.com/apache/spark/pull/46022#issuecomment-2059867235 Thanks, merging to master
