[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
HyukjinKwon commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222556739 ## python/pyspark/sql/functions.py: ## @@ -9929,6 +10078,26 @@ def map_zip_with( return _invoke_higher_order_function("MapZipWith", [col1, col2], [f]) +def

[GitHub] [spark] Tagar commented on pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

2023-06-08 Thread via GitHub
Tagar commented on PR #40204: URL: https://github.com/apache/spark/pull/40204#issuecomment-1582014032 > The implementation of Spark Decimal holds a BigDecimal or Long value Do you mind pointing me in the code when `Long` is used by `Decimal` instead of `BigDecimal`? Thanks -- This

[GitHub] [spark] zhengruifeng opened a new pull request, #41511: [SPARK-43613][PS][CONNECT] Enable `pyspark.pandas.spark.functions.covar` in Spark Connect

2023-06-08 Thread via GitHub
zhengruifeng opened a new pull request, #41511: URL: https://github.com/apache/spark/pull/41511 ### What changes were proposed in this pull request? Enable `pyspark.pandas.spark.functions.covar` in Spark Connect ### Why are the changes needed? to support pandas dedicated expr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
HyukjinKwon commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222562522 ## python/pyspark/sql/functions.py: ## @@ -9929,6 +10078,26 @@ def map_zip_with( return _invoke_higher_order_function("MapZipWith", [col1, col2], [f]) +def

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222575032 ## python/pyspark/sql/functions.py: ## @@ -9929,6 +10078,26 @@ def map_zip_with( return _invoke_higher_order_function("MapZipWith", [col1, col2], [f]) +de

[GitHub] [spark] zhengruifeng closed pull request #41508: [SPARK-43712][SPARK-43713][CONNECT][PS] Enable parity test: `test_line_plot`, `test_pie_plot`.

2023-06-08 Thread via GitHub
zhengruifeng closed pull request #41508: [SPARK-43712][SPARK-43713][CONNECT][PS] Enable parity test: `test_line_plot`, `test_pie_plot`. URL: https://github.com/apache/spark/pull/41508 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [spark] zhengruifeng commented on pull request #41508: [SPARK-43712][SPARK-43713][CONNECT][PS] Enable parity test: `test_line_plot`, `test_pie_plot`.

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41508: URL: https://github.com/apache/spark/pull/41508#issuecomment-1582056737 merged to master

[GitHub] [spark] zhengruifeng commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1582058017 @beliefer would you mind re-trigger the failed test?

[GitHub] [spark] cloud-fan commented on a diff in pull request #41348: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-08 Thread via GitHub
cloud-fan commented on code in PR #41348: URL: https://github.com/apache/spark/pull/41348#discussion_r1222589939 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/IdentifierImpl.scala: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [spark] cloud-fan commented on a diff in pull request #41348: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-08 Thread via GitHub
cloud-fan commented on code in PR #41348: URL: https://github.com/apache/spark/pull/41348#discussion_r1222590407 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DropTableSuite.scala: ## @@ -26,7 +26,7 @@ class DropTableSuite extends v1.DropTableSuiteBase w

[GitHub] [spark] cloud-fan commented on pull request #41251: [SPARK-43521][SQL] Add `CREATE TABLE LIKE FILE` statement

2023-06-08 Thread via GitHub
cloud-fan commented on PR #41251: URL: https://github.com/apache/spark/pull/41251#issuecomment-1582070117 Is it a popular SQL syntax in other databases? And is it the same as `CREATE TABLE t LOCATION path`?

[GitHub] [spark] LuciferYang commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
LuciferYang commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222624212 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5292,6 +5292,183 @@ object functions { */ def hours(e: Column): Column = withExpr { H

[GitHub] [spark] beliefer commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
beliefer commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1582112096 > @beliefer would you mind re-trigger the failed test? triggered!

[GitHub] [spark] panbingkun commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
panbingkun commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222631192 ## python/pyspark/sql/functions.py: ## @@ -9929,6 +10078,26 @@ def map_zip_with( return _invoke_higher_order_function("MapZipWith", [col1, col2], [f]) +def

[GitHub] [spark] itholic commented on a diff in pull request #41504: [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors.

2023-06-08 Thread via GitHub
itholic commented on code in PR #41504: URL: https://github.com/apache/spark/pull/41504#discussion_r1222632858 ## core/src/main/resources/error/error-classes.json: ## @@ -2082,6 +2117,11 @@ ], "sqlState" : "42704" }, + "UNRESOLVABLE_TABLE_VALUED_FUNCTION" : { +

[GitHub] [spark] itholic commented on pull request #41504: [SPARK-44004][SQL] Assign name & improve error message for frequent LEGACY errors.

2023-06-08 Thread via GitHub
itholic commented on PR #41504: URL: https://github.com/apache/spark/pull/41504#issuecomment-1582120903 Thanks @cloud-fan for review. Just adjusted the comments.

[GitHub] [spark] shuyouZZ commented on a diff in pull request #41491: [SPARK-43991][CORE] Use the value of spark.eventLog.compression.codec set by user when write compact file

2023-06-08 Thread via GitHub
shuyouZZ commented on code in PR #41491: URL: https://github.com/apache/spark/pull/41491#discussion_r1222636986 ## core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala: ## @@ -221,5 +222,15 @@ private class CompactedEventLogFileWriter( hadoopConf:

[GitHub] [spark] panbingkun commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
panbingkun commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222637332 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5292,6 +5292,183 @@ object functions { */ def hours(e: Column): Column = withExpr { Ho

[GitHub] [spark] beliefer commented on pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

2023-06-08 Thread via GitHub
beliefer commented on PR #41497: URL: https://github.com/apache/spark/pull/41497#issuecomment-1582127003 cc @cloud-fan @zhengruifeng @amaliujia The GA failure is unrelated to this PR.

[GitHub] [spark] shuyouZZ commented on a diff in pull request #41491: [SPARK-43991][CORE] Use the value of spark.eventLog.compression.codec set by user when write compact file

2023-06-08 Thread via GitHub
shuyouZZ commented on code in PR #41491: URL: https://github.com/apache/spark/pull/41491#discussion_r1222640174 ## core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala: ## @@ -221,5 +222,15 @@ private class CompactedEventLogFileWriter( hadoopConf:

[GitHub] [spark] YannByron commented on pull request #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter

2023-06-08 Thread via GitHub
YannByron commented on PR #41417: URL: https://github.com/apache/spark/pull/41417#issuecomment-1582138917 @cloud-fan @sigmod @somani could you take a look when you have a chance?

[GitHub] [spark] beliefer commented on pull request #40204: [SPARK-42601][SQL] New physical type Decimal128 for DecimalType

2023-06-08 Thread via GitHub
beliefer commented on PR #40204: URL: https://github.com/apache/spark/pull/40204#issuecomment-1582152357 > Do you mind pointing me in the code when `Long` is used by `Decimal` instead of `BigDecimal`? Thanks Because the performance of `BigDecimal` is bad.

[GitHub] [spark] dongjoon-hyun commented on pull request #41507: [SPARK-43273][SQL] Support `lz4raw` compression codec for Parquet

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41507: URL: https://github.com/apache/spark/pull/41507#issuecomment-1582160844 The first commit passed all tests already (except pyspark-pandas-slow-connect). And, I verified the second commit manually. Merged to master. ``` [info] ParquetCodecSuite:

[GitHub] [spark] dongjoon-hyun closed pull request #41507: [SPARK-43273][SQL] Support `lz4raw` compression codec for Parquet

2023-06-08 Thread via GitHub
dongjoon-hyun closed pull request #41507: [SPARK-43273][SQL] Support `lz4raw` compression codec for Parquet URL: https://github.com/apache/spark/pull/41507

[GitHub] [spark] dongjoon-hyun commented on pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41448: URL: https://github.com/apache/spark/pull/41448#issuecomment-1582163567 Thank you for updating, @aokolnychyi !

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
dongjoon-hyun commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1222675432 ## sql/core/src/test/scala/org/apache/spark/sql/connector/MergeIntoTableSuiteBase.scala: ## @@ -0,0 +1,1344 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
dongjoon-hyun commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1222677031 ## core/src/main/resources/error/error-classes.json: ## @@ -1539,6 +1539,13 @@ "Parse Mode: . To process malformed records as null result, try setting the

[GitHub] [spark] dongjoon-hyun commented on pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41448: URL: https://github.com/apache/spark/pull/41448#issuecomment-1582173316 Also, cc @cloud-fan, @viirya, @huaxingao, @sunchao once more.

[GitHub] [spark] itholic opened a new pull request, #41512: [SPARK-43700][SPARK-43701][CONNECT][PS] Enable `TimedeltaOps.(sub|rsub)` with Spark Connect

2023-06-08 Thread via GitHub
itholic opened a new pull request, #41512: URL: https://github.com/apache/spark/pull/41512 ### What changes were proposed in this pull request? This PR proposes to enable pandas-on-Spark `TimedeltaOps.sub` & `TimedeltaOps.rsub` for Spark Connect. ### Why are the changes nee

[GitHub] [spark] LuciferYang closed pull request #41494: [SPARK-43994][BUILD] Upgrade zstd-jni to 1.5.5-4

2023-06-08 Thread via GitHub
LuciferYang closed pull request #41494: [SPARK-43994][BUILD] Upgrade zstd-jni to 1.5.5-4 URL: https://github.com/apache/spark/pull/41494

[GitHub] [spark] LuciferYang commented on pull request #41494: [SPARK-43994][BUILD] Upgrade zstd-jni to 1.5.5-4

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41494: URL: https://github.com/apache/spark/pull/41494#issuecomment-1582176011 Merged to master. Thanks @dongjoon-hyun @panbingkun

[GitHub] [spark] dongjoon-hyun commented on pull request #41257: [SPARK-43657][K8S]: reuse config map for executor when running on k8s-cluster mode

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41257: URL: https://github.com/apache/spark/pull/41257#issuecomment-1582182931 What I meant was 1. Not to deprecate `spark.kubernetes.executor.disableConfigMap`. 2. So, if `spark.kubernetes.executor.disableConfigMap` is given, executors should not reuse D

[GitHub] [spark] dongjoon-hyun commented on pull request #41494: [SPARK-43994][BUILD] Upgrade zstd-jni to 1.5.5-4

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41494: URL: https://github.com/apache/spark/pull/41494#issuecomment-1582183635 Thank you, @LuciferYang and @panbingkun .

[GitHub] [spark] LuciferYang commented on pull request #41469: [SPARK-43974][CONNECT][BUILD] Upgrade buf to v1.21.0

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41469: URL: https://github.com/apache/spark/pull/41469#issuecomment-1582186947 I set this to draft first to avoid unexpected merging

[GitHub] [spark] zhengruifeng commented on pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41435: URL: https://github.com/apache/spark/pull/41435#issuecomment-1582197323 @HyukjinKwon it seems that `sql - other` has started failing again ...

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41512: [SPARK-43700][SPARK-43701][CONNECT][PS] Enable `TimedeltaOps.(sub|rsub)` with Spark Connect

2023-06-08 Thread via GitHub
HyukjinKwon commented on code in PR #41512: URL: https://github.com/apache/spark/pull/41512#discussion_r1222709092 ## python/pyspark/sql/connect/column.py: ## @@ -73,7 +73,17 @@ def _bin_op( ) -> Callable[["Column", Any], "Column"]: def wrapped(self: "Column", other: Any)

[GitHub] [spark] zhengruifeng commented on pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41497: URL: https://github.com/apache/spark/pull/41497#issuecomment-158658 The failure in `sql - other` should be unrelated

[GitHub] [spark] zhengruifeng closed pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

2023-06-08 Thread via GitHub
zhengruifeng closed pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions URL: https://github.com/apache/spark/pull/41497

[GitHub] [spark] zhengruifeng commented on pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41497: URL: https://github.com/apache/spark/pull/41497#issuecomment-1582224856 merged to master

[GitHub] [spark] zhengruifeng commented on pull request #41511: [SPARK-43613][PS][CONNECT] Enable `pyspark.pandas.spark.functions.covar` in Spark Connect

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41511: URL: https://github.com/apache/spark/pull/41511#issuecomment-1582228937 also cc @itholic

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
zhengruifeng commented on code in PR #41444: URL: https://github.com/apache/spark/pull/41444#discussion_r1222736896 ## python/pyspark/sql/tests/test_functions.py: ## @@ -709,6 +709,52 @@ def test_overlay(self): message_parameters={"arg_name": "len", "arg_type": "flo

[GitHub] [spark] zhengruifeng closed pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
zhengruifeng closed pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API URL: https://github.com/apache/spark/pull/41444

[GitHub] [spark] zhengruifeng commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1582244517 merged to master

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222761412 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5292,6 +5292,183 @@ object functions { */ def hours(e: Column): Column = withExpr {

[GitHub] [spark] itholic commented on a diff in pull request #41511: [SPARK-43613][PS][CONNECT] Enable `pyspark.pandas.spark.functions.covar` in Spark Connect

2023-06-08 Thread via GitHub
itholic commented on code in PR #41511: URL: https://github.com/apache/spark/pull/41511#discussion_r1222760105 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1686,6 +1692,11 @@ class SparkConnectPlanner(val sessi

[GitHub] [spark] TongWei1105 opened a new pull request, #41513: [SPARK-44007][SQL] Unresolved hint cause query failure

2023-06-08 Thread via GitHub
TongWei1105 opened a new pull request, #41513: URL: https://github.com/apache/spark/pull/41513 ### What changes were proposed in this pull request? After the Resolve Hints Rules are completed, immediately remove unknown Hints to avoid query errors caused by Unresolved Hints.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222766865 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5292,6 +5292,183 @@ object functions { */ def hours(e: Column): Column = withExpr {

[GitHub] [spark] zhengruifeng commented on a diff in pull request #41477: [SPARK-43931][SQL][PYTHON][CONNECT] Add make_* functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on code in PR #41477: URL: https://github.com/apache/spark/pull/41477#discussion_r1222830771 ## python/pyspark/sql/connect/functions.py: ## @@ -2373,6 +2374,117 @@ def hours(col: "ColumnOrName") -> Column: hours.__doc__ = pysparkfuncs.hours.__doc__ +

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41496: [SPARK-42750][SQL][FOLLOWUP] Add INSERT OVERWRITE BY NAME statement

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41496: URL: https://github.com/apache/spark/pull/41496#discussion_r1222832772 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -160,6 +160,42 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils {

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #41440: [SPARK-43952][CORE][CONNECT][SQL] Add SparkContext APIs for query cancellation by tag

2023-06-08 Thread via GitHub
juliuszsompolski commented on code in PR #41440: URL: https://github.com/apache/spark/pull/41440#discussion_r1222849401 ## core/src/main/scala/org/apache/spark/status/api/v1/api.scala: ## @@ -199,6 +199,7 @@ class JobData private[spark]( val completionTime: Option[Date],

[GitHub] [spark] zhengruifeng commented on pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41435: URL: https://github.com/apache/spark/pull/41435#issuecomment-1582346607 On the SQL side, this PR only touches `MathFunctionsSuite` in `sql - slow` ![image](https://github.com/apache/spark/assets/7322292/1819b8d2-8a12-48aa-ba7d-f2f698d6462d)

[GitHub] [spark] zhengruifeng closed pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng closed pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python URL: https://github.com/apache/spark/pull/41435

[GitHub] [spark] zhengruifeng commented on pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41435: URL: https://github.com/apache/spark/pull/41435#issuecomment-1582349003 merged to master

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41348: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41348: URL: https://github.com/apache/spark/pull/41348#discussion_r1222860848 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/IdentifierImpl.scala: ## @@ -0,0 +1,63 @@ +/* + * Licensed to the Apache Software Foundation (AS

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41348: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41348: URL: https://github.com/apache/spark/pull/41348#discussion_r1222862287 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -194,12 +203,30 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] zhengruifeng closed pull request #41510: [SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory

2023-06-08 Thread via GitHub
zhengruifeng closed pull request #41510: [SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory URL: https://github.com/apache/spark/pull/41510

[GitHub] [spark] panbingkun commented on a diff in pull request #41505: [SPARK-43938][CONNECT][PYTHON] Add to_* functions to Scala and Python

2023-06-08 Thread via GitHub
panbingkun commented on code in PR #41505: URL: https://github.com/apache/spark/pull/41505#discussion_r1222863294 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -5292,6 +5292,183 @@ object functions { */ def hours(e: Column): Column = withExpr { Ho

[GitHub] [spark] zhengruifeng commented on pull request #41510: [SPARK-43612][PYTHON][CONNECT][FOLLOW-UP] Copy dependent data files to data directory

2023-06-08 Thread via GitHub
zhengruifeng commented on PR #41510: URL: https://github.com/apache/spark/pull/41510#issuecomment-1582360979 merged to master

[GitHub] [spark] shuyouZZ commented on a diff in pull request #41491: [SPARK-43991][CORE] Use the value of spark.eventLog.compression.codec set by user when write compact file

2023-06-08 Thread via GitHub
shuyouZZ commented on code in PR #41491: URL: https://github.com/apache/spark/pull/41491#discussion_r1222866586 ## core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala: ## @@ -221,5 +222,15 @@ private class CompactedEventLogFileWriter( hadoopConf:

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41348: [SPARK-43203][SQL] Move all Drop Table case to DataSource V2

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41348: URL: https://github.com/apache/spark/pull/41348#discussion_r1222869265 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala: ## @@ -194,12 +203,30 @@ class V2SessionCatalog(catalog: SessionCatalo

[GitHub] [spark] cloud-fan commented on a diff in pull request #41496: [SPARK-42750][SQL][FOLLOWUP] Add INSERT OVERWRITE BY NAME statement

2023-06-08 Thread via GitHub
cloud-fan commented on code in PR #41496: URL: https://github.com/apache/spark/pull/41496#discussion_r1222895160 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -440,14 +476,49 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { }

[GitHub] [spark] beliefer commented on pull request #41444: [SPARK-43916][SQL][PYTHON][CONNECT] Add percentile to Scala and Python API

2023-06-08 Thread via GitHub
beliefer commented on PR #41444: URL: https://github.com/apache/spark/pull/41444#issuecomment-1582395402 @HyukjinKwon @cloud-fan @zhengruifeng Thank you!

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41496: [SPARK-42750][SQL][FOLLOWUP] Add INSERT OVERWRITE BY NAME statement

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41496: URL: https://github.com/apache/spark/pull/41496#discussion_r1222901581 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -440,14 +476,49 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { }

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41496: [SPARK-42750][SQL][FOLLOWUP] Add INSERT OVERWRITE BY NAME statement

2023-06-08 Thread via GitHub
Hisoka-X commented on code in PR #41496: URL: https://github.com/apache/spark/pull/41496#discussion_r1222908269 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -440,14 +476,49 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils { }

[GitHub] [spark] Hisoka-X commented on pull request #41251: [SPARK-43521][SQL] Add `CREATE TABLE LIKE FILE` statement

2023-06-08 Thread via GitHub
Hisoka-X commented on PR #41251: URL: https://github.com/apache/spark/pull/41251#issuecomment-1582418650 > Is it a popular SQL syntax in other databases? Only in Hive. I think we should support Hive syntax so that users can easily use Spark with a Hive catalog. > And is it the

[GitHub] [spark] pan3793 commented on a diff in pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
pan3793 commented on code in PR #41448: URL: https://github.com/apache/spark/pull/41448#discussion_r1222916939 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/MergeRowsExec.scala: ## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [spark] beliefer commented on pull request #41497: [SPARK-43992][SQL][PYTHON][CONNECT] Add optional pattern for Catalog.listFunctions

2023-06-08 Thread via GitHub
beliefer commented on PR #41497: URL: https://github.com/apache/spark/pull/41497#issuecomment-1582431770 @zhengruifeng @HyukjinKwon Thank you!

[GitHub] [spark] itholic opened a new pull request, #41514: [SPARK-43684][SPARK-43685][SPARK-43686][SPARK-43691][CONNECT][PS] Fix `(NullOps|NumOps).(eq|ne)` for Spark Connect.

2023-06-08 Thread via GitHub
itholic opened a new pull request, #41514: URL: https://github.com/apache/spark/pull/41514 ### What changes were proposed in this pull request? This PR proposes to fix `NullOps.(eq|ne)` and `NumOps.(eq|ne)` for pandas API on Spark with Spark Connect. This includes SPARK-43684,

[GitHub] [spark] MaxGekk commented on pull request #41465: [SPARK-44006][CONNECT][PYTHON] Support cache artifacts

2023-06-08 Thread via GitHub
MaxGekk commented on PR #41465: URL: https://github.com/apache/spark/pull/41465#issuecomment-1582462888 Merging to master. Thank you, @HyukjinKwon for review.

[GitHub] [spark] MaxGekk closed pull request #41465: [SPARK-44006][CONNECT][PYTHON] Support cache artifacts

2023-06-08 Thread via GitHub
MaxGekk closed pull request #41465: [SPARK-44006][CONNECT][PYTHON] Support cache artifacts URL: https://github.com/apache/spark/pull/41465

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41514: [SPARK-43684][SPARK-43685][SPARK-43686][SPARK-43691][CONNECT][PS] Fix `(NullOps|NumOps).(eq|ne)` for Spark Connect.

2023-06-08 Thread via GitHub
HyukjinKwon commented on code in PR #41514: URL: https://github.com/apache/spark/pull/41514#discussion_r1222955048 ## python/pyspark/pandas/data_type_ops/null_ops.py: ## @@ -43,6 +43,22 @@ class NullOps(DataTypeOps): def pretty_name(self) -> str: return "nulls" +

[GitHub] [spark] LuciferYang commented on pull request #41427: [SPARK-43888][CONNECT][FOLLOW-UP] Spark Connect client should depend on common-utils explicitly

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41427: URL: https://github.com/apache/spark/pull/41427#issuecomment-1582484997 @amaliujia sorry for missing your message; you can refer to https://github.com/apache/spark/blob/958b85418034d3bdce56a7520c0728d666c79480/connector/connect/client/jvm/pom.xml#L165

[GitHub] [spark] beliefer opened a new pull request, #41515: [SPARK-43934][SQL][PYTHON][CONNECT] Add regexp_* functions to Scala and Python

2023-06-08 Thread via GitHub
beliefer opened a new pull request, #41515: URL: https://github.com/apache/spark/pull/41515 ### What changes were proposed in this pull request? This PR adds regexp_* functions to the Scala, Python and Connect APIs. The functions are listed below. - rlike - regexp - rege

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41478: [WIP][SPARK-43981][PYTHON][ML] Basic saving / loading implementation for ML on spark connect

2023-06-08 Thread via GitHub
WeichenXu123 commented on code in PR #41478: URL: https://github.com/apache/spark/pull/41478#discussion_r1222997971 ## python/pyspark/mlv2/io_utils.py: ## @@ -0,0 +1,159 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements

[GitHub] [spark] beliefer commented on a diff in pull request #41476: [SPARK-43914][SQL] Assign names to the error class _LEGACY_ERROR_TEMP_[2433-2437]

2023-06-08 Thread via GitHub
beliefer commented on code in PR #41476: URL: https://github.com/apache/spark/pull/41476#discussion_r1223008883 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala: ## @@ -787,8 +787,70 @@ class AnalysisErrorSuite extends AnalysisTest {

[GitHub] [spark] zhengruifeng opened a new pull request, #41516: [SPARK-43932][SQL][PYTHON][CONNECT] Add `current` like functions to Scala and Python

2023-06-08 Thread via GitHub
zhengruifeng opened a new pull request, #41516: URL: https://github.com/apache/spark/pull/41516 ### What changes were proposed in this pull request? Add following functions: - curdate - current_catalog - current_database - current_schema - current_timezone - current_u

[GitHub] [spark] TongWei1105 commented on pull request #41513: [SPARK-44007][SQL] Unresolved hint cause query failure

2023-06-08 Thread via GitHub
TongWei1105 commented on PR #41513: URL: https://github.com/apache/spark/pull/41513#issuecomment-1582613859 cc @cloud-fan @wangyum -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [spark] dongjoon-hyun commented on pull request #41448: [SPARK-43885][SQL] DataSource V2: Handle MERGE commands for delta-based sources

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41448: URL: https://github.com/apache/spark/pull/41448#issuecomment-1582718956 Could you fix this test failure by adding new error code to `README.md`? ``` [info] SparkThrowableSuite: [info] - No duplicate error classes (30 milliseconds) [info] - Err

[GitHub] [spark] LuciferYang commented on a diff in pull request #41516: [SPARK-43932][SQL][PYTHON][CONNECT] Add `current` like functions to Scala and Python

2023-06-08 Thread via GitHub
LuciferYang commented on code in PR #41516: URL: https://github.com/apache/spark/pull/41516#discussion_r1223159301 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala: ## @@ -5570,6 +5570,21 @@ class DataFrameFunctionsSuite extends QueryTest with Share

[GitHub] [spark] Hisoka-X opened a new pull request, #41517: [SPARK-42290][SQL] The OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
Hisoka-X opened a new pull request, #41517: URL: https://github.com/apache/spark/pull/41517 ### What changes were proposed in this pull request? When we use spark shell to submit job like this: ```scala $ spark-shell --conf spark.driver.memory=1g val df = spark.range(5

[GitHub] [spark] EnricoMi opened a new pull request, #41518: [SPARK-19335][SQL] Add upserts for writing to JDBC

2023-06-08 Thread via GitHub
EnricoMi opened a new pull request, #41518: URL: https://github.com/apache/spark/pull/41518 ### What changes were proposed in this pull request? This is a follow-up on #16685 and #16692. Implements upsert mode for `SaveMode.Append` of the MySql, MsSql, and Postgres JDBC source.

[GitHub] [spark] Hisoka-X commented on pull request #41517: [SPARK-42290][SQL] The OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
Hisoka-X commented on PR #41517: URL: https://github.com/apache/spark/pull/41517#issuecomment-1582763106 cc @cloud-fan @itholic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] LuciferYang opened a new pull request, #41519: [SQL][TESTS] Fix `DataFrame function and SQL functon parity` in `DataFrameFunctionsSuite`

2023-06-08 Thread via GitHub
LuciferYang opened a new pull request, #41519: URL: https://github.com/apache/spark/pull/41519 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] justaparth commented on a diff in pull request #41498: [SPARK-44001][Protobuf] spark protobuf: handle well known wrapper types

2023-06-08 Thread via GitHub
justaparth commented on code in PR #41498: URL: https://github.com/apache/spark/pull/41498#discussion_r1223192039 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala: ## @@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(

[GitHub] [spark] justaparth commented on a diff in pull request #41498: [SPARK-44001][Protobuf] spark protobuf: handle well known wrapper types

2023-06-08 Thread via GitHub
justaparth commented on code in PR #41498: URL: https://github.com/apache/spark/pull/41498#discussion_r1223202896 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala: ## @@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(

[GitHub] [spark] LuciferYang commented on pull request #41519: [SPARK-43943][SQL][TESTS][FOLLOW] Fix `DataFrame function and SQL function parity` in `DataFrameFunctionsSuite`

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41519: URL: https://github.com/apache/spark/pull/41519#issuecomment-1582842207 cc @zhengruifeng I ran `build/sbt clean "sql/testOnly org.apache.spark.sql.DataFrameFunctionsSuite" ` locally and found that the test failed as described in the PR description. I'm not sure wh

[GitHub] [spark] LuciferYang commented on a diff in pull request #41517: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
LuciferYang commented on code in PR #41517: URL: https://github.com/apache/spark/pull/41517#discussion_r1223250193 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -308,6 +308,10 @@ class QueryExecutionErrorsSuite } } + tes

[GitHub] [spark] LuciferYang commented on pull request #41517: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41517: URL: https://github.com/apache/spark/pull/41517#issuecomment-1582903605 cc @MaxGekk FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] LuciferYang commented on pull request #41435: [SPARK-43943][SQL][PYTHON][CONNECT] Add SQL math functions to Scala and Python

2023-06-08 Thread via GitHub
LuciferYang commented on PR #41435: URL: https://github.com/apache/spark/pull/41435#issuecomment-1582920854 > @HyukjinKwon it seems that `sql - other` start failing again ... I will investigate tomorrow, a little late today There is one case that may have failed after this

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41517: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
dongjoon-hyun commented on code in PR #41517: URL: https://github.com/apache/spark/pull/41517#discussion_r1223271184 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2398,7 +2398,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38173: [SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2226-2250

2023-06-08 Thread via GitHub
dongjoon-hyun commented on code in PR #38173: URL: https://github.com/apache/spark/pull/38173#discussion_r1223277475 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2274,11 +2349,13 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark-connect-go] hiboyang commented on a diff in pull request #10: [SPARK-43351] Add DataFrame writer and reader prototype code

2023-06-08 Thread via GitHub
hiboyang commented on code in PR #10: URL: https://github.com/apache/spark-connect-go/pull/10#discussion_r1223289313 ## client/sql/dataframe.go: ## @@ -31,6 +31,7 @@ type DataFrame interface { Show(numRows int, truncate bool) error Schema() (*StructType, error)

[GitHub] [spark] dongjoon-hyun commented on pull request #41517: [SPARK-42290][SQL] Fix the OOM error can't be reported when AQE on

2023-06-08 Thread via GitHub
dongjoon-hyun commented on PR #41517: URL: https://github.com/apache/spark/pull/41517#issuecomment-1583002276 Also, cc @kazuyukitanimura too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] puchengy commented on pull request #41332: [SPARK-43801][SQL] Support unwrap date type to string type in UnwrapCastInBinaryComparison

2023-06-08 Thread via GitHub
puchengy commented on PR #41332: URL: https://github.com/apache/spark/pull/41332#issuecomment-1583011527 @cloud-fan Thank you, but > if you believe your string-type "timestamp" column always contains standard timestamp strings, then you can manually do string comparison Aren't

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40652: [SPARK-43018][SQL] Fix bug for INSERT commands with timestamp literals

2023-06-08 Thread via GitHub
dongjoon-hyun commented on code in PR #40652: URL: https://github.com/apache/spark/pull/40652#discussion_r1223358336 ## sql/core/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumnsSuite.scala: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Fo
