[GitHub] [spark] cloud-fan commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-25 Thread via GitHub
cloud-fan commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1177427622 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -459,7 +459,7 @@ object RewriteCorrelatedScalarSubquery extends Rule[Log

[GitHub] [spark] cloud-fan commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-25 Thread via GitHub
cloud-fan commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1177426150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -459,7 +459,7 @@ object RewriteCorrelatedScalarSubquery extends Rule[Log
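The PR title refers to the classic "count bug" in correlated scalar subqueries. When a correlated aggregate subquery is decorrelated into an aggregate plus a LEFT OUTER JOIN, an outer row with no matching inner rows gets NULL. It should instead get the value the aggregate expression has on an empty input. The plain-Python sketch below shows this for `COUNT(*) IS NULL`, which should always be false. It is not Spark's code: the toy tables `t0`/`t1`, their columns `a`/`c`, and every function name are hypothetical, and it does not show how `RewriteCorrelatedScalarSubquery` actually computes the empty-group value.

```python
from collections import Counter

# Hypothetical toy tables: t0 is the outer table (column a), t1 the inner one (column c).
T0 = [{"a": 1}, {"a": 2}]
T1 = [{"c": 1}, {"c": 1}]


def count_is_null(count):
    """The subquery's select expression: COUNT(*) IS NULL (COUNT never returns NULL)."""
    return count is None


def reference(outer, inner):
    """Row-by-row semantics of: SELECT (SELECT COUNT(*) IS NULL FROM t1 WHERE t0.a = t1.c) FROM t0."""
    return [count_is_null(sum(1 for r in inner if r["c"] == row["a"])) for row in outer]


def naive_rewrite(outer, inner):
    """Decorrelated plan: aggregate t1 grouped by c, then LEFT OUTER JOIN t0 on a = c."""
    counts = Counter(r["c"] for r in inner)
    per_key = {key: count_is_null(n) for key, n in counts.items()}
    # Outer rows without a matching group get NULL (None) from the outer join.
    return [per_key.get(row["a"]) for row in outer]


def fixed_rewrite(outer, inner):
    """Same join, but unmatched rows get the expression's value on an empty group."""
    counts = Counter(r["c"] for r in inner)
    empty_group_value = count_is_null(0)  # COUNT(*) over zero rows is 0, so this is False
    return [
        count_is_null(counts[row["a"]]) if row["a"] in counts else empty_group_value
        for row in outer
    ]


print(reference(T0, T1))      # [False, False]
print(naive_rewrite(T0, T1))  # [False, None]  <- wrong NULL for a = 2
print(fixed_rewrite(T0, T1))  # [False, False]
```

The row `a = 2` has no match in `t1`. Only the naive rewrite turns its answer into NULL.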

[GitHub] [spark] JkSelf commented on pull request #40914: [SPARK-43240][SQL][3.3] Fix the wrong result issue when calling df.describe() method.

2023-04-25 Thread via GitHub
JkSelf commented on PR #40914: URL: https://github.com/apache/spark/pull/40914#issuecomment-1522862616 Thanks for your review. @cloud-fan Can you help to merge? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [spark] amaliujia commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
amaliujia commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1522857821 @LuciferYang thank you! @cloud-fan then probably it goes back to what I did last time. What do you think?

[GitHub] [spark] HyukjinKwon closed pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

2023-04-25 Thread via GitHub
HyukjinKwon closed pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used. URL: https://github.com/apache/spark/pull/40722

[GitHub] [spark] HyukjinKwon commented on pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

2023-04-25 Thread via GitHub
HyukjinKwon commented on PR #40722: URL: https://github.com/apache/spark/pull/40722#issuecomment-1522850607 Merged to master.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40954: [PYSPARK] [CONNECT] [ML] PySpark UDF supports python package dependencies

2023-04-25 Thread via GitHub
WeichenXu123 commented on code in PR #40954: URL: https://github.com/apache/spark/pull/40954#discussion_r1177395283 ## core/src/main/scala/org/apache/spark/api/python/PythonEnvSetup.scala: ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] databricks-david-lewis commented on a diff in pull request #40947: [Spark-43284] Switch back to url-encoded strings

2023-04-25 Thread via GitHub
databricks-david-lewis commented on code in PR #40947: URL: https://github.com/apache/spark/pull/40947#discussion_r1177391819 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -265,8 +265,8 @@ object FileFormat { * fields of the [[

[GitHub] [spark] Surbhi-Vijay commented on a diff in pull request #40171: [SPARK-42598][TEST] Refactor TPCH schema to separate file similar to TPCDS for code reuse

2023-04-25 Thread via GitHub
Surbhi-Vijay commented on code in PR #40171: URL: https://github.com/apache/spark/pull/40171#discussion_r1177372033 ## sql/core/src/test/scala/org/apache/spark/sql/TPCSchema.scala: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] zhengruifeng closed pull request #40617: [SPARK-42992][PYTHON] Introduce PySparkRuntimeError

2023-04-25 Thread via GitHub
zhengruifeng closed pull request #40617: [SPARK-42992][PYTHON] Introduce PySparkRuntimeError URL: https://github.com/apache/spark/pull/40617

[GitHub] [spark] zhengruifeng commented on pull request #40617: [SPARK-42992][PYTHON] Introduce PySparkRuntimeError

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40617: URL: https://github.com/apache/spark/pull/40617#issuecomment-1522823970 merged to master

[GitHub] [spark] zhengruifeng commented on pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError`

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40938: URL: https://github.com/apache/spark/pull/40938#issuecomment-1522820728 merged to master

[GitHub] [spark] zhengruifeng closed pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError`

2023-04-25 Thread via GitHub
zhengruifeng closed pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError` URL: https://github.com/apache/spark/pull/40938

[GitHub] [spark] LuciferYang commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1522820195 > @amaliujia @cloud-fan > > https://github.com/apache/spark/blob/b461cdea92ea08ce39bb3c9d733f0af7c56abf8d/project/SparkBuild.scala#L404-L407 > > should add `commonUtils`

[GitHub] [spark] cloud-fan commented on a diff in pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-25 Thread via GitHub
cloud-fan commented on code in PR #40922: URL: https://github.com/apache/spark/pull/40922#discussion_r1177364653 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -295,13 +289,8 @@ class Dataset[T] private[sql]( // first `truncate-3` and "..." schema

[GitHub] [spark] LuciferYang commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1522815901 @amaliujia @cloud-fan https://github.com/apache/spark/blob/b461cdea92ea08ce39bb3c9d733f0af7c56abf8d/project/SparkBuild.scala#L405 should add `commonUtils` to this Seq no

[GitHub] [spark] zhengruifeng commented on pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40943: URL: https://github.com/apache/spark/pull/40943#issuecomment-1522813797 cc @HyukjinKwon @grundprinzip

[GitHub] [spark] sadikovi commented on a diff in pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-25 Thread via GitHub
sadikovi commented on code in PR #40922: URL: https://github.com/apache/spark/pull/40922#discussion_r1177354343 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala: ## @@ -67,6 +67,8 @@ object StringUtils extends Logging { "(?s)" + out.result

[GitHub] [spark] itholic commented on a diff in pull request #40658: [WIP][SPARK-43024][PS] Upgrade pandas to 2.0.0

2023-04-25 Thread via GitHub
itholic commented on code in PR #40658: URL: https://github.com/apache/spark/pull/40658#discussion_r1177354305 ## python/pyspark/pandas/tests/test_dataframe_slow.py: ## @@ -1949,41 +1926,45 @@ def test_between_time(self): idx = pd.date_range("2018-04-09", periods=4, fre

[GitHub] [spark] LuciferYang commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1522803587 Looking

[GitHub] [spark] liang3zy22 opened a new pull request, #40955: [SPARK-42843][SQL] Update the error class _LEGACY_ERROR_TEMP_2007 to REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT

2023-04-25 Thread via GitHub
liang3zy22 opened a new pull request, #40955: URL: https://github.com/apache/spark/pull/40955 ### What changes were proposed in this pull request? Update the error class _LEGACY_ERROR_TEMP_2007 to REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT . ### Why are the changes
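The new error class name describes a requested regex group index that exceeds the number of capturing groups in the pattern. The Python sketch below illustrates that condition. It is loosely modeled on Spark SQL's `regexp_extract(str, regexp, idx)`, which is assumed here (not confirmed by this notification) to be where the error applies. The helper name, the message format, and the empty-string result on no match are all assumptions of this sketch. The sketch also assumes `idx >= 0`.

```python
import re


def regexp_extract(subject, pattern, idx):
    """Hypothetical helper: return capture group `idx` of the first match of `pattern`."""
    compiled = re.compile(pattern)
    group_count = compiled.groups  # number of capturing groups in the pattern
    if idx > group_count:
        # The condition named by REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT.
        raise ValueError(
            f"REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT: idx={idx}, groupCount={group_count}"
        )
    match = compiled.search(subject)
    if match is None:
        return ""  # assumption of this sketch: no match yields an empty string
    return match.group(idx) or ""  # a non-participating group also yields ""


print(regexp_extract("100-200", r"(\d+)-(\d+)", 2))  # 200
try:
    regexp_extract("100-200", r"(\d+)-(\d+)", 3)  # only 2 groups exist
except ValueError as err:
    print(err)
```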

[GitHub] [spark] pjfanning commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
pjfanning commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177345718 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -175,7 +187,13 @@ private[sql] class JSONOptions( parameters.get(WRIT

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

2023-04-25 Thread via GitHub
zhengruifeng commented on code in PR #40896: URL: https://github.com/apache/spark/pull/40896#discussion_r1177334616 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -333,6 +333,9 @@ message PythonUDF { bytes command = 3; // (Required) Py

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

2023-04-25 Thread via GitHub
zhengruifeng commented on code in PR #40896: URL: https://github.com/apache/spark/pull/40896#discussion_r1177333283 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -826,9 +826,6 @@ message MapPartitions { // (Required) Input user-defined f

[GitHub] [spark] cloud-fan commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
cloud-fan commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1522775572 Hi @LuciferYang , do you have any idea why mima fails for this PR? The error message says nothing. Thanks!

[GitHub] [spark] cloud-fan closed pull request #40946: [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled

2023-04-25 Thread via GitHub
cloud-fan closed pull request #40946: [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled URL: https://github.com/apache/spark/pull/40946

[GitHub] [spark] cloud-fan commented on pull request #40946: [SPARK-43156][SPARK-43098][SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled

2023-04-25 Thread via GitHub
cloud-fan commented on PR #40946: URL: https://github.com/apache/spark/pull/40946#issuecomment-1522753302 thanks, merging to master!

[GitHub] [spark] LuciferYang commented on pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40933: URL: https://github.com/apache/spark/pull/40933#issuecomment-1522751838 Please fix the compilation error first @bjornjorgensen thanks

[GitHub] [spark] ulysses-you commented on a diff in pull request #40952: [SPARK-43281][SQL] Fix concurrent writer does not update file metrics

2023-04-25 Thread via GitHub
ulysses-you commented on code in PR #40952: URL: https://github.com/apache/spark/pull/40952#discussion_r1177313799 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala: ## @@ -85,22 +85,23 @@ class BasicWriteTaskStatsTracker

[GitHub] [spark] LuciferYang commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
LuciferYang commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177313498 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -175,7 +187,13 @@ private[sql] class JSONOptions( parameters.get(WR

[GitHub] [spark] ulysses-you commented on pull request #40952: [SPARK-43281][SQL] Fix concurrent writer does not update file metrics

2023-04-25 Thread via GitHub
ulysses-you commented on PR #40952: URL: https://github.com/apache/spark/pull/40952#issuecomment-1522750138 @cloud-fan , it happened since https://github.com/apache/spark/pull/32198 and with concurrent writer on.

[GitHub] [spark] amaliujia commented on a diff in pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError`

2023-04-25 Thread via GitHub
amaliujia commented on code in PR #40938: URL: https://github.com/apache/spark/pull/40938#discussion_r1177313029 ## python/pyspark/errors/error_classes.py: ## @@ -269,6 +269,11 @@ " is not iterable." ] }, + "NOT_LIST" : { Review Comment: Sounds good then!

[GitHub] [spark] LuciferYang commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
LuciferYang commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177312976 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -175,7 +187,13 @@ private[sql] class JSONOptions( parameters.get(WR

[GitHub] [spark] LuciferYang commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
LuciferYang commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177311581 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -43,6 +43,18 @@ private[sql] class JSONOptions( import JSONOptions._

[GitHub] [spark] LuciferYang commented on pull request #40940: [SPARK-43277][YARN] Clean up deprecation hadoop api usage in `yarn` module

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40940: URL: https://github.com/apache/spark/pull/40940#issuecomment-1522745134 Thanks @srowen

[GitHub] [spark] WeichenXu123 opened a new pull request, #40954: [PYSPARK] [CONNECT] [ML] PySpark UDF supports python package dependencies

2023-04-25 Thread via GitHub
WeichenXu123 opened a new pull request, #40954: URL: https://github.com/apache/spark/pull/40954 ### What changes were proposed in this pull request? Make the pyspark UDF support annotating python dependencies and when executing UDF, the UDF worker creates a new python environment

[GitHub] [spark] zhengruifeng commented on pull request #40939: [SPARK-43276][CONNECT][PYTHON] Migrate Spark Connect Window errors into error class

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40939: URL: https://github.com/apache/spark/pull/40939#issuecomment-1522737555 merged to master

[GitHub] [spark] zhengruifeng closed pull request #40939: [SPARK-43276][CONNECT][PYTHON] Migrate Spark Connect Window errors into error class

2023-04-25 Thread via GitHub
zhengruifeng closed pull request #40939: [SPARK-43276][CONNECT][PYTHON] Migrate Spark Connect Window errors into error class URL: https://github.com/apache/spark/pull/40939

[GitHub] [spark] zhengruifeng commented on pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40943: URL: https://github.com/apache/spark/pull/40943#issuecomment-1522732825 after adding `set -ex`, the output will be like: ``` ~/spark$ ./dev/protobuf-breaking-changes-check.sh branch-3.4 + [[ 1 -gt 1 ]] +++ dirname ./dev/protobuf-breaking-ch

[GitHub] [spark] srowen commented on pull request #40940: [SPARK-43277][YARN] Clean up deprecation hadoop api usage in `yarn` module

2023-04-25 Thread via GitHub
srowen commented on PR #40940: URL: https://github.com/apache/spark/pull/40940#issuecomment-1522710911 Merged to master

[GitHub] [spark] srowen closed pull request #40940: [SPARK-43277][YARN] Clean up deprecation hadoop api usage in `yarn` module

2023-04-25 Thread via GitHub
srowen closed pull request #40940: [SPARK-43277][YARN] Clean up deprecation hadoop api usage in `yarn` module URL: https://github.com/apache/spark/pull/40940

[GitHub] [spark] cloud-fan commented on pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-25 Thread via GitHub
cloud-fan commented on PR #40922: URL: https://github.com/apache/spark/pull/40922#issuecomment-1522708607 Most of the changes in https://github.com/apache/spark/pull/40699 are updating tests, and we still need them as we don't revert the behavior change of `df.show`. The behavior change of

[GitHub] [spark] Hisoka-X opened a new pull request, #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-04-25 Thread via GitHub
Hisoka-X opened a new pull request, #40953: URL: https://github.com/apache/spark/pull/40953 ### What changes were proposed in this pull request? Spark SQL now doesn’t support creating data frame from a Postgres table that contains user-defined array column. This PR supp

[GitHub] [spark] cloud-fan commented on a diff in pull request #40947: [Spark-43284] Switch back to url-encoded strings

2023-04-25 Thread via GitHub
cloud-fan commented on code in PR #40947: URL: https://github.com/apache/spark/pull/40947#discussion_r1177285829 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -265,8 +265,8 @@ object FileFormat { * fields of the [[PartitionedFi

[GitHub] [spark] LuciferYang commented on pull request #40877: [SPARK-31733][YARN][TESTS] Make `specify a more specific type for the application` in `ClientSuite` pass in Hadoop 3

2023-04-25 Thread via GitHub
LuciferYang commented on PR #40877: URL: https://github.com/apache/spark/pull/40877#issuecomment-1522689975 friendly ping @dongjoon-hyun

[GitHub] [spark] JoshRosen commented on a diff in pull request #39011: [SPARK-41469][CORE] Avoid unnecessary task rerun on decommissioned executor lost if shuffle data migrated

2023-04-25 Thread via GitHub
JoshRosen commented on code in PR #39011: URL: https://github.com/apache/spark/pull/39011#discussion_r1177257697 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1046,17 +1048,45 @@ private[spark] class TaskSetManager( /** Called by TaskSchedul

[GitHub] [spark] JkSelf commented on a diff in pull request #40914: [SPARK-43240][SQL][3.3] Fix the wrong result issue when calling df.describe() method.

2023-04-25 Thread via GitHub
JkSelf commented on code in PR #40914: URL: https://github.com/apache/spark/pull/40914#discussion_r1177238458 ## sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala: ## @@ -288,7 +288,7 @@ object StatFunctions extends Logging { } // If the

[GitHub] [spark] ulysses-you commented on pull request #40952: [SPARK-43281][SQL] Fix concurrent writer does not update file metrics

2023-04-25 Thread via GitHub
ulysses-you commented on PR #40952: URL: https://github.com/apache/spark/pull/40952#issuecomment-1522647461 cc @cloud-fan

[GitHub] [spark] ulysses-you opened a new pull request, #40952: [SPARK-43281][SQL] Fix concurrent writer does not update file metrics

2023-04-25 Thread via GitHub
ulysses-you opened a new pull request, #40952: URL: https://github.com/apache/spark/pull/40952 ### What changes were proposed in this pull request? `DynamicPartitionDataConcurrentWriter` it uses temp file path to get file status after commit task. However, the temp file has al

[GitHub] [spark] rangadi commented on a diff in pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scal

2023-04-25 Thread via GitHub
rangadi commented on code in PR #40892: URL: https://github.com/apache/spark/pull/40892#discussion_r1177220790 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: ## @@ -163,14 +163,14 @@ class RemoteStreamingQuery( override d

[GitHub] [spark] sadikovi commented on pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-25 Thread via GitHub
sadikovi commented on PR #40922: URL: https://github.com/apache/spark/pull/40922#issuecomment-1522635737 Does this PR need https://github.com/apache/spark/pull/40699? I was under the assumption that we had to revert the original patch and have another solution instead.

[GitHub] [spark] itholic commented on a diff in pull request #40939: [SPARK-43276][CONNECT][PYTHON] Migrate Spark Connect Window errors into error class

2023-04-25 Thread via GitHub
itholic commented on code in PR #40939: URL: https://github.com/apache/spark/pull/40939#discussion_r1176535987 ## python/pyspark/errors/error_classes.py: ## @@ -224,6 +224,11 @@ "Argument `` should be a Column, int or str, got ." ] }, + "NOT_COLUMN_OR_LIST_OR_STR

[GitHub] [spark] itholic commented on a diff in pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError`

2023-04-25 Thread via GitHub
itholic commented on code in PR #40938: URL: https://github.com/apache/spark/pull/40938#discussion_r1177219630 ## python/pyspark/errors/error_classes.py: ## @@ -269,6 +269,11 @@ " is not iterable." ] }, + "NOT_LIST" : { Review Comment: Yes, we can classify er

[GitHub] [spark] zhengruifeng commented on pull request #40941: [MINOR][BUILD] Correct the error message in `dev/connect-check-protos.py`

2023-04-25 Thread via GitHub
zhengruifeng commented on PR #40941: URL: https://github.com/apache/spark/pull/40941#issuecomment-1522630835 merged to master

[GitHub] [spark] zhengruifeng closed pull request #40941: [MINOR][BUILD] Correct the error message in `dev/connect-check-protos.py`

2023-04-25 Thread via GitHub
zhengruifeng closed pull request #40941: [MINOR][BUILD] Correct the error message in `dev/connect-check-protos.py` URL: https://github.com/apache/spark/pull/40941

[GitHub] [spark-docker] Yikun opened a new pull request, #35: [WIP] Switch 3.4.0 default Java to Java17

2023-04-25 Thread via GitHub
Yikun opened a new pull request, #35: URL: https://github.com/apache/spark-docker/pull/35 ### What changes were proposed in this pull request? This PR switch v3.4.0 image to Java17 - Add support for Java17 in add-dockerfiles.sh - `./add-dockerfiles.sh 3.4.0`: generate all Dockerfiles

[GitHub] [spark] cloud-fan commented on a diff in pull request #40947: [Spark-43284] Switch back to url-encoded strings

2023-04-25 Thread via GitHub
cloud-fan commented on code in PR #40947: URL: https://github.com/apache/spark/pull/40947#discussion_r1177212533 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -265,8 +265,8 @@ object FileFormat { * fields of the [[PartitionedFi

[GitHub] [spark] github-actions[bot] closed pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics

2023-04-25 Thread via GitHub
github-actions[bot] closed pull request #38496: [SPARK-40708][SQL] Auto update table statistics based on write metrics URL: https://github.com/apache/spark/pull/38496

[GitHub] [spark] WweiL commented on a diff in pull request #40937: [SPARK-42940] Improve session management for streaming queries

2023-04-25 Thread via GitHub
WweiL commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1177137568 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache

[GitHub] [spark] sadikovi commented on pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-25 Thread via GitHub
sadikovi commented on PR #40922: URL: https://github.com/apache/spark/pull/40922#issuecomment-1522539250 I suppose it is fine to have changes in Cast. Would it be possible to check the example queries in my comment in https://github.com/apache/spark/pull/40699 and what results they return?

[GitHub] [spark] srielau commented on pull request #40884: [SPARK-43205] IDENTIFIER() clause

2023-04-25 Thread via GitHub
srielau commented on PR #40884: URL: https://github.com/apache/spark/pull/40884#issuecomment-1522534258 @cloud-fan How are docs done? Attach them to the same PR?

[GitHub] [spark] WweiL commented on a diff in pull request #40937: [SPARK-42940] Improve session management for streaming queries

2023-04-25 Thread via GitHub
WweiL commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1177145822 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2092,6 +2097,11 @@ class SparkConnectPlanner(val session

[GitHub] [spark] liuzqt commented on pull request #40629: [SPARK-42980][CORE] Implement a lightweight SmallBroadcast

2023-04-25 Thread via GitHub
liuzqt commented on PR #40629: URL: https://github.com/apache/spark/pull/40629#issuecomment-1522520810 Hi @mridulm agree that the broadcast impl under the hood should not be exposed to user if possible, let me see how we can inline the small broadcast within current broadcast code path base

[GitHub] [spark] WweiL commented on a diff in pull request #40937: [SPARK-42940] Improve session management for streaming queries

2023-04-25 Thread via GitHub
WweiL commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1177124740 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2092,6 +2097,11 @@ class SparkConnectPlanner(val session

[GitHub] [spark] amousavigourabi opened a new pull request, #40951: [SPARK-43250] Replace the error class `_LEGACY_ERROR_TEMP_2014` with an internal error

2023-04-25 Thread via GitHub
amousavigourabi opened a new pull request, #40951: URL: https://github.com/apache/spark/pull/40951 ### What changes were proposed in this pull request? In this PR I propose to replace the legacy error class `_LEGACY_ERROR_TEMP_2014` with an internal error as it is not triggered by the us

[GitHub] [spark] zhenlineo commented on a diff in pull request #40762: [SPARK-42953][Connect][Followup] Fix maven test build for Scala client UDF tests

2023-04-25 Thread via GitHub
zhenlineo commented on code in PR #40762: URL: https://github.com/apache/spark/pull/40762#discussion_r1177104126 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -102,6 +85,56 @@ object SparkConnectServerUtil

[GitHub] [spark] zhenlineo commented on pull request #40762: [SPARK-42953][Connect][Followup] Fix maven test build for Scala client UDF tests

2023-04-25 Thread via GitHub
zhenlineo commented on PR #40762: URL: https://github.com/apache/spark/pull/40762#issuecomment-1522471367 @LuciferYang @hvanhovell @vicennial This fixes maven test failures to run UDF E2E tests. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] amaliujia commented on a diff in pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
amaliujia commented on code in PR #40931: URL: https://github.com/apache/spark/pull/40931#discussion_r1176958568 ## project/MimaExcludes.scala: ## @@ -66,6 +66,12 @@ object MimaExcludes { ProblemFilters.exclude[Problem]("org.sparkproject.spark_core.protobuf.*"), Probl

[GitHub] [spark] amaliujia commented on a diff in pull request #40938: [SPARK-43274][SPARK-43275][PYTHON][CONNECT] Introduce `PySparkNotImplementedError`

2023-04-25 Thread via GitHub
amaliujia commented on code in PR #40938: URL: https://github.com/apache/spark/pull/40938#discussion_r1177062830 ## python/pyspark/errors/error_classes.py: ## @@ -269,6 +269,11 @@ " is not iterable." ] }, + "NOT_LIST" : { Review Comment: We should consolidate

[GitHub] [spark] amaliujia commented on a diff in pull request #40906: [SPARK-43134] [CONNECT] [SS] JVM client StreamingQuery exception() API

2023-04-25 Thread via GitHub
amaliujia commented on code in PR #40906: URL: https://github.com/apache/spark/pull/40906#discussion_r1177060006 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -308,8 +308,11 @@ message StreamingQueryCommandResult { } message ExceptionR

[GitHub] [spark] hvanhovell closed pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell closed pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions URL: https://github.com/apache/spark/pull/40729 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] hvanhovell commented on pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell commented on PR #40729: URL: https://github.com/apache/spark/pull/40729#issuecomment-1522405921 Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell commented on code in PR #40729: URL: https://github.com/apache/spark/pull/40729#discussion_r1177045727 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -520,54 +515,205 @@ class SparkConnectPlanner(val se

[GitHub] [spark] pjfanning commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
pjfanning commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177038834 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -175,7 +187,13 @@ private[sql] class JSONOptions( parameters.get(WRIT

[GitHub] [spark] pjfanning commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-25 Thread via GitHub
pjfanning commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1177037487 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -43,6 +43,18 @@ private[sql] class JSONOptions( import JSONOptions._

[GitHub] [spark] hvanhovell commented on a diff in pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell commented on code in PR #40729: URL: https://github.com/apache/spark/pull/40729#discussion_r1177036733 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -520,54 +515,205 @@ class SparkConnectPlanner(val se

[GitHub] [spark] BeishaoCao-db commented on pull request #40907: [SPARK-43270][PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-25 Thread via GitHub
BeishaoCao-db commented on PR #40907: URL: https://github.com/apache/spark/pull/40907#issuecomment-1522387106 Have to mention, this solution is not a perfect one: dir won't return private methods, so if a column starts with an _, it would be ignored in the suggestion -- This is a
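For context on what "implement `__dir__()` to include columns" means, here is a minimal sketch of the general Python pattern. It is not the code from PR #40907: `ToyFrame` is a hypothetical stand-in for a DataFrame, and the `isidentifier()` filter is an assumption chosen because only identifier-shaped names can be typed as `df.<name>`.

```python
from typing import List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not the PySpark class."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)

    def __getattr__(self, name: str) -> str:
        # Only called when normal attribute lookup fails:
        # expose column names as attributes.
        if name in self.columns:
            return f"Column<{name}>"
        raise AttributeError(name)

    def __dir__(self) -> List[str]:
        # Start from the default attribute list, then add column names that
        # are valid Python identifiers (only those work as df.<name>).
        attrs = set(super().__dir__())
        attrs.update(c for c in self.columns if c.isidentifier())
        return sorted(attrs)


df = ToyFrame(["id", "user_name", "has space", "_hidden"])
print("user_name" in dir(df))  # True
print("has space" in dir(df))  # False
```

In this sketch `dir(df)` itself does list `_hidden`; whether an interactive completer then suggests underscore-prefixed names is decided by the completer, not by `__dir__`.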

[GitHub] [spark] hvanhovell commented on a diff in pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell commented on code in PR #40729: URL: https://github.com/apache/spark/pull/40729#discussion_r1177021671 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/KeyValueGroupedDatasetE2ETestSuite.scala: ## @@ -0,0 +1,218 @@ +/* + * Licensed to the Apache Sof

[GitHub] [spark] hvanhovell commented on a diff in pull request #40729: [SPARK-43136][CONNECT] Adding groupByKey + mapGroup + coGroup functions

2023-04-25 Thread via GitHub
hvanhovell commented on code in PR #40729: URL: https://github.com/apache/spark/pull/40729#discussion_r1177017870 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: ## @@ -0,0 +1,416 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [spark] WweiL closed pull request #40935: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-25 Thread via GitHub
WweiL closed pull request #40935: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace URL: https://github.com/apache/spark/pull/40935 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] WweiL opened a new pull request, #40950: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-25 Thread via GitHub
WweiL opened a new pull request, #40950: URL: https://github.com/apache/spark/pull/40950 ### What changes were proposed in this pull request? Add stack trace to streamingQuery's `exception()` method. Following https://github.com/apache/spark/commit/a5c8a3c976889f33595ac18f82e7

[GitHub] [spark] WweiL commented on pull request #40906: [SPARK-43134] [CONNECT] [SS] JVM client StreamingQuery exception() API

2023-04-25 Thread via GitHub
WweiL commented on PR #40906: URL: https://github.com/apache/spark/pull/40906#issuecomment-1522329072 The `optional` in command.proto is needed or it throws: ``` [error] /home/wei.liu/oss-spark/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.sc

[GitHub] [spark] amaliujia commented on a diff in pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-25 Thread via GitHub
amaliujia commented on code in PR #40931: URL: https://github.com/apache/spark/pull/40931#discussion_r1176936721 ## common/utils/src/main/scala/org/apache/spark/JsonProtocol.scala: ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [spark] hvanhovell closed pull request #40948: [SPARK-43285] Fix ReplE2ESuite consistently failing with JDK 17

2023-04-25 Thread via GitHub
hvanhovell closed pull request #40948: [SPARK-43285] Fix ReplE2ESuite consistently failing with JDK 17 URL: https://github.com/apache/spark/pull/40948 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] hvanhovell commented on pull request #40948: [SPARK-43285] Fix ReplE2ESuite consistently failing with JDK 17

2023-04-25 Thread via GitHub
hvanhovell commented on PR #40948: URL: https://github.com/apache/spark/pull/40948#issuecomment-1522283670 Merging this to unblock the JDK 17 build. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [spark] otterc commented on a diff in pull request #40921: [SPARK-43242] fix throw 'Unexpected type of BlockId' in diagnose when…

2023-04-25 Thread via GitHub
otterc commented on code in PR #40921: URL: https://github.com/apache/spark/pull/40921#discussion_r1176931684 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -1139,7 +1139,12 @@ final class ShuffleBlockFetcherIterator( case shuffle

[GitHub] [spark] Knorreman commented on pull request #40918: [WIP][CORE] Add shuffle sort merge joins to RDD API

2023-04-25 Thread via GitHub
Knorreman commented on PR #40918: URL: https://github.com/apache/spark/pull/40918#issuecomment-1522268063 > @Knorreman I am not sure how many new things we want to add to the RDD API. The SQL API should be the primary API. @hvanhovell Every now and then things get added to RDDs.
