[GitHub] [spark] databricks-david-lewis commented on pull request #40947: [SPARK-43284] Switch back to url-encoded strings

2023-05-03 Thread via GitHub
databricks-david-lewis commented on PR #40947: URL: https://github.com/apache/spark/pull/40947#issuecomment-1534129513 Oof. This is because of the following: ``` val p1 = new Path("file:///a") val p2 = new Path("file:/a") p1 == p2 p1.toString == p2.toString p1.toUri ==
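The comparison in the preview above is cut off in this archive view, so what follows is only a rough sketch of the general point it appears to make: `file:///a` and `file:/a` differ as strings yet carry the same scheme and path. It uses Python's standard `urllib.parse.urlsplit` rather than Hadoop's `Path`, and the helper name `same_location` is made up for illustration.

```python
from urllib.parse import urlsplit


def same_location(a: str, b: str) -> bool:
    """Compare two URIs by parsed components instead of raw text.

    Hypothetical helper for illustration; it is not part of Spark or Hadoop.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc, pa.path) == (pb.scheme, pb.netloc, pb.path)


u1 = "file:///a"  # empty authority written out explicitly
u2 = "file:/a"    # authority omitted entirely

strings_equal = (u1 == u2)                # raw text comparison
components_equal = same_location(u1, u2)  # scheme/authority/path comparison
```

Under these assumptions the two spellings compare unequal as raw strings but equal by component, which is the kind of mismatch that can bite code keyed on the string form of a path.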

[GitHub] [spark] allisonwang-db commented on a diff in pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

2023-05-03 Thread via GitHub
allisonwang-db commented on code in PR #40896: URL: https://github.com/apache/spark/pull/40896#discussion_r1184564949 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -333,6 +333,9 @@ message PythonUDF { bytes command = 3; // (Required)

[GitHub] [spark] yaooqinn opened a new pull request, #41043: [SPARK-43374][INFRA] Move protobuf-java to BSD 3-clause group and update the license copy

2023-05-03 Thread via GitHub
yaooqinn opened a new pull request, #41043: URL: https://github.com/apache/spark/pull/41043 ### What changes were proposed in this pull request? protobuf-java is licensed under the BSD 3-clause, not the 2 we claimed. And the copy should be updated via

[GitHub] [spark] cloud-fan commented on a diff in pull request #41028: [SPARK-43324][SQL] Handle UPDATE commands for delta-based sources

2023-05-03 Thread via GitHub
cloud-fan commented on code in PR #41028: URL: https://github.com/apache/spark/pull/41028#discussion_r1184533565 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RowDeltaUtils.scala: ## @@ -25,4 +25,5 @@ object RowDeltaUtils { final val DELETE_OPERATION: Int

[GitHub] [spark] ueshin commented on a diff in pull request #41041: [SPARK-43363][SQL][PYTHON] Remove a workaround for pandas categorical type for pyarrow

2023-05-03 Thread via GitHub
ueshin commented on code in PR #41041: URL: https://github.com/apache/spark/pull/41041#discussion_r1184531070 ## python/pyspark/sql/pandas/serializers.py: ## @@ -226,9 +225,6 @@ def create_array(s, t): s = _check_series_convert_timestamps_internal(s,

[GitHub] [spark] ueshin commented on a diff in pull request #41041: [SPARK-43363][SQL][PYTHON] Remove a workaround for pandas categorical type for pyarrow

2023-05-03 Thread via GitHub
ueshin commented on code in PR #41041: URL: https://github.com/apache/spark/pull/41041#discussion_r1184530406 ## python/pyspark/sql/pandas/serializers.py: ## @@ -226,9 +225,6 @@ def create_array(s, t): s = _check_series_convert_timestamps_internal(s,

[GitHub] [spark] ueshin commented on a diff in pull request #41041: [SPARK-43363][SQL][PYTHON] Remove a workaround for pandas categorical type for pyarrow

2023-05-03 Thread via GitHub
ueshin commented on code in PR #41041: URL: https://github.com/apache/spark/pull/41041#discussion_r1184526296 ## python/pyspark/sql/pandas/serializers.py: ## @@ -226,9 +225,6 @@ def create_array(s, t): s = _check_series_convert_timestamps_internal(s,

[GitHub] [spark] HyukjinKwon closed pull request #40686: [SPARK-43051][CONNECT] Add option to emit default values

2023-05-03 Thread via GitHub
HyukjinKwon closed pull request #40686: [SPARK-43051][CONNECT] Add option to emit default values URL: https://github.com/apache/spark/pull/40686 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #40686: [SPARK-43051][CONNECT] Add option to emit default values

2023-05-03 Thread via GitHub
HyukjinKwon commented on PR #40686: URL: https://github.com/apache/spark/pull/40686#issuecomment-1534070113 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #30481: [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver

2023-05-03 Thread via GitHub
HyukjinKwon commented on code in PR #30481: URL: https://github.com/apache/spark/pull/30481#discussion_r1184521894 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -926,13 +926,23 @@ object SQLConf { .booleanConf

[GitHub] [spark] dongjoon-hyun commented on pull request #41011: [SPARK-43337][UI][3.4] Asc/desc arrow icons for sorting column does not get displayed in the table column

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #41011: URL: https://github.com/apache/spark/pull/41011#issuecomment-1534029673 I prefer the minimal change instead of this PR because there is no easy way for us to guarantee that new version has no regression somewhere. To be honest, actually, it's difficult

[GitHub] [spark] Stove-hust commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-05-03 Thread via GitHub
Stove-hust commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1534022451 @mridulm

[GitHub] [spark] hiboyang commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184493980 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,35 @@ +- Prepare your environment to generate proto Go files: Review Comment: Cool, thanks for the info!

[GitHub] [spark] cloud-fan commented on pull request #40871: [SPARK-43373][SQL] Revert [SPARK-39203][SQL] Rewrite table location to absolute URI based on database URI

2023-05-03 Thread via GitHub
cloud-fan commented on PR #40871: URL: https://github.com/apache/spark/pull/40871#issuecomment-1534014167 I've created a new JIRA ticket

[GitHub] [spark] cloud-fan commented on a diff in pull request #40989: [SPARK-43316][SQL] Add more CTE SQL tests

2023-05-03 Thread via GitHub
cloud-fan commented on code in PR #40989: URL: https://github.com/apache/spark/pull/40989#discussion_r1184484905 ## sql/core/src/test/resources/sql-tests/inputs/cte.sql: ## @@ -53,6 +53,289 @@ SELECT * FROM t; WITH t AS (SELECT 1 FROM non_existing_table) SELECT 2; +-- The

[GitHub] [spark] hiboyang commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184476110 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,92 @@ +## Summary Review Comment: For Go, normally people put the Go code in GitHub repo, and reference

[GitHub] [spark] srowen commented on pull request #41011: [SPARK-43337][UI][3.4] Asc/desc arrow icons for sorting column does not get displayed in the table column

2023-05-03 Thread via GitHub
srowen commented on PR #41011: URL: https://github.com/apache/spark/pull/41011#issuecomment-1533986641 @dongjoon-hyun WDYT? I'm neutral on one change vs the other. I suppose modifying the CSS is a smaller change

[GitHub] [spark] hiboyang commented on pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on PR #41036: URL: https://github.com/apache/spark/pull/41036#issuecomment-1533984760 Need to add unit test / integration test.

[GitHub] [spark] dcoliversun commented on pull request #39306: [SPARK-41781][K8S] Add the ability to create pvc before creating driver/executor pod

2023-05-03 Thread via GitHub
dcoliversun commented on PR #39306: URL: https://github.com/apache/spark/pull/39306#issuecomment-1533976855 @cometta Hi, this PR is not related to the problem you're having. Could you create an issue in JIRA, attach your spark configuration? I will follow up :)

[GitHub] [spark] pan3793 commented on pull request #40831: [SPARK-43171][K8S] Support custom Unix username in Pod

2023-05-03 Thread via GitHub
pan3793 commented on PR #40831: URL: https://github.com/apache/spark/pull/40831#issuecomment-1533967009 @Yikun @dongjoon-hyun would you please take a look again?

[GitHub] [spark] ulysses-you commented on a diff in pull request #30481: [SPARK-33526][SQL] Add config to control if cancel invoke interrupt task on thriftserver

2023-05-03 Thread via GitHub
ulysses-you commented on code in PR #30481: URL: https://github.com/apache/spark/pull/30481#discussion_r1184464259 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -926,13 +926,23 @@ object SQLConf { .booleanConf

[GitHub] [spark] beliefer commented on pull request #40563: [SPARK-41233][FOLLOWUP] Refactor `array_prepend` with `RuntimeReplaceable`

2023-05-03 Thread via GitHub
beliefer commented on PR #40563: URL: https://github.com/apache/spark/pull/40563#issuecomment-1533956516 @cloud-fan Thanks !

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40632: [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132

2023-05-03 Thread via GitHub
Hisoka-X commented on code in PR #40632: URL: https://github.com/apache/spark/pull/40632#discussion_r1184454024 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -134,54 +137,60 @@ class JacksonParser( // List([str_a_1,null])

[GitHub] [spark] amaliujia commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
amaliujia commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184443145 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,92 @@ +## Summary Review Comment: Not a Go expert so just a question: how does such code get distributed

[GitHub] [spark] amaliujia commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
amaliujia commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184443145 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,92 @@ +## Summary Review Comment: Not a Go expert so just a question: does such code distributed in Go

[GitHub] [spark] anishshri-db commented on pull request #41042: [SPARK-43364] Add docs for RocksDB state store memory management

2023-05-03 Thread via GitHub
anishshri-db commented on PR #41042: URL: https://github.com/apache/spark/pull/41042#issuecomment-1533931491 @HeartSaVioR @siying - please take a look. Thanks

[GitHub] [spark] amaliujia commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
amaliujia commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184440923 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,35 @@ +- Prepare your environment to generate proto Go files: Review Comment: We have been using this to

[GitHub] [spark] anishshri-db opened a new pull request, #41042: [SPARK-43364] Add docs for RocksDB state store memory management

2023-05-03 Thread via GitHub
anishshri-db opened a new pull request, #41042: URL: https://github.com/apache/spark/pull/41042 ### What changes were proposed in this pull request? Add docs for RocksDB state store memory management ### Why are the changes needed? Docs only change ### Does this PR

[GitHub] [spark] github-actions[bot] closed pull request #37616: [SPARK-40178][PYTHON][SQL] Fix partitioning hint parameters in PySpark

2023-05-03 Thread via GitHub
github-actions[bot] closed pull request #37616: [SPARK-40178][PYTHON][SQL] Fix partitioning hint parameters in PySpark URL: https://github.com/apache/spark/pull/37616

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184417644 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -700,6 +684,34 @@ private[sql] object RelationalGroupedDataset { new

[GitHub] [spark] ueshin opened a new pull request, #41041: [SPARK-43363][SQL][PYTHON] Remove a workaround for pandas categorical type for pyarrow

2023-05-03 Thread via GitHub
ueshin opened a new pull request, #41041: URL: https://github.com/apache/spark/pull/41041 ### What changes were proposed in this pull request? Removes a workaround for pandas categorical type for pyarrow. ### Why are the changes needed? Now that the minimum version of

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184408288 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -664,7 +665,53 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184086351 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] tianhanhu opened a new pull request, #41040: [SPARK-43362][SQL] Special handling of JSON type for MySQL connector

2023-05-03 Thread via GitHub
tianhanhu opened a new pull request, #41040: URL: https://github.com/apache/spark/pull/41040 ### What changes were proposed in this pull request? MySQL JSON type is converted into JDBC VARCHAR type with precision of -1 on some MariaDB drivers. When receiving VARCHAR with

[GitHub] [spark] WweiL commented on pull request #41039: [SPARK-43360] [SS] [CONNECT] Scala client StreamingQueryManager

2023-05-03 Thread via GitHub
WweiL commented on PR #41039: URL: https://github.com/apache/spark/pull/41039#issuecomment-1533860277 @rangadi @pengzhon-db

[GitHub] [spark] WweiL commented on pull request #41039: [SPARK-43360] [SS] [CONNECT] Scala client StreamingQueryManager

2023-05-03 Thread via GitHub
WweiL commented on PR #41039: URL: https://github.com/apache/spark/pull/41039#issuecomment-1533860158 Some of the bug fixes are also in https://github.com/apache/spark/pull/41037

[GitHub] [spark] WweiL opened a new pull request, #41039: [SPARK-43360] [SS] [CONNECT] Scala client StreamingQueryManager

2023-05-03 Thread via GitHub
WweiL opened a new pull request, #41039: URL: https://github.com/apache/spark/pull/41039 ### What changes were proposed in this pull request? Add support for scala client `StreamingQueryManager` ### Why are the changes needed? Development of scala connect client

[GitHub] [spark] vitaliili-db opened a new pull request, #41038: [WIP] Similarity search fix

2023-05-03 Thread via GitHub
vitaliili-db opened a new pull request, #41038: URL: https://github.com/apache/spark/pull/41038 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] RunyaoChen commented on a diff in pull request #40989: [SPARK-43316][SQL] Add more CTE SQL tests

2023-05-03 Thread via GitHub
RunyaoChen commented on code in PR #40989: URL: https://github.com/apache/spark/pull/40989#discussion_r1184361656 ## sql/core/src/test/resources/sql-tests/inputs/cte.sql: ## @@ -53,6 +53,289 @@ SELECT * FROM t; WITH t AS (SELECT 1 FROM non_existing_table) SELECT 2; +-- The

[GitHub] [spark] rangadi commented on a diff in pull request #40983: [SPARK-43312][PROTOBUF] Option to convert Any fields into JSON

2023-05-03 Thread via GitHub
rangadi commented on code in PR #40983: URL: https://github.com/apache/spark/pull/40983#discussion_r1184276121 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala: ## @@ -70,6 +69,15 @@ private[sql] class ProtobufDeserializer( def

[GitHub] [spark] WweiL opened a new pull request, #41037: [SPARK-43032] [SS] [CONNECT] Python SQM bug fix

2023-05-03 Thread via GitHub
WweiL opened a new pull request, #41037: URL: https://github.com/apache/spark/pull/41037 ### What changes were proposed in this pull request? One line fix of python SQM ### Why are the changes needed? Bug fix ### Does this PR introduce _any_ user-facing

[GitHub] [spark] hiboyang commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1184242830 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,35 @@ +- Prepare your environment to generate proto Go files: Review Comment: Do you mean the

[GitHub] [spark] hiboyang commented on pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on PR #41036: URL: https://github.com/apache/spark/pull/41036#issuecomment-1533705017 > This is awesome! Thanks for starting the work. I think the next step would be to have a quick discussion in a readme or the PR on the rough design of the objects and methods so that

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184008423 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -534,6 +552,8 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184224749 ## python/pyspark/sql/connect/streaming/readwriter.py: ## @@ -339,7 +342,9 @@ def table(self, tableName: str) -> "DataFrame": class DataStreamWriter: -def

[GitHub] [spark] RyanBerti closed pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-05-03 Thread via GitHub
RyanBerti closed pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator URL: https://github.com/apache/spark/pull/39678

[GitHub] [spark] RyanBerti commented on pull request #39678: [SPARK-16484][SQL] Add HyperLogLogPlusPlus sketch generator/evaluator/aggregator

2023-05-03 Thread via GitHub
RyanBerti commented on PR #39678: URL: https://github.com/apache/spark/pull/39678#issuecomment-1533691734 Closing in favor of https://github.com/apache/spark/pull/40615

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184225072 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -209,6 +209,15 @@ message WriteStreamOperationStart { string path = 11;

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184224749 ## python/pyspark/sql/connect/streaming/readwriter.py: ## @@ -339,7 +342,9 @@ def table(self, tableName: str) -> "DataFrame": class DataStreamWriter: -def

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184223920 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -209,6 +209,15 @@ message WriteStreamOperationStart { string path = 11;

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184008423 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -534,6 +552,8 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) {

[GitHub] [spark] rangadi commented on pull request #40686: [SPARK-43051][CONNECT] Add option to emit default values

2023-05-03 Thread via GitHub
rangadi commented on PR #40686: URL: https://github.com/apache/spark/pull/40686#issuecomment-1533676324 @HeartSaVioR could you merge this? You don't need to review.

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184086351 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] rangadi commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
rangadi commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184091839 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -209,6 +209,15 @@ message WriteStreamOperationStart { string path = 11;

[GitHub] [spark] xinrong-meng closed pull request #41027: [WIP] Nested ArrayType, MapType support in Arrow-optimized UDFs

2023-05-03 Thread via GitHub
xinrong-meng closed pull request #41027: [WIP] Nested ArrayType, MapType support in Arrow-optimized UDFs URL: https://github.com/apache/spark/pull/41027

[GitHub] [spark] dongjoon-hyun commented on pull request #40994: [SPARK-43319][K8S][TEST] Remove usage of deprecated DefaultKubernetesClient

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #40994: URL: https://github.com/apache/spark/pull/40994#issuecomment-1533499271 Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #40994: [SPARK-43319][K8S][TEST] Remove usage of deprecated DefaultKubernetesClient

2023-05-03 Thread via GitHub
dongjoon-hyun closed pull request #40994: [SPARK-43319][K8S][TEST] Remove usage of deprecated DefaultKubernetesClient URL: https://github.com/apache/spark/pull/40994

[GitHub] [spark] dongjoon-hyun closed pull request #41023: [SPARK-43347][PYTHON] Remove Python 3.7 Support

2023-05-03 Thread via GitHub
dongjoon-hyun closed pull request #41023: [SPARK-43347][PYTHON] Remove Python 3.7 Support URL: https://github.com/apache/spark/pull/41023

[GitHub] [spark] dongjoon-hyun commented on pull request #41023: [SPARK-43347][PYTHON] Remove Python 3.7 Support

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #41023: URL: https://github.com/apache/spark/pull/41023#issuecomment-1533496469 All tests passed and there is no change in code during rebasing. Merged to master for Apache Spark 3.5.0. Thank you again, @HyukjinKwon and @ueshin .

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184086351 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1184078816 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -700,6 +684,34 @@ private[sql] object RelationalGroupedDataset { new

[GitHub] [spark] dtenedor commented on pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-03 Thread via GitHub
dtenedor commented on PR #40996: URL: https://github.com/apache/spark/pull/40996#issuecomment-1533445192 > @dtenedor FYI there are test failures in the latest code. @gengliangwang thanks, fixed

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1184008814 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala: ## @@ -352,10 +355,15 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T])

[GitHub] [spark] WweiL commented on a diff in pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on code in PR #41026: URL: https://github.com/apache/spark/pull/41026#discussion_r1183993090 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -226,16 +226,9 @@ object

[GitHub] [spark] WweiL commented on pull request #41026: [SPARK-43132] [SS] [CONNECT] Add DataStreamWriter foreach() API

2023-05-03 Thread via GitHub
WweiL commented on PR #41026: URL: https://github.com/apache/spark/pull/41026#issuecomment-1533441031 @HyukjinKwon @HeartSaVioR @xinrong-meng @rangadi @pengzhon-db @amaliujia Can you guys take a look? Thanks!

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-03 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1183995046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -589,8 +631,10 @@ case class ResolveDefaultColumns(

[GitHub] [spark] MaxGekk commented on pull request #40951: [SPARK-43250][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2014` with an internal error

2023-05-03 Thread via GitHub
MaxGekk commented on PR #40951: URL: https://github.com/apache/spark/pull/40951#issuecomment-1533415423 @amousavigourabi Congratulations with your first contribution to Apache Spark!

[GitHub] [spark] MaxGekk closed pull request #40951: [SPARK-43250][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2014` with an internal error

2023-05-03 Thread via GitHub
MaxGekk closed pull request #40951: [SPARK-43250][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2014` with an internal error URL: https://github.com/apache/spark/pull/40951

[GitHub] [spark] MaxGekk commented on pull request #40951: [SPARK-43250][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2014` with an internal error

2023-05-03 Thread via GitHub
MaxGekk commented on PR #40951: URL: https://github.com/apache/spark/pull/40951#issuecomment-1533408418 +1, LGTM. Merging to master. Thank you, @amousavigourabi.

[GitHub] [spark] grundprinzip commented on a diff in pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
grundprinzip commented on code in PR #41036: URL: https://github.com/apache/spark/pull/41036#discussion_r1183970744 ## connector/connect/client/go/README.md: ## @@ -0,0 +1,35 @@ +- Prepare your environment to generate proto Go files: Review Comment: The current buf build

[GitHub] [spark] zhenlineo commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1183975202 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -53,7 +53,25 @@ private[sql] class SparkResult[T](

[GitHub] [spark] hiboyang commented on pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang commented on PR #41036: URL: https://github.com/apache/spark/pull/41036#issuecomment-1533402592 > Thanks. Please file a JIRA and update the PR title, @hiboyang . > > cc @HyukjinKwon , @cloud-fan , @hvanhovell , @LuciferYang , @grundprinzip Thanks for the suggestion!

[GitHub] [spark] hvanhovell commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1183955286 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -53,7 +53,25 @@ private[sql] class SparkResult[T](

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183946902 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -664,7 +665,53 @@ class SparkConnectPlanner(val

[GitHub] [spark] imback82 commented on a diff in pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-03 Thread via GitHub
imback82 commented on code in PR #41020: URL: https://github.com/apache/spark/pull/41020#discussion_r1183939879 ## core/src/main/resources/error/error-classes.json: ## @@ -1278,6 +1288,11 @@ ], "sqlState" : "42826" }, + "OPERATION_NOT_ALLOWED" : { +"message" :

[GitHub] [spark] dongjoon-hyun commented on pull request #41028: [SPARK-43324][SQL] Handle UPDATE commands for delta-based sources

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #41028: URL: https://github.com/apache/spark/pull/41028#issuecomment-159930 Thank you for pinging me, @aokolnychyi .

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183916217 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -664,7 +665,53 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhenlineo commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
zhenlineo commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183914885 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183913165 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -700,6 +684,34 @@ private[sql] object RelationalGroupedDataset { new

[GitHub] [spark] dongjoon-hyun commented on pull request #40690: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #40690: URL: https://github.com/apache/spark/pull/40690#issuecomment-1533324077 Since this is `Performance` PR, could you contribute a micro-benchmark which is similar to your case? > This happens on a benchmark job generating a large number of very tiny

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183906392 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -700,6 +684,34 @@ private[sql] object RelationalGroupedDataset { new

[GitHub] [spark] MaxGekk commented on a diff in pull request #40632: [SPARK-42298][SQL] Assign name to _LEGACY_ERROR_TEMP_2132

2023-05-03 Thread via GitHub
MaxGekk commented on code in PR #40632: URL: https://github.com/apache/spark/pull/40632#discussion_r1183905218 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -134,54 +137,60 @@ class JacksonParser( // List([str_a_1,null])

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183902935 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183901421 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183900469 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1867,7 +1928,34 @@ class SparkConnectPlanner(val

[GitHub] [spark] dongjoon-hyun closed pull request #41030: [SPARK-43352][K8S][TEST] Inline `DepsTestsSuite#setPythonSparkConfProperties`

2023-05-03 Thread via GitHub
dongjoon-hyun closed pull request #41030: [SPARK-43352][K8S][TEST] Inline `DepsTestsSuite#setPythonSparkConfProperties` URL: https://github.com/apache/spark/pull/41030

[GitHub] [spark] hiboyang opened a new pull request, #41036: [Connect] Add Spark Connect Go prototype code and example

2023-05-03 Thread via GitHub
hiboyang opened a new pull request, #41036: URL: https://github.com/apache/spark/pull/41036 ### What changes were proposed in this pull request? This pull request is to add a small Spark Connect Go client example and prototype. ### Why are the changes needed? Spark

[GitHub] [spark] xinrong-meng commented on pull request #41019: [SPARK-43344][BUILD] Upgrade `mlflow` to 2.3.1

2023-05-03 Thread via GitHub
xinrong-meng commented on PR #41019: URL: https://github.com/apache/spark/pull/41019#issuecomment-1533310260 Thank you @bjornjorgensen @dongjoon-hyun !

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40684: [SPARK-41532][CONNECT][CLIENT] Add check for operations that involve multiple data frames

2023-05-03 Thread via GitHub
xinrong-meng commented on code in PR #40684: URL: https://github.com/apache/spark/pull/40684#discussion_r1183889165 ## python/pyspark/sql/connect/dataframe.py: ## @@ -249,14 +254,18 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame": raise Exception("Cannot

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40684: [SPARK-41532][CONNECT][CLIENT] Add check for operations that involve multiple data frames

2023-05-03 Thread via GitHub
xinrong-meng commented on code in PR #40684: URL: https://github.com/apache/spark/pull/40684#discussion_r1183889165 ## python/pyspark/sql/connect/dataframe.py: ## @@ -249,14 +254,18 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame": raise Exception("Cannot

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40684: [SPARK-41532][CONNECT][CLIENT] Add check for operations that involve multiple data frames

2023-05-03 Thread via GitHub
xinrong-meng commented on code in PR #40684: URL: https://github.com/apache/spark/pull/40684#discussion_r1183889165 ## python/pyspark/sql/connect/dataframe.py: ## @@ -249,14 +254,18 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame": raise Exception("Cannot

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-03 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1183888685 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala: ## @@ -703,3 +703,20 @@ case class CoGroup( override protected def

[GitHub] [spark] dongjoon-hyun commented on pull request #40994: [SPARK-43319][K8S][TEST] Remove usage of deprecated DefaultKubernetesClient

2023-05-03 Thread via GitHub
dongjoon-hyun commented on PR #40994: URL: https://github.com/apache/spark/pull/40994#issuecomment-1533285058 Thank you!

[GitHub] [spark] pan3793 commented on pull request #40994: [SPARK-43319][K8S][TEST] Remove usage of deprecated DefaultKubernetesClient

2023-05-03 Thread via GitHub
pan3793 commented on PR #40994: URL: https://github.com/apache/spark/pull/40994#issuecomment-1533284437 > Since #41034 is merged, could you rebase this PR to the master branch, @pan3793 ? Sure, rebased

[GitHub] [spark] hvanhovell closed pull request #41005: [SPARK-43331][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-03 Thread via GitHub
hvanhovell closed pull request #41005: [SPARK-43331][CONNECT] Add Spark Connect SparkSession.interruptAll URL: https://github.com/apache/spark/pull/41005

[GitHub] [spark] hvanhovell commented on pull request #41005: [SPARK-43331][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-03 Thread via GitHub
hvanhovell commented on PR #41005: URL: https://github.com/apache/spark/pull/41005#issuecomment-1533279355 Merging to master thanks! Please address remaining comments in a couple of follow-ups.

[GitHub] [spark] dongjoon-hyun closed pull request #41029: [SPARK-43350][BUILD] Upgrade `scalafmt` to 3.7.3

2023-05-03 Thread via GitHub
dongjoon-hyun closed pull request #41029: [SPARK-43350][BUILD] Upgrade `scalafmt` to 3.7.3 URL: https://github.com/apache/spark/pull/41029
