[GitHub] [spark] zhengruifeng commented on pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-26 Thread via GitHub
zhengruifeng commented on PR #40724: URL: https://github.com/apache/spark/pull/40724#issuecomment-1524848380 `test_data_loader` should also be added to `modules.py` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [spark] LuciferYang commented on a diff in pull request #40675: [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts

2023-04-26 Thread via GitHub
LuciferYang commented on code in PR #40675: URL: https://github.com/apache/spark/pull/40675#discussion_r1178635161 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] yaooqinn commented on a diff in pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-26 Thread via GitHub
yaooqinn commented on code in PR #40922: URL: https://github.com/apache/spark/pull/40922#discussion_r1178621599 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ToPrettyString.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [spark] warrenzhu25 commented on pull request #40911: [SPARK-43237][CORE] Handle null exception message in event log

2023-04-26 Thread via GitHub
warrenzhu25 commented on PR #40911: URL: https://github.com/apache/spark/pull/40911#issuecomment-1524691247 > Can you fix the conflict @warrenzhu25 ? We can merge it after that. Done.

[GitHub] [spark] zhengruifeng commented on pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-04-26 Thread via GitHub
zhengruifeng commented on PR #40586: URL: https://github.com/apache/spark/pull/40586#issuecomment-1524690074 @rangadi It seems the doctest `pyspark.sql.connect.dataframe.DataFrame.writeStream` is not stable, would you mind taking a look? https://github.com/apache/spark/actions/runs/4

[GitHub] [spark] mridulm commented on pull request #40911: [SPARK-43237][CORE] Handle null exception message in event log

2023-04-26 Thread via GitHub
mridulm commented on PR #40911: URL: https://github.com/apache/spark/pull/40911#issuecomment-1524684791 Can you fix the conflict @warrenzhu25 ? We can merge it after that.

[GitHub] [spark] mridulm commented on pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-26 Thread via GitHub
mridulm commented on PR #40687: URL: https://github.com/apache/spark/pull/40687#issuecomment-1524683595 Merging to master. Thanks for fixing this @warrenzhu25 Thanks for the review @HyukjinKwon, @srowen :-)

[GitHub] [spark] mridulm closed pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-26 Thread via GitHub
mridulm closed pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log URL: https://github.com/apache/spark/pull/40687

[GitHub] [spark] WeichenXu123 commented on pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-26 Thread via GitHub
WeichenXu123 commented on PR #40724: URL: https://github.com/apache/spark/pull/40724#issuecomment-1524671277 > @mengxr raises another suggestion: use petastorm to load data from DBFS / HDFS / ... (so that the torch distributor can have a simpler interface). But there’s a shortcoming tha

[GitHub] [spark] wgtmac closed pull request #40971: Test Apache ORC 1.7.9-SNAPSHOT

2023-04-26 Thread via GitHub
wgtmac closed pull request #40971: Test Apache ORC 1.7.9-SNAPSHOT URL: https://github.com/apache/spark/pull/40971

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-26 Thread via GitHub
WeichenXu123 commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1178603320 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -431,7 +431,7 @@ def test_dist_training_succeeds(self) -> None: ) sel

[GitHub] [spark] cloud-fan commented on a diff in pull request #40739: [SPARK-43302][SQL] Make Python UDAF an AggregateFunction

2023-04-26 Thread via GitHub
cloud-fan commented on code in PR #40739: URL: https://github.com/apache/spark/pull/40739#discussion_r1178598343 ## sql/core/src/test/resources/sql-tests/results/udaf/udaf-group-by.sql.out: ## @@ -315,24 +315,9 @@ struct<1:int> -- !query SELECT 1 FROM range(10) HAVING udaf(id)

[GitHub] [spark] cloud-fan commented on a diff in pull request #40739: [SPARK-43302][SQL] Make Python UDAF an AggregateFunction

2023-04-26 Thread via GitHub
cloud-fan commented on code in PR #40739: URL: https://github.com/apache/spark/pull/40739#discussion_r1178598100 ## sql/core/src/test/resources/sql-tests/results/udaf/udaf-group-by-ordinal.sql.out: ## @@ -93,12 +93,19 @@ struct<> -- !query output org.apache.spark.sql.AnalysisE

[GitHub] [spark] cxzl25 opened a new pull request, #40972: [SPARK-43301][CORE][SHUFFLE] BlockStoreClient getHostLocalDirs RPC supports IOexception retry

2023-04-26 Thread via GitHub
cxzl25 opened a new pull request, #40972: URL: https://github.com/apache/spark/pull/40972 ### What changes were proposed in this pull request? Use `CompletableFuture` to implement retry logic, and retry operations are performed asynchronously. ### Why are the changes needed? `Bl
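
The retry approach described in this PR summary can be illustrated with a minimal Java sketch. It is not the code from the pull request: the class `AsyncRetrySketch`, the method `withRetry`, and its parameters (`maxRetries`, `delayMs`, `scheduler`) are hypothetical names chosen for illustration. It only shows the general pattern of retrying an asynchronous operation on `IOException` without blocking the calling thread.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical illustration; not taken from the Spark code base.
final class AsyncRetrySketch {
    private AsyncRetrySketch() {}

    // Runs op once and, if it fails with an IOException, schedules up to
    // maxRetries further attempts, each delayMs milliseconds after the failure.
    static <T> CompletableFuture<T> withRetry(
            Supplier<CompletableFuture<T>> op,
            int maxRetries,
            long delayMs,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(op, maxRetries, delayMs, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> op,
            int retriesLeft,
            long delayMs,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        CompletableFuture<T> current;
        try {
            current = op.get();
        } catch (RuntimeException e) {
            // Treat a synchronous failure like a failed future.
            current = new CompletableFuture<>();
            current.completeExceptionally(e);
        }
        current.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
                return;
            }
            // Unwrap the CompletionException that dependent stages may add.
            Throwable cause = (error instanceof CompletionException && error.getCause() != null)
                    ? error.getCause()
                    : error;
            if (cause instanceof IOException && retriesLeft > 0) {
                // Retry later on the scheduler thread instead of blocking here.
                scheduler.schedule(
                        () -> attempt(op, retriesLeft - 1, delayMs, scheduler, result),
                        delayMs,
                        TimeUnit.MILLISECONDS);
            } else {
                // Non-retryable error, or retries exhausted.
                result.completeExceptionally(cause);
            }
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger calls = new AtomicInteger();
        // A fake operation that fails twice with IOException, then succeeds.
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                f.completeExceptionally(new IOException("transient failure"));
            } else {
                f.complete("ok");
            }
            return f;
        };
        try {
            String value = withRetry(flaky, 3, 10L, scheduler).join();
            System.out.println(value + " after " + calls.get() + " attempts");
        } finally {
            scheduler.shutdown();
        }
    }
}
```

In this sketch only `IOException` triggers a retry; any other failure completes the returned future exceptionally right away, which is one possible reading of "supports IOException retry" in the PR title.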

[GitHub] [spark] cloud-fan commented on pull request #40922: [SPARK-43063][SQL][FOLLOWUP] Add ToPrettyString expression for Dataset.show

2023-04-26 Thread via GitHub
cloud-fan commented on PR #40922: URL: https://github.com/apache/spark/pull/40922#issuecomment-1524628328 cc @LuciferYang @AngersZh @yaooqinn

[GitHub] [spark] sunchao commented on pull request #40920: [SPARK-43248][SQL] Unnecessary serialize/deserialize of Path on parallel gather partition stats

2023-04-26 Thread via GitHub
sunchao commented on PR #40920: URL: https://github.com/apache/spark/pull/40920#issuecomment-1524621007 Oops almost forgot. Merged to master, thanks!

[GitHub] [spark] sunchao closed pull request #40920: [SPARK-43248][SQL] Unnecessary serialize/deserialize of Path on parallel gather partition stats

2023-04-26 Thread via GitHub
sunchao closed pull request #40920: [SPARK-43248][SQL] Unnecessary serialize/deserialize of Path on parallel gather partition stats URL: https://github.com/apache/spark/pull/40920

[GitHub] [spark] allisonwang-db commented on a diff in pull request #40966: [SPARK-43206] [SS] [CONNECT] StreamingQuery exception() include stack trace

2023-04-26 Thread via GitHub
allisonwang-db commented on code in PR #40966: URL: https://github.com/apache/spark/pull/40966#discussion_r1178591206 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2326,10 +2328,21 @@ class SparkConnectPlanner(v

[GitHub] [spark] HyukjinKwon commented on pull request #40967: [SPARK-43298][PYTHON][ML] predict_batch_udf with scalar input fails with batch size of one

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40967: URL: https://github.com/apache/spark/pull/40967#issuecomment-1524585405 Mind taking a look at https://github.com/apache/spark/pull/40967/checks?check_run_id=13051337101?

[GitHub] [spark] LuciferYang commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-26 Thread via GitHub
LuciferYang commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1178573353 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -117,7 +135,9 @@ private[sql] class JSONOptions( val timestampNTZFor

[GitHub] [spark] cloud-fan closed pull request #40961: [SPARK-43293][SQL] `__qualified_access_only` should be ignored in normal columns

2023-04-26 Thread via GitHub
cloud-fan closed pull request #40961: [SPARK-43293][SQL] `__qualified_access_only` should be ignored in normal columns URL: https://github.com/apache/spark/pull/40961

[GitHub] [spark] cloud-fan commented on pull request #40961: [SPARK-43293][SQL] `__qualified_access_only` should be ignored in normal columns

2023-04-26 Thread via GitHub
cloud-fan commented on PR #40961: URL: https://github.com/apache/spark/pull/40961#issuecomment-1524530055 thanks for the review, merging to master/3.4/3.3!

[GitHub] [spark] itholic commented on a diff in pull request #40722: [SPARK-43076][PS][CONNECT] Removing the dependency on `grpcio` when remote session is not used.

2023-04-26 Thread via GitHub
itholic commented on code in PR #40722: URL: https://github.com/apache/spark/pull/40722#discussion_r1178559820 ## python/pyspark/pandas/internal.py: ## @@ -628,7 +624,13 @@ def __init__( >>> internal.column_label_names [('column_labels_a',), ('column_labels_b',

[GitHub] [spark] itholic commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-04-26 Thread via GitHub
itholic commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1524465392 Could you resolve the mypy check? You can run the static analysis by running `dev/lint-python` locally.

[GitHub] [spark] pan3793 commented on pull request #40920: [SPARK-43248][SQL] Unnecessary serialize/deserialize of Path on parallel gather partition stats

2023-04-26 Thread via GitHub
pan3793 commented on PR #40920: URL: https://github.com/apache/spark/pull/40920#issuecomment-1524462509 kindly ping @sunchao

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40890: [SPARK-43219][SQL][DOCS] Add `INSERT INTO REPLACE WHERE` statement into website

2023-04-26 Thread via GitHub
Hisoka-X commented on code in PR #40890: URL: https://github.com/apache/spark/pull/40890#discussion_r1178554291 ## docs/sql-ref-syntax-dml-insert-table.md: ## @@ -26,8 +26,8 @@ The `INSERT` statement inserts new rows into a table or overwrites the existing ### Syntax ```sql

[GitHub] [spark] itholic commented on pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-04-26 Thread via GitHub
itholic commented on PR #40436: URL: https://github.com/apache/spark/pull/40436#issuecomment-1524459310 Looks pretty good, but let's wait until the [initial pandas 2.0 support](https://github.com/apache/spark/pull/40658) is done.

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178553564 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apac

[GitHub] [spark] itholic commented on pull request #40370: [SPARK-42620][PS] Add `inclusive` parameter for (DataFrame|Series).between_time

2023-04-26 Thread via GitHub
itholic commented on PR #40370: URL: https://github.com/apache/spark/pull/40370#issuecomment-1524450947 The change itself looks pretty good to me, once CI passes after the [initial pandas 2.0 support](https://github.com/apache/spark/pull/40658) is completed.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178548438 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the

[GitHub] [spark] hvanhovell closed pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-26 Thread via GitHub
hvanhovell closed pull request #40931: [SPARK-43265] Move Error framework to a common utils module URL: https://github.com/apache/spark/pull/40931

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178534860 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apac

[GitHub] [spark] hvanhovell commented on pull request #40931: [SPARK-43265] Move Error framework to a common utils module

2023-04-26 Thread via GitHub
hvanhovell commented on PR #40931: URL: https://github.com/apache/spark/pull/40931#issuecomment-1524376979 Merging to master

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
HeartSaVioR commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178532487 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the

[GitHub] [spark] zhengruifeng commented on pull request #40954: [PYSPARK] [CONNECT] [ML] PySpark UDF supports python package dependencies

2023-04-26 Thread via GitHub
zhengruifeng commented on PR #40954: URL: https://github.com/apache/spark/pull/40954#issuecomment-1524348310 also cc @cloud-fan @hvanhovell

[GitHub] [spark] zhengruifeng commented on pull request #40965: [SPARK-42192][FOLLOWUP][PYTHON] Refine improper error class and error type

2023-04-26 Thread via GitHub
zhengruifeng commented on PR #40965: URL: https://github.com/apache/spark/pull/40965#issuecomment-1524332211 merged into master

[GitHub] [spark] zhengruifeng closed pull request #40965: [SPARK-42192][FOLLOWUP][PYTHON] Refine improper error class and error type

2023-04-26 Thread via GitHub
zhengruifeng closed pull request #40965: [SPARK-42192][FOLLOWUP][PYTHON] Refine improper error class and error type URL: https://github.com/apache/spark/pull/40965

[GitHub] [spark] cloud-fan commented on a diff in pull request #40890: [SPARK-43219][SQL][DOCS] Add `INSERT INTO REPLACE WHERE` statement into website

2023-04-26 Thread via GitHub
cloud-fan commented on code in PR #40890: URL: https://github.com/apache/spark/pull/40890#discussion_r1178523834 ## docs/sql-ref-syntax-dml-insert-table.md: ## @@ -26,8 +26,8 @@ The `INSERT` statement inserts new rows into a table or overwrites the existing ### Syntax ```sq

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178495387 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apac

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

2023-04-26 Thread via GitHub
zhengruifeng commented on code in PR #40896: URL: https://github.com/apache/spark/pull/40896#discussion_r1178518856 ## python/pyspark/sql/udf.py: ## @@ -465,6 +476,7 @@ def wrapper(*args: "ColumnOrName") -> Column: wrapper.returnType = self.returnType # type: ignore[at

[GitHub] [spark] LuciferYang commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
LuciferYang commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1524292143 > @LuciferYang can you also get rid of the reflection in Executor.scala that used to be needed? I'll look at this today, and I think there should be no need to use reflection anym

[GitHub] [spark] amaliujia commented on a diff in pull request #40968: [SPARK-43143] [SS] [CONNECT] Scala StreamingQuery awaitTermination()

2023-04-26 Thread via GitHub
amaliujia commented on code in PR #40968: URL: https://github.com/apache/spark/pull/40968#discussion_r1178479301 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: ## @@ -159,6 +185,21 @@ class RemoteStreamingQuery( executeQu

[GitHub] [spark] sweisdb opened a new pull request, #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-04-26 Thread via GitHub
sweisdb opened a new pull request, #40970: URL: https://github.com/apache/spark/pull/40970 ### What changes were proposed in this pull request? This change adds support for optional IV and AAD fields to `aes_encrypt` and `aes_decrypt`. This allows callers to specify their own initializati
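
To illustrate what a caller-supplied IV and AAD mean in practice, here is a minimal Java sketch using the JDK's `javax.crypto` API directly. It is not Spark's `aes_encrypt`/`aes_decrypt` implementation, and it assumes AES in GCM mode (the summary above does not name the mode; AAD only applies to authenticated modes such as GCM). The class `AesGcmSketch` and its constants are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration; not taken from the Spark code base.
final class AesGcmSketch {
    static final int IV_BYTES = 12;   // commonly used GCM IV length
    static final int TAG_BITS = 128;  // authentication tag length

    private AesGcmSketch() {}

    // Encrypts with a caller-supplied IV; aad may be null if unused.
    static byte[] encrypt(byte[] key, byte[] iv, byte[] aad, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, iv, aad, plaintext);
    }

    // Decrypts; fails if the key, IV, AAD, or ciphertext do not match.
    static byte[] decrypt(byte[] key, byte[] iv, byte[] aad, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, iv, aad, ciphertext);
    }

    private static byte[] run(int mode, byte[] key, byte[] iv, byte[] aad, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_BITS, iv));
            if (aad != null) {
                // AAD is authenticated but not encrypted, and must be
                // supplied before any plaintext or ciphertext is processed.
                cipher.updateAAD(aad);
            }
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            // Covers a failed tag check (wrong AAD, IV, or key) as well as bad parameters.
            throw new IllegalStateException("AES-GCM operation failed", e);
        }
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16];
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(key);
        random.nextBytes(iv); // an IV must never be reused with the same key
        byte[] aad = "record-header".getBytes(StandardCharsets.UTF_8);
        byte[] message = "Spark".getBytes(StandardCharsets.UTF_8);

        byte[] ciphertext = encrypt(key, iv, aad, message);
        byte[] roundTrip = decrypt(key, iv, aad, ciphertext);
        System.out.println(Arrays.equals(message, roundTrip));
    }
}
```

Decryption succeeds only when the same key, IV, and AAD are presented, so a receiver needs the IV and AAD alongside the ciphertext.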

[GitHub] [spark] sweisdb commented on pull request #40903: [WIP][SPARK-NNNNN] Updating AES-CBC support to not use OpenSSL's KDF

2023-04-26 Thread via GitHub
sweisdb commented on PR #40903: URL: https://github.com/apache/spark/pull/40903#issuecomment-1524131444 Closing this and will replace with SPARK-43286 and SPARK-43290.

[GitHub] [spark] sweisdb closed pull request #40903: [WIP][SPARK-NNNNN] Updating AES-CBC support to not use OpenSSL's KDF

2023-04-26 Thread via GitHub
sweisdb closed pull request #40903: [WIP][SPARK-NNNNN] Updating AES-CBC support to not use OpenSSL's KDF URL: https://github.com/apache/spark/pull/40903

[GitHub] [spark] sweisdb opened a new pull request, #40969: [SPARK-43286][SQL] Updates aes_encrypt CBC mode to generate random IVs

2023-04-26 Thread via GitHub
sweisdb opened a new pull request, #40969: URL: https://github.com/apache/spark/pull/40969 ### What changes were proposed in this pull request? The current implementation of AES-CBC mode called via `aes_encrypt` and `aes_decrypt` uses a key derivation function (KDF) based on OpenSSL's

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #40911: [SPARK-43237][CORE] Handle null exception message in event log

2023-04-26 Thread via GitHub
warrenzhu25 commented on code in PR #40911: URL: https://github.com/apache/spark/pull/40911#discussion_r1178462815 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -809,6 +809,11 @@ class JsonProtocolSuite extends SparkFunSuite { JsonProtocol.t

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-26 Thread via GitHub
warrenzhu25 commented on code in PR #40687: URL: https://github.com/apache/spark/pull/40687#discussion_r1178460260 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -904,6 +909,12 @@ private[spark] object JsonProtocolSuite extends Assertions { ass

[GitHub] [spark] rangadi commented on a diff in pull request #40968: [SPARK-43143] [SS] [CONNECT] Scala StreamingQuery awaitTermination()

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40968: URL: https://github.com/apache/spark/pull/40968#discussion_r1178408626 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala: ## @@ -159,6 +185,21 @@ class RemoteStreamingQuery( executeQuer

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178403254 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -268,6 +268,12 @@ object SparkConnectService { priva

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178402235 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apac

[GitHub] [spark] WweiL commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
WweiL commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178400574 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1177096641 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apac

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178398941 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2092,6 +2097,11 @@ class SparkConnectPlanner(val sessi

[GitHub] [spark] pengzhon-db commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
pengzhon-db commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178398721 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -268,6 +268,12 @@ object SparkConnectService { p

[GitHub] [spark] WweiL commented on pull request #40968: [SPARK-43143] [SS] [CONNECT] Scala StreamingQuery awaitTermination()

2023-04-26 Thread via GitHub
WweiL commented on PR #40968: URL: https://github.com/apache/spark/pull/40968#issuecomment-1524045898 @rangadi @pengzhon-db @zhenlineo Can you guys take a look? Thank you!

[GitHub] [spark] rangadi commented on a diff in pull request #40937: [SPARK-42940][SS][CONNECT] Improve session management for streaming queries

2023-04-26 Thread via GitHub
rangadi commented on code in PR #40937: URL: https://github.com/apache/spark/pull/40937#discussion_r1178398552 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamingQueryCache.scala: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apac

[GitHub] [spark] WweiL opened a new pull request, #40968: [SPARK-43143] [SS] [CONNECT] Scala StreamingQuery awaitTermination()

2023-04-26 Thread via GitHub
WweiL opened a new pull request, #40968: URL: https://github.com/apache/spark/pull/40968 ### What changes were proposed in this pull request? Add the awaitTermination() API to scala client query. Please note that currently it won't throw the exception as it would do in origina

[GitHub] [spark] leewyang commented on pull request #40967: predict_batch_udf with scalar input fails with batch size of one

2023-04-26 Thread via GitHub
leewyang commented on PR #40967: URL: https://github.com/apache/spark/pull/40967#issuecomment-1524033540 cc @HyukjinKwon @mengxr @WeichenXu123

[GitHub] [spark] leewyang opened a new pull request, #40967: predict_batch_udf with scalar input fails with batch size of one

2023-04-26 Thread via GitHub
leewyang opened a new pull request, #40967: URL: https://github.com/apache/spark/pull/40967 ### What changes were proposed in this pull request? This is a followup to #39817 to handle another error condition when the input batch is a single scalar value (where the previous fix focused

[GitHub] [spark] EnricoMi commented on a diff in pull request #39673: [SPARK-42132][SQL] Deduplicate attributes in groupByKey.cogroup

2023-04-26 Thread via GitHub
EnricoMi commented on code in PR #39673: URL: https://github.com/apache/spark/pull/39673#discussion_r1178369201 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala: ## @@ -659,22 +659,46 @@ object CoGroup { right: LogicalPlan): LogicalP

[GitHub] [spark] RyanBerti commented on pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-04-26 Thread via GitHub
RyanBerti commented on PR #40615: URL: https://github.com/apache/spark/pull/40615#issuecomment-1523999640 @mkaravel I've updated the implementation based on your review comments. We're now returning the updatable binary representation, no longer support the tgtHllType parameter, and defer i

[GitHub] [spark] MaxGekk commented on a diff in pull request #40827: [SPARK-42585][CONNECT] Streaming of local relations

2023-04-26 Thread via GitHub
MaxGekk commented on code in PR #40827: URL: https://github.com/apache/spark/pull/40827#discussion_r1178332122 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -91,7 +93,18 @@ class SparkConnectArtifactMana

[GitHub] [spark] WweiL closed pull request #40950: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-26 Thread via GitHub
WweiL closed pull request #40950: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace URL: https://github.com/apache/spark/pull/40950

[GitHub] [spark] WweiL commented on pull request #40966: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-26 Thread via GitHub
WweiL commented on PR #40966: URL: https://github.com/apache/spark/pull/40966#issuecomment-1523950470 @allisonwang-db Could you also take a look? Thanks!

[GitHub] [spark] WweiL commented on pull request #40966: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-26 Thread via GitHub
WweiL commented on PR #40966: URL: https://github.com/apache/spark/pull/40966#issuecomment-1523950046 @rangadi @pengzhon-db

[GitHub] [spark] WweiL opened a new pull request, #40966: [SPARK-43206] [SS] [CONNECT] [DRAFT] [DO-NOT-REVIEW] StreamingQuery exception() include stack trace

2023-04-26 Thread via GitHub
WweiL opened a new pull request, #40966: URL: https://github.com/apache/spark/pull/40966 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

[GitHub] [spark] hvanhovell commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
hvanhovell commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1523898344 @LuciferYang can you also get rid of the reflection in Executor.scala that used to be needed?

[GitHub] [spark] hvanhovell commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
hvanhovell commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1523896577 Yeah let's put it in `org.apache.spark.executor`.

[GitHub] [spark] LuciferYang commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
LuciferYang commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1523845845 Will update the PR title and description later if all tests pass

[GitHub] [spark] LuciferYang commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
LuciferYang commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1523834900 @hvanhovell Is it better to keep `ExecutorClassLoader` in `org.apache.spark.repl` or move it to another package, like `org.apache.spark.executor`? -- This is an autom

[GitHub] [spark] HyukjinKwon commented on pull request #40370: [SPARK-42620][PS] Add `inclusive` parameter for (DataFrame|Series).between_time

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40370: URL: https://github.com/apache/spark/pull/40370#issuecomment-1523825165 cc @itholic @zhengruifeng @xinrong-meng if you find some time to review.

[GitHub] [spark] HyukjinKwon commented on pull request #40907: [SPARK-43270][PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40907: URL: https://github.com/apache/spark/pull/40907#issuecomment-1523824585 Looks fine to me.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40907: [SPARK-43270][PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-26 Thread via GitHub
HyukjinKwon commented on code in PR #40907: URL: https://github.com/apache/spark/pull/40907#discussion_r1178214495 ## python/pyspark/sql/dataframe.py: ## @@ -3008,6 +3008,34 @@ def __getattr__(self, name: str) -> Column: jc = self._jdf.apply(name) return Column

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40907: [SPARK-43270][PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-26 Thread via GitHub
HyukjinKwon commented on code in PR #40907: URL: https://github.com/apache/spark/pull/40907#discussion_r1178214107 ## python/pyspark/sql/dataframe.py: ## @@ -3008,6 +3008,34 @@ def __getattr__(self, name: str) -> Column: jc = self._jdf.apply(name) return Column

[GitHub] [spark] LuciferYang commented on pull request #40956: [SPARK-43292][BUILD] [CONNECT] Add `spark-repl` as maven test dependency of `connect-server` module

2023-04-26 Thread via GitHub
LuciferYang commented on PR #40956: URL: https://github.com/apache/spark/pull/40956#issuecomment-1523822735 SGTM, Let GA test it first

[GitHub] [spark] HyukjinKwon commented on pull request #40934: [SPARK-43268][SQL] Use proper error classes when exceptions are constructed with a message

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40934: URL: https://github.com/apache/spark/pull/40934#issuecomment-1523816341 LGTM2

[GitHub] [spark] HyukjinKwon commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1523814214 cc @itholic @zhengruifeng @xinrong-meng if you find some time to review.

[GitHub] [spark] HyukjinKwon commented on pull request #40436: [SPARK-42619][PS] Add `show_counts` parameter for DataFrame.info

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40436: URL: https://github.com/apache/spark/pull/40436#issuecomment-1523814336 cc @itholic @zhengruifeng @xinrong-meng if you find some time to review.

[GitHub] [spark] LuciferYang commented on pull request #40877: [SPARK-31733][YARN][TESTS] Make `specify a more specific type for the application` in `ClientSuite` pass in Hadoop 3

2023-04-26 Thread via GitHub
LuciferYang commented on PR #40877: URL: https://github.com/apache/spark/pull/40877#issuecomment-1523813028 Thanks @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40675: [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts

2023-04-26 Thread via GitHub
HyukjinKwon commented on code in PR #40675: URL: https://github.com/apache/spark/pull/40675#discussion_r1178203553 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/application/ReplE2ESuite.scala: ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [spark] HyukjinKwon closed pull request #40877: [SPARK-31733][YARN][TESTS] Make `specify a more specific type for the application` in `ClientSuite` pass in Hadoop 3

2023-04-26 Thread via GitHub
HyukjinKwon closed pull request #40877: [SPARK-31733][YARN][TESTS] Make `specify a more specific type for the application` in `ClientSuite` pass in Hadoop 3 URL: https://github.com/apache/spark/pull/40877

[GitHub] [spark] HyukjinKwon commented on pull request #40877: [SPARK-31733][YARN][TESTS] Make `specify a more specific type for the application` in `ClientSuite` pass in Hadoop 3

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40877: URL: https://github.com/apache/spark/pull/40877#issuecomment-1523806413 Merged to master.

[GitHub] [spark] itholic opened a new pull request, #40965: [SPARK-42192][FOLLOWUP][PYTHON] Refine improper error class and error type

2023-04-26 Thread via GitHub
itholic opened a new pull request, #40965: URL: https://github.com/apache/spark/pull/40965 ### What changes were proposed in this pull request? This is follow-up for https://github.com/apache/spark/pull/39785 to address improper error class and error type. ### Why are the chang

[GitHub] [spark] HyukjinKwon closed pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker

2023-04-26 Thread via GitHub
HyukjinKwon closed pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker URL: https://github.com/apache/spark/pull/40943

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #40933: [SPARK-43263][BUILD] Upgrade `FasterXML jackson` to 2.15.0

2023-04-26 Thread via GitHub
bjornjorgensen commented on code in PR #40933: URL: https://github.com/apache/spark/pull/40933#discussion_r1178200169 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala: ## @@ -175,7 +187,13 @@ private[sql] class JSONOptions( parameters.get

[GitHub] [spark] HyukjinKwon commented on pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40943: URL: https://github.com/apache/spark/pull/40943#issuecomment-1523805967 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #40943: [SPARK-43280][BUILD] Reimplement the protobuf breaking change checker

2023-04-26 Thread via GitHub
HyukjinKwon commented on PR #40943: URL: https://github.com/apache/spark/pull/40943#issuecomment-1523805734 Thanks @zhengruifeng
