[GitHub] [spark] MaxGekk commented on a diff in pull request #41010: [SPARK-43336][SQL] Casting between Timestamp and TimestampNTZ requires timezone

2023-05-01 Thread via GitHub
MaxGekk commented on code in PR #41010: URL: https://github.com/apache/spark/pull/41010#discussion_r1182114437 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala: ## @@ -99,6 +99,18 @@ class CanonicalizeSuite extends SparkFunSuite {

[GitHub] [spark] HeartSaVioR closed pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener

2023-05-01 Thread via GitHub
HeartSaVioR closed pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener URL: https://github.com/apache/spark/pull/41001 -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] HeartSaVioR commented on pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener

2023-05-01 Thread via GitHub
HeartSaVioR commented on PR #41001: URL: https://github.com/apache/spark/pull/41001#issuecomment-1530918000 Thanks, merging to master!

[GitHub] [spark] RunyaoChen commented on a diff in pull request #40989: [SPARK-43316][SQL] Add more CTE SQL tests

2023-05-01 Thread via GitHub
RunyaoChen commented on code in PR #40989: URL: https://github.com/apache/spark/pull/40989#discussion_r1182106933 ## sql/core/src/test/resources/sql-tests/inputs/cte.sql: ## @@ -53,6 +53,347 @@ SELECT * FROM t; WITH t AS (SELECT 1 FROM non_existing_table) SELECT 2; +-- The

[GitHub] [spark] MaxGekk commented on a diff in pull request #40955: [SPARK-42843][SQL] Update the error class _LEGACY_ERROR_TEMP_2007 to REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT

2023-05-01 Thread via GitHub
MaxGekk commented on code in PR #40955: URL: https://github.com/apache/spark/pull/40955#discussion_r1182106079 ## core/src/main/resources/error/error-classes.json: ## @@ -1009,6 +1009,11 @@ "." ] }, + "REGEX_GROUP_INDEX_EXCEED_REGEX_GROUP_COUNT"

[GitHub] [spark] maytasm commented on pull request #36226: [SPARK-38924][UI] Update datatables to 1.10.25

2023-05-01 Thread via GitHub
maytasm commented on PR #36226: URL: https://github.com/apache/spark/pull/36226#issuecomment-1530899374 I think we can backport [SPARK-42435](https://issues.apache.org/jira/browse/SPARK-42435) to 3.3 and 3.4?

[GitHub] [spark] maytasm commented on pull request #36226: [SPARK-38924][UI] Update datatables to 1.10.25

2023-05-01 Thread via GitHub
maytasm commented on PR #36226: URL: https://github.com/apache/spark/pull/36226#issuecomment-1530899078 @srowen I verified that upgrading DataTables to 1.13.2 fixes the issue with missing asc/desc arrows. See image:

[GitHub] [spark] gengliangwang opened a new pull request, #41010: [SPARK-43336][SQL] Casting between Timestamp and TimestampNTZ requires timezone

2023-05-01 Thread via GitHub
gengliangwang opened a new pull request, #41010: URL: https://github.com/apache/spark/pull/41010 ### What changes were proposed in this pull request? Casting between Timestamp and TimestampNTZ requires a timezone since the timezone id is used in the evaluation. This PR
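The reasoning in this PR description (a cast between the two timestamp types depends on a time zone) can be illustrated with a minimal sketch. This is not Spark code: it uses only Python's standard `datetime` module, and the helper names `to_ntz` and `from_ntz` are hypothetical stand-ins for the two cast directions, assuming a zone-aware value plays the role of Timestamp and a naive wall-clock value plays the role of TimestampNTZ.

```python
from datetime import datetime, timedelta, timezone


def to_ntz(instant: datetime, zone: timezone) -> datetime:
    """Hypothetical analogue of Timestamp -> TimestampNTZ:
    shift the instant into `zone`, then drop the zone information."""
    return instant.astimezone(zone).replace(tzinfo=None)


def from_ntz(wall_clock: datetime, zone: timezone) -> datetime:
    """Hypothetical analogue of TimestampNTZ -> Timestamp:
    interpret the zone-less wall-clock value as local time in `zone`."""
    return wall_clock.replace(tzinfo=zone).astimezone(timezone.utc)


# One fixed instant, viewed from two fixed-offset zones (offsets chosen for illustration).
instant = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
utc = timezone.utc
minus_eight = timezone(timedelta(hours=-8))

ntz_utc = to_ntz(instant, utc)            # wall clock reads 12:00
ntz_minus_eight = to_ntz(instant, minus_eight)  # wall clock reads 04:00

# The same instant yields different zone-less values, so the zone is a real input.
print(ntz_utc, ntz_minus_eight)

# The reverse direction only round-trips when the same zone is used.
print(from_ntz(ntz_minus_eight, minus_eight) == instant)
print(from_ntz(ntz_minus_eight, utc) == instant)
```

Because the result changes with the zone in both directions, an expression that performs such a cast cannot be evaluated without knowing which zone applies, which is the dependency the PR title refers to.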

[GitHub] [spark] grundprinzip commented on pull request #40993: Make it possible to extend `ChannelBuilder` for `SparkConnectClient`

2023-05-01 Thread via GitHub
grundprinzip commented on PR #40993: URL: https://github.com/apache/spark/pull/40993#issuecomment-1530831141 You need it if you want to pass specialized authentication handlers for GRPC in the form of credentials plugins. In addition, it allows you to configure the GRPC channel with additional

[GitHub] [spark] RyanBerti commented on pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on PR #40615: URL: https://github.com/apache/spark/pull/40615#issuecomment-1530822680 @mkaravel @dtenedor Finally got all the tests passing, thanks for all your help! Think I covered all of the most recent review comments, let me know if you need anything else from me

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1182043858 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -238,15 +239,21 @@ message StreamingQueryInstanceId { string run_id = 2; }

[GitHub] [spark] srowen commented on pull request #36226: [SPARK-38924][UI] Update datatables to 1.10.25

2023-05-01 Thread via GitHub
srowen commented on PR #36226: URL: https://github.com/apache/spark/pull/36226#issuecomment-1530791323 Ooh, OK. Do you know of a clean fix for the older branches to restore the arrows? If 1.13.2 works again, then we could also consider back-porting that update too

[GitHub] [spark] maytasm commented on pull request #36226: [SPARK-38924][UI] Update datatables to 1.10.25

2023-05-01 Thread via GitHub
maytasm commented on PR #36226: URL: https://github.com/apache/spark/pull/36226#issuecomment-1530789474 @gengliangwang @srowen looks like this change broke the sorting arrows on the table. The sort_asc.png and sort_desc.png images are not displayed in the table. I believe this is because the

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40896: [SPARK-43229][ML][PYTHON][CONNECT] Introduce Barrier Python UDF

2023-05-01 Thread via GitHub
zhengruifeng commented on code in PR #40896: URL: https://github.com/apache/spark/pull/40896#discussion_r1182025735 ## python/pyspark/sql/udf.py: ## @@ -249,6 +259,38 @@ def __init__( self.evalType = evalType self.deterministic = deterministic +#

[GitHub] [spark] HyukjinKwon commented on pull request #40907: [SPARK-43270][PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-05-01 Thread via GitHub
HyukjinKwon commented on PR #40907: URL: https://github.com/apache/spark/pull/40907#issuecomment-1530768477 Made a followup to implement this in Spark Connect: https://github.com/apache/spark/pull/41009

[GitHub] [spark] zhengruifeng commented on pull request #40976: [SPARK-43307][PYTHON] Migrate PandasUDF value errors into error class

2023-05-01 Thread via GitHub
zhengruifeng commented on PR #40976: URL: https://github.com/apache/spark/pull/40976#issuecomment-1530768350 merged to master

[GitHub] [spark] HyukjinKwon opened a new pull request, #41009: [SPARK-43270][PYTHON][CONNECT][FOLLOW-UP] Implement `__dir__` in PySpark Connect DataFrame

2023-05-01 Thread via GitHub
HyukjinKwon opened a new pull request, #41009: URL: https://github.com/apache/spark/pull/41009 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/40907 that implements `__dir__` in PySpark Connect DataFrame. ###

[GitHub] [spark] zhengruifeng closed pull request #40976: [SPARK-43307][PYTHON] Migrate PandasUDF value errors into error class

2023-05-01 Thread via GitHub
zhengruifeng closed pull request #40976: [SPARK-43307][PYTHON] Migrate PandasUDF value errors into error class URL: https://github.com/apache/spark/pull/40976

[GitHub] [spark] zhengruifeng commented on pull request #40985: [SPARK-43314][CONNECT][PYTHON] Migrate Spark Connect client errors into error class

2023-05-01 Thread via GitHub
zhengruifeng commented on PR #40985: URL: https://github.com/apache/spark/pull/40985#issuecomment-1530767027 merged to master

[GitHub] [spark] zhengruifeng closed pull request #40985: [SPARK-43314][CONNECT][PYTHON] Migrate Spark Connect client errors into error class

2023-05-01 Thread via GitHub
zhengruifeng closed pull request #40985: [SPARK-43314][CONNECT][PYTHON] Migrate Spark Connect client errors into error class URL: https://github.com/apache/spark/pull/40985

[GitHub] [spark] zhengruifeng commented on pull request #40973: [SPARK-43304][CONNECT][PYTHON] Migrate `NotImplementedError` into `PySparkNotImplementedError`

2023-05-01 Thread via GitHub
zhengruifeng commented on PR #40973: URL: https://github.com/apache/spark/pull/40973#issuecomment-1530765993 merged to master

[GitHub] [spark] zhengruifeng closed pull request #40973: [SPARK-43304][CONNECT][PYTHON] Migrate `NotImplementedError` into `PySparkNotImplementedError`

2023-05-01 Thread via GitHub
zhengruifeng closed pull request #40973: [SPARK-43304][CONNECT][PYTHON] Migrate `NotImplementedError` into `PySparkNotImplementedError` URL: https://github.com/apache/spark/pull/40973

[GitHub] [spark] HyukjinKwon commented on pull request #41008: [SPARK-432265][CORE][FOLLOW-UP] Add Mima excludes of ErrorInfo and ErrorSubInfo for Scala 2.13

2023-05-01 Thread via GitHub
HyukjinKwon commented on PR #41008: URL: https://github.com/apache/spark/pull/41008#issuecomment-1530758059 cc @amaliujia @cloud-fan @hvanhovell FYI

[GitHub] [spark] HyukjinKwon opened a new pull request, #41008: [SPARK-432265][CORE][FOLLOW-UP] Add Mima excludes of ErrorInfo and ErrorSubInfo for Scala 2.13

2023-05-01 Thread via GitHub
HyukjinKwon opened a new pull request, #41008: URL: https://github.com/apache/spark/pull/41008 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/40931 that excludes `org.apache.spark.ErrorInfo$` and

[GitHub] [spark] cloud-fan commented on a diff in pull request #40989: [SPARK-43316][SQL] Add more CTE SQL tests

2023-05-01 Thread via GitHub
cloud-fan commented on code in PR #40989: URL: https://github.com/apache/spark/pull/40989#discussion_r1182000729 ## sql/core/src/test/resources/sql-tests/inputs/cte.sql: ## @@ -53,6 +53,347 @@ SELECT * FROM t; WITH t AS (SELECT 1 FROM non_existing_table) SELECT 2; +-- The

[GitHub] [spark] cloud-fan commented on pull request #40947: [Spark-43284] Switch back to url-encoded strings

2023-05-01 Thread via GitHub
cloud-fan commented on PR #40947: URL: https://github.com/apache/spark/pull/40947#issuecomment-1530732071 `FileMetadataStructSuite.metadata struct (json): read partial/all metadata struct fields` fails, @databricks-david-lewis

[GitHub] [spark] HyukjinKwon closed pull request #40827: [SPARK-42585][CONNECT] Streaming of local relations

2023-05-01 Thread via GitHub
HyukjinKwon closed pull request #40827: [SPARK-42585][CONNECT] Streaming of local relations URL: https://github.com/apache/spark/pull/40827

[GitHub] [spark] HyukjinKwon commented on pull request #40827: [SPARK-42585][CONNECT] Streaming of local relations

2023-05-01 Thread via GitHub
HyukjinKwon commented on PR #40827: URL: https://github.com/apache/spark/pull/40827#issuecomment-1530726252 Merged to master.

[GitHub] [spark] zhenlineo commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-01 Thread via GitHub
zhenlineo commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1181995277 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -837,6 +837,76 @@ class Dataset[T] private[sql] ( } } + /** + *

[GitHub] [spark] zhenlineo commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-01 Thread via GitHub
zhenlineo commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1181993197 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -837,6 +837,76 @@ class Dataset[T] private[sql] ( } } + /** + *

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener

2023-05-01 Thread via GitHub
HeartSaVioR commented on code in PR #41001: URL: https://github.com/apache/spark/pull/41001#discussion_r1181993402 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala: ## @@ -140,7 +140,8 @@ object StreamingQueryListener { @Evolving

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181991609 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -343,4 +363,11 @@ object ResolveDefaultColumns {

[GitHub] [spark] viirya commented on a diff in pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener

2023-05-01 Thread via GitHub
viirya commented on code in PR #41001: URL: https://github.com/apache/spark/pull/41001#discussion_r1181989326 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala: ## @@ -140,7 +140,8 @@ object StreamingQueryListener { @Evolving class

[GitHub] [spark] viirya commented on a diff in pull request #41001: [SPARK-43328][SS] Add latest timestamp on no-execution trigger for Idle event in streaming query listener

2023-05-01 Thread via GitHub
viirya commented on code in PR #41001: URL: https://github.com/apache/spark/pull/41001#discussion_r1181989173 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryListener.scala: ## @@ -140,7 +140,8 @@ object StreamingQueryListener { @Evolving class

[GitHub] [spark] HyukjinKwon closed pull request #41004: [SPARK-43330][DOCS] FIX typo (StructsToJosn -> StructsToJson)

2023-05-01 Thread via GitHub
HyukjinKwon closed pull request #41004: [SPARK-43330][DOCS] FIX typo (StructsToJosn -> StructsToJson) URL: https://github.com/apache/spark/pull/41004

[GitHub] [spark] HyukjinKwon commented on pull request #41004: [SPARK-43330][DOCS] FIX typo (StructsToJosn -> StructsToJson)

2023-05-01 Thread via GitHub
HyukjinKwon commented on PR #41004: URL: https://github.com/apache/spark/pull/41004#issuecomment-1530671123 Merged to master

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
HyukjinKwon commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181987162 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutePlanHolder.scala: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
amaliujia commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181985068 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutePlanHolder.scala: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] github-actions[bot] closed pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement

2023-05-01 Thread via GitHub
github-actions[bot] closed pull request #39115: [SPARK-41563][SQL] Support partition filter in MSCK REPAIR TABLE statement URL: https://github.com/apache/spark/pull/39115

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
HyukjinKwon commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181984323 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutePlanHolder.scala: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache

[GitHub] [spark] amaliujia commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
amaliujia commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181974647 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
amaliujia commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181971691 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -525,6 +525,19 @@ class SparkSession private[sql] (

[GitHub] [spark] amaliujia commented on pull request #40993: Make it possible to extend `ChannelBuilder` for `SparkConnectClient`

2023-05-01 Thread via GitHub
amaliujia commented on PR #40993: URL: https://github.com/apache/spark/pull/40993#issuecomment-1530585825 Can I ask when you need a customized channel?

[GitHub] [spark] HyukjinKwon closed pull request #40998: [SPARK-43323][SQL][PYTHON] Fix DataFrame.toPandas with Arrow enabled to handle exceptions properly

2023-05-01 Thread via GitHub
HyukjinKwon closed pull request #40998: [SPARK-43323][SQL][PYTHON] Fix DataFrame.toPandas with Arrow enabled to handle exceptions properly URL: https://github.com/apache/spark/pull/40998

[GitHub] [spark] HyukjinKwon commented on pull request #40998: [SPARK-43323][SQL][PYTHON] Fix DataFrame.toPandas with Arrow enabled to handle exceptions properly

2023-05-01 Thread via GitHub
HyukjinKwon commented on PR #40998: URL: https://github.com/apache/spark/pull/40998#issuecomment-1530538764 Merged to master.

[GitHub] [spark] WweiL commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181953595 ## python/pyspark/sql/tests/connect/streaming/test_parity_streaming.py: ## @@ -22,32 +22,7 @@ class StreamingParityTests(StreamingTestsMixin,

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181952879 ## python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map_with_state.py: ## @@ -25,10 +25,6 @@ class GroupedApplyInPandasWithStateTests(

[GitHub] [spark] WweiL commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181952577 ## python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map_with_state.py: ## @@ -25,10 +25,6 @@ class GroupedApplyInPandasWithStateTests(

[GitHub] [spark] srielau opened a new pull request, #41007: [WIP][SPARK-43205] IDENTIFIER clause

2023-05-01 Thread via GitHub
srielau opened a new pull request, #41007: URL: https://github.com/apache/spark/pull/41007 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181946893 ## python/pyspark/sql/tests/connect/test_parity_pandas_grouped_map_with_state.py: ## @@ -25,10 +25,6 @@ class GroupedApplyInPandasWithStateTests(

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181944351 ## python/pyspark/sql/connect/client.py: ## @@ -967,9 +967,8 @@ def _execute_and_fetch_as_iterator( "streaming_query_command_result":

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181943971 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -236,6 +237,9 @@ message StreamingQueryInstanceId { // will generate a unique

[GitHub] [spark] gengliangwang commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
gengliangwang commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181938253 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -343,4 +363,11 @@ object ResolveDefaultColumns {

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181937412 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -597,6 +597,103 @@ object functions { grouping_id((Seq(colName) ++ colNames).map(n =>

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181937235 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181937012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181936821 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181936673 ## python/pyspark/sql/functions.py: ## @@ -10113,6 +10113,157 @@ def unwrap_udt(col: "ColumnOrName") -> Column: return _invoke_function("unwrap_udt",

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181936057 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181935926 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -0,0 +1,368 @@ +/* + * Licensed to the Apache

[GitHub] [spark] RyanBerti commented on a diff in pull request #40615: [SPARK-16484][SQL] Add support for Datasketches HllSketch

2023-05-01 Thread via GitHub
RyanBerti commented on code in PR #40615: URL: https://github.com/apache/spark/pull/40615#discussion_r1181935592 ## sql/core/src/test/resources/sql-functions/sql-expression-schema.md: ## @@ -422,4 +422,4 @@ | org.apache.spark.sql.catalyst.expressions.xml.XPathList | xpath |

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181933218 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -343,4 +363,11 @@ object ResolveDefaultColumns {

[GitHub] [spark] gengliangwang commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
gengliangwang commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181930939 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -343,4 +363,11 @@ object ResolveDefaultColumns {

[GitHub] [spark] dongjoon-hyun commented on pull request #41006: [SPARK-43206][SS][CONNECT][FOLLOWUP] Remove unintended change on `StreamingQueryManager.scala`

2023-05-01 Thread via GitHub
dongjoon-hyun commented on PR #41006: URL: https://github.com/apache/spark/pull/41006#issuecomment-1530405401 Merged to `master`. Thank you for the quick fix, @WweiL , @hvanhovell , @amaliujia .

[GitHub] [spark] dongjoon-hyun closed pull request #41006: [SPARK-43206][SS][CONNECT][FOLLOWUP] Remove unintended change on `StreamingQueryManager.scala`

2023-05-01 Thread via GitHub
dongjoon-hyun closed pull request #41006: [SPARK-43206][SS][CONNECT][FOLLOWUP] Remove unintended change on `StreamingQueryManager.scala` URL: https://github.com/apache/spark/pull/41006

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181927923 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -1044,7 +1044,7 @@ class InsertSuite extends DataSourceTest with SharedSparkSession

[GitHub] [spark] dongjoon-hyun commented on pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2023-05-01 Thread via GitHub
dongjoon-hyun commented on PR #39280: URL: https://github.com/apache/spark/pull/39280#issuecomment-1530392797 Thank you, @warrenzhu25 , @mridulm , @Ngone51 . Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] dongjoon-hyun closed pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2023-05-01 Thread via GitHub
dongjoon-hyun closed pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration URL: https://github.com/apache/spark/pull/39280

[GitHub] [spark] dongjoon-hyun commented on pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2023-05-01 Thread via GitHub
dongjoon-hyun commented on PR #39280: URL: https://github.com/apache/spark/pull/39280#issuecomment-1530390092 Sorry, @warrenzhu25 . I was on vacation.

[GitHub] [spark] warrenzhu25 commented on pull request #39280: [SPARK-41766][CORE] Handle decommission request sent before executor registration

2023-05-01 Thread via GitHub
warrenzhu25 commented on PR #39280: URL: https://github.com/apache/spark/pull/39280#issuecomment-1530386238 @mridulm Could you help merge this? There seems to be no response from @dongjoon-hyun

[GitHub] [spark] WweiL commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181923751 ## python/pyspark/sql/connect/client.py: ## @@ -967,9 +967,8 @@ def _execute_and_fetch_as_iterator( "streaming_query_command_result":

[GitHub] [spark] gengliangwang commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-01 Thread via GitHub
gengliangwang commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1181919920 ## sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala: ## @@ -1044,7 +1044,7 @@ class InsertSuite extends DataSourceTest with

[GitHub] [spark] WweiL commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181915163 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -236,6 +237,9 @@ message StreamingQueryInstanceId { // will generate a unique

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181911657 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -321,6 +324,50 @@ message StreamingQueryCommandResult { } } +// Commands for

[GitHub] [spark] WweiL commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181910654 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -321,6 +324,50 @@ message StreamingQueryCommandResult { } } +// Commands for the

[GitHub] [spark] amaliujia commented on pull request #41006: [SPARK-43206][SS][CONNECT][MINOR][FOLLOWUP] Fix for SPARK-43206

2023-05-01 Thread via GitHub
amaliujia commented on PR #41006: URL: https://github.com/apache/spark/pull/41006#issuecomment-1530323184 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40988: [SPARK-41971][SQL][PYTHON] Add a config for pandas conversion how to handle struct types

2023-05-01 Thread via GitHub
xinrong-meng commented on code in PR #40988: URL: https://github.com/apache/spark/pull/40988#discussion_r1181899926 ## python/pyspark/sql/pandas/types.py: ## @@ -462,3 +467,233 @@ def _convert_dict_to_map_items(s: "PandasSeriesLike") -> "PandasSeriesLike": :return:

[GitHub] [spark] WweiL opened a new pull request, #41006: [SPARK-43206][SS][CONNECT][MINOR][FOLLOWUP] Fix for SPARK-43206

2023-05-01 Thread via GitHub
WweiL opened a new pull request, #41006: URL: https://github.com/apache/spark/pull/41006 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/40966 introduced an unneeded change in `StreamingQueryManager` by mistake. This fix removes it.

[GitHub] [spark] WweiL commented on a diff in pull request #40966: [SPARK-43206] [SS] [CONNECT] StreamingQuery exception() include stack trace

2023-05-01 Thread via GitHub
WweiL commented on code in PR #40966: URL: https://github.com/apache/spark/pull/40966#discussion_r1181879069 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala: ## @@ -408,15 +408,15 @@ class StreamingQueryManager private[sql] (

[GitHub] [spark] rangadi commented on a diff in pull request #40966: [SPARK-43206] [SS] [CONNECT] StreamingQuery exception() include stack trace

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40966: URL: https://github.com/apache/spark/pull/40966#discussion_r1181878060 ## sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala: ## @@ -408,15 +408,15 @@ class StreamingQueryManager private[sql] (

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1181870990 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -37,15 +37,15 @@ import org.apache.spark.connect.proto

[GitHub] [spark] hvanhovell commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions, RelationalGroupedDataset#as

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1181870460 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1271,10 +1268,35 @@ class Dataset[T] private[sql] ( val colNames:

[GitHub] [spark] hvanhovell commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1181864909 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -837,6 +837,76 @@ class Dataset[T] private[sql] ( } } + /** +

[GitHub] [spark] hvanhovell commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1181854341 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -582,12 +582,12 @@ class Dataset[T] private[sql] (

[GitHub] [spark] hvanhovell commented on a diff in pull request #40997: [SPARK-43321][Connect] Dataset#Joinwith

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #40997: URL: https://github.com/apache/spark/pull/40997#discussion_r1181852165 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -837,6 +837,76 @@ class Dataset[T] private[sql] ( } } + /** +

[GitHub] [spark] hvanhovell commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181843398 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -47,18 +47,29 @@ class

[GitHub] [spark] grundprinzip commented on pull request #40993: Make it possible to extend `ChannelBuilder` for `SparkConnectClient`

2023-05-01 Thread via GitHub
grundprinzip commented on PR #40993: URL: https://github.com/apache/spark/pull/40993#issuecomment-1530130870 Please update the title of the PR to `[SPARK-43332][CONNECT][PYTHON] Make it possible to extend ChannelBuilder for SparkConnectClient` -- This is an automated message from

[GitHub] [spark] sunchao commented on pull request #40995: [SPARK-43320][SQL][HIVE] Directly call Hive 2.3.9 API

2023-05-01 Thread via GitHub
sunchao commented on PR #40995: URL: https://github.com/apache/spark/pull/40995#issuecomment-1530130460 Late LGTM. Was trying to +1 and merge this but @srowen beat me on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] hvanhovell commented on a diff in pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
hvanhovell commented on code in PR #41005: URL: https://github.com/apache/spark/pull/41005#discussion_r1181839864 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] WweiL commented on pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
WweiL commented on PR #40861: URL: https://github.com/apache/spark/pull/40861#issuecomment-1530118252 PTAL! @amaliujia @HyukjinKwon Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] rangadi commented on a diff in pull request #40861: [SPARK-43032][CONNECT][SS] Add Streaming query manager

2023-05-01 Thread via GitHub
rangadi commented on code in PR #40861: URL: https://github.com/apache/spark/pull/40861#discussion_r1181810503 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -236,6 +237,9 @@ message StreamingQueryInstanceId { // will generate a unique

[GitHub] [spark] HeartSaVioR closed pull request #40981: [SPARK-43311][SS] Add RocksDB state store provider memory management enhancements

2023-05-01 Thread via GitHub
HeartSaVioR closed pull request #40981: [SPARK-43311][SS] Add RocksDB state store provider memory management enhancements URL: https://github.com/apache/spark/pull/40981 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HeartSaVioR commented on pull request #40981: [SPARK-43311][SS] Add RocksDB state store provider memory management enhancements

2023-05-01 Thread via GitHub
HeartSaVioR commented on PR #40981: URL: https://github.com/apache/spark/pull/40981#issuecomment-1530068162 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] juliuszsompolski commented on pull request #41005: [SPARK-43267][CONNECT] Add Spark Connect SparkSession.interruptAll

2023-05-01 Thread via GitHub
juliuszsompolski commented on PR #41005: URL: https://github.com/apache/spark/pull/41005#issuecomment-1530062704 cc @hvanhovell @HyukjinKwon @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above
