[GitHub] [spark] yaooqinn commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-29 Thread via GitHub
yaooqinn commented on code in PR #40437: URL: https://github.com/apache/spark/pull/40437#discussion_r1152809362 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/KeepCommandOutputWithHive.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundati

[GitHub] [spark] wangyum commented on a diff in pull request #40601: [SPARK-42975][SQL] Cast result type to timestamp type for string +/- interval

2023-03-29 Thread via GitHub
wangyum commented on code in PR #40601: URL: https://github.com/apache/spark/pull/40601#discussion_r1152803208 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -424,6 +428,8 @@ class Analyzer(override val catalogManager: CatalogManager)

[GitHub] [spark] yaooqinn commented on pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-29 Thread via GitHub
yaooqinn commented on PR #40437: URL: https://github.com/apache/spark/pull/40437#issuecomment-1489781086 I am not sure why we must stay consistent with Hive in such a case: 1. this is just output from the command-line interface, not a programming API; 2. the `hive` CLI itself is a

[GitHub] [spark] itholic commented on pull request #40525: [SPARK-42859][CONNECT][PS] Basic support for pandas API on Spark Connect

2023-03-29 Thread via GitHub
itholic commented on PR #40525: URL: https://github.com/apache/spark/pull/40525#issuecomment-1489771881 CI passed. cc @HyukjinKwon @ueshin @xinrong-meng @zhengruifeng PTAL when you find some time. I summarized the key changes in the PR description for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zsxwing commented on a diff in pull request #40561: [SPARK-42931][SS] Introduce dropDuplicatesWithinWatermark

2023-03-29 Thread via GitHub
zsxwing commented on code in PR #40561: URL: https://github.com/apache/spark/pull/40561#discussion_r1152772092 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala: ## @@ -980,3 +1022,65 @@ object StreamingDeduplicateExec { private val E
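The semantics under review in this PR — treating a record as a duplicate only while its key is still within the watermark delay, then evicting the state — can be sketched outside Spark. The class below is illustrative only; it is not the `StreamingDeduplicateExec` implementation under discussion:

```python
# Hand-rolled sketch of watermark-bounded deduplication (illustrative only;
# this is not Spark's StreamingDeduplicateExec logic).
class DedupWithinWatermark:
    def __init__(self, delay_ms):
        self.delay_ms = delay_ms      # how long a key is remembered
        self.watermark_ms = 0
        self.expiry = {}              # key -> event time + delay

    def process(self, key, event_time_ms):
        """Return True if the event survives deduplication."""
        # Advance the watermark, then evict state that can no longer match.
        self.watermark_ms = max(self.watermark_ms, event_time_ms - self.delay_ms)
        self.expiry = {k: t for k, t in self.expiry.items() if t > self.watermark_ms}
        if key in self.expiry:
            return False              # duplicate within the window
        self.expiry[key] = event_time_ms + self.delay_ms
        return True
```

The key point the PR discussion revolves around is that, unlike plain `dropDuplicates`, state here is bounded: once the watermark passes a key's expiry, the key can be emitted again.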

[GitHub] [spark] wangyum opened a new pull request, #40601: [SPARK-42975][SQL] Cast result type to timestamp type for string +/- interval

2023-03-29 Thread via GitHub
wangyum opened a new pull request, #40601: URL: https://github.com/apache/spark/pull/40601 ### What changes were proposed in this pull request? This PR casts the result of string +/- interval to timestamp type instead of string type. ### Why are the changes needed?
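A plain-Python sketch of why the proposed result type is natural: adding an interval to a timestamp parsed from a string yields a timestamp value, not a string (an illustration of the semantics only, not Spark code):

```python
from datetime import datetime, timedelta

# Roughly analogous to evaluating '2023-03-29' + INTERVAL '2' HOUR:
# the string is first interpreted as a timestamp, and the addition
# produces a timestamp value rather than a string.
base = datetime.strptime("2023-03-29", "%Y-%m-%d")
result = base + timedelta(hours=2)
print(type(result).__name__, result)
```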

[GitHub] [spark] LuciferYang commented on pull request #40598: [SPARK-42974][CORE] Restore `Utils#createTempDir` use `ShutdownHookManager#registerShutdownDeleteDir` to clean up tempDir

2023-03-29 Thread via GitHub
LuciferYang commented on PR #40598: URL: https://github.com/apache/spark/pull/40598#issuecomment-1489742010 cc @sadikovi @srowen @HyukjinKwon
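The pattern this PR restores — create a temp directory and register it for deletion at shutdown — has a direct analogue in Python's standard library. The sketch below illustrates the idea; it is not Spark's `Utils#createTempDir`:

```python
import atexit
import shutil
import tempfile

def create_temp_dir(prefix="spark-"):
    """Create a temp dir and register it for deletion at interpreter exit,
    mirroring the ShutdownHookManager#registerShutdownDeleteDir pattern."""
    path = tempfile.mkdtemp(prefix=prefix)
    # Registering the cleanup at creation time guarantees the directory is
    # removed even if the caller never reaches its own cleanup code.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```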

[GitHub] [spark] LuciferYang commented on pull request #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event

2023-03-29 Thread via GitHub
LuciferYang commented on PR #40597: URL: https://github.com/apache/spark/pull/40597#issuecomment-1489741560 Thanks @HyukjinKwon @sadikovi

[GitHub] [spark] LuciferYang commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
LuciferYang commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152742646 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] HyukjinKwon closed pull request #40599: [SPARK-42907][TESTS][FOLLOWUP] Avro functions doctest cleanup

2023-03-29 Thread via GitHub
HyukjinKwon closed pull request #40599: [SPARK-42907][TESTS][FOLLOWUP] Avro functions doctest cleanup URL: https://github.com/apache/spark/pull/40599

[GitHub] [spark] anishshri-db commented on pull request #40600: [SPARK-42968][SS] Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks

2023-03-29 Thread via GitHub
anishshri-db commented on PR #40600: URL: https://github.com/apache/spark/pull/40600#issuecomment-1489721186 @HeartSaVioR - PTAL when you get a chance. Thx

[GitHub] [spark] HyukjinKwon commented on pull request #40599: [SPARK-42907][TESTS][FOLLOWUP] Avro functions doctest cleanup

2023-03-29 Thread via GitHub
HyukjinKwon commented on PR #40599: URL: https://github.com/apache/spark/pull/40599#issuecomment-1489721124 Merged to master and branch-3.4.

[GitHub] [spark] anishshri-db opened a new pull request, #40600: [SPARK-42968][SS] Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks

2023-03-29 Thread via GitHub
anishshri-db opened a new pull request, #40600: URL: https://github.com/apache/spark/pull/40600 ### What changes were proposed in this pull request? Add option to skip commit coordinator as part of StreamingWrite API for DSv2 sources/sinks. This option was already present as part of the B

[GitHub] [spark] zhengruifeng opened a new pull request, #40599: [SPARK-42907][TESTS][FOLLOWUP] Avro functions doctest cleanup

2023-03-29 Thread via GitHub
zhengruifeng opened a new pull request, #40599: URL: https://github.com/apache/spark/pull/40599 ### What changes were proposed in this pull request? Avro functions doctest cleanup, remove unused `print` ### Why are the changes needed? those lines were just to investigate the logs

[GitHub] [spark] HyukjinKwon closed pull request #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event

2023-03-29 Thread via GitHub
HyukjinKwon closed pull request #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event URL: https://github.com/apache/spark/pull/40597

[GitHub] [spark] HyukjinKwon commented on pull request #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event

2023-03-29 Thread via GitHub
HyukjinKwon commented on PR #40597: URL: https://github.com/apache/spark/pull/40597#issuecomment-1489687927 Merged to master and branch-3.4.

[GitHub] [spark] LuciferYang opened a new pull request, #40598: [SPARK-42974][CORE] Restore `Utils#createTempDir` use `ShutdownHookManager#registerShutdownDeleteDir` to clean up tempDir

2023-03-29 Thread via GitHub
LuciferYang opened a new pull request, #40598: URL: https://github.com/apache/spark/pull/40598 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] itholic commented on pull request #39702: [SPARK-41487][SQL] Assign name to _LEGACY_ERROR_TEMP_1020

2023-03-29 Thread via GitHub
itholic commented on PR #39702: URL: https://github.com/apache/spark/pull/39702#issuecomment-1489668136 @MaxGekk Can you take a look when you find some time?

[GitHub] [spark] LuciferYang commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
LuciferYang commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152707836 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] LuciferYang commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
LuciferYang commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152702218 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] ulysses-you commented on a diff in pull request #40589: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage optimizer

2023-03-29 Thread via GitHub
ulysses-you commented on code in PR #40589: URL: https://github.com/apache/spark/pull/40589#discussion_r1152689151 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala: ## @@ -26,5 +26,6 @@ import org.apache.spark.sql.execution.SparkPlan

[GitHub] [spark] sadikovi commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
sadikovi commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152687590 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] beliefer commented on pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-29 Thread via GitHub
beliefer commented on PR #40355: URL: https://github.com/apache/spark/pull/40355#issuecomment-1489622921 @hvanhovell Could you take a review?

[GitHub] [spark] beliefer commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
beliefer commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1152684039 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -1855,50 +1855,6 @@ class CollectionExpressionsSuite e

[GitHub] [spark] wangyum closed pull request #40294: [SPARK-40610][SQL] Support unwrap date type to string type

2023-03-29 Thread via GitHub
wangyum closed pull request #40294: [SPARK-40610][SQL] Support unwrap date type to string type URL: https://github.com/apache/spark/pull/40294

[GitHub] [spark] wangyum commented on pull request #40294: [SPARK-40610][SQL] Support unwrap date type to string type

2023-03-29 Thread via GitHub
wangyum commented on PR #40294: URL: https://github.com/apache/spark/pull/40294#issuecomment-1489605259 Closing it, because this change may have a potential data issue. Users can set `spark.sql.legacy.typeCoercion.datetimeToString.enabled` to `true` to restore the old behavior.

[GitHub] [spark] LuciferYang commented on pull request #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event

2023-03-29 Thread via GitHub
LuciferYang commented on PR #40597: URL: https://github.com/apache/spark/pull/40597#issuecomment-1489600591 cc @HyukjinKwon @sadikovi

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40564: [SPARK-42519] [Test] [Connect] Add More WriteTo Tests In Spark Connect Client

2023-03-29 Thread via GitHub
Hisoka-X commented on code in PR #40564: URL: https://github.com/apache/spark/pull/40564#discussion_r1152663238 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -57,6 +57,12 @@ object IntegrationTestUtils {

[GitHub] [spark] LuciferYang opened a new pull request, #40597: [SPARK-42971][CORE] Change to print `workdir` if `appDirs` is null when worker handles `WorkDirCleanup` event

2023-03-29 Thread via GitHub
LuciferYang opened a new pull request, #40597: URL: https://github.com/apache/spark/pull/40597 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang commented on a diff in pull request #36677: [SPARK-39296][CORE][SQL] Replace `Array.toString` with `Array.mkString`

2023-03-29 Thread via GitHub
LuciferYang commented on code in PR #36677: URL: https://github.com/apache/spark/pull/36677#discussion_r1152661979 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -516,7 +516,8 @@ private[deploy] class Worker( val cleanupFuture: concurrent.Futu

[GitHub] [spark] cloud-fan commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-29 Thread via GitHub
cloud-fan commented on code in PR #40437: URL: https://github.com/apache/spark/pull/40437#discussion_r1152653649 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/KeepCommandOutputWithHive.scala: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [spark] panbingkun opened a new pull request, #40596: [SPARK-42973][CONNECT][BUILD] Upgrade buf to v1.16.0

2023-03-29 Thread via GitHub
panbingkun opened a new pull request, #40596: URL: https://github.com/apache/spark/pull/40596 ### What changes were proposed in this pull request? The PR aims to upgrade buf from 1.15.1 to 1.16.0. ### Why are the changes needed? Release Notes: https://github.com/bufbuild/buf/relea

[GitHub] [spark] cloud-fan commented on a diff in pull request #40589: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage optimizer

2023-03-29 Thread via GitHub
cloud-fan commented on code in PR #40589: URL: https://github.com/apache/spark/pull/40589#discussion_r1152649900 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala: ## @@ -26,5 +26,6 @@ import org.apache.spark.sql.execution.SparkPlan *

[GitHub] [spark] cloud-fan commented on a diff in pull request #40589: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage optimizer

2023-03-29 Thread via GitHub
cloud-fan commented on code in PR #40589: URL: https://github.com/apache/spark/pull/40589#discussion_r1152649299 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala: ## @@ -26,5 +26,6 @@ import org.apache.spark.sql.execution.SparkPlan *

[GitHub] [spark] ueshin commented on pull request #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
ueshin commented on PR #40594: URL: https://github.com/apache/spark/pull/40594#issuecomment-1489567987 #40595

[GitHub] [spark] ueshin opened a new pull request, #40595: [SPARK-42970][CONNECT][PYTHON][TESTS][3.4] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
ueshin opened a new pull request, #40595: URL: https://github.com/apache/spark/pull/40595 ### What changes were proposed in this pull request? Reuses `pyspark.sql.tests.test_arrow` test cases. ### Why are the changes needed? `test_arrow` is also helpful because it contain

[GitHub] [spark] beliefer commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
beliefer commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1152648264 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -5056,128 +4950,45 @@ case class ArrayCompact(child: Express

[GitHub] [spark] cloud-fan commented on a diff in pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-29 Thread via GitHub
cloud-fan commented on code in PR #40116: URL: https://github.com/apache/spark/pull/40116#discussion_r1152647858 ## sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala: ## @@ -45,7 +45,7 @@ abstract class SQLImplicits extends LowPrioritySQLImplicits { } // Pr

[GitHub] [spark] beliefer commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
beliefer commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1152647435 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1400,120 +1400,24 @@ case class ArrayContains(left: Express

[GitHub] [spark] beliefer commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-29 Thread via GitHub
beliefer commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1489556667 > Is that #40415? It is https://github.com/apache/spark/pull/40358

[GitHub] [spark] sadikovi commented on a diff in pull request #36677: [SPARK-39296][CORE][SQL] Replace `Array.toString` with `Array.mkString`

2023-03-29 Thread via GitHub
sadikovi commented on code in PR #36677: URL: https://github.com/apache/spark/pull/36677#discussion_r1152636135 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -516,7 +516,8 @@ private[deploy] class Worker( val cleanupFuture: concurrent.Future[

[GitHub] [spark] HyukjinKwon commented on pull request #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
HyukjinKwon commented on PR #40594: URL: https://github.com/apache/spark/pull/40594#issuecomment-1489522478 It has a conflict with branch-3.4. Would you mind creating a backport please?

[GitHub] [spark] LuciferYang commented on a diff in pull request #36677: [SPARK-39296][CORE][SQL] Replace `Array.toString` with `Array.mkString`

2023-03-29 Thread via GitHub
LuciferYang commented on code in PR #36677: URL: https://github.com/apache/spark/pull/36677#discussion_r1152618683 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -516,7 +516,8 @@ private[deploy] class Worker( val cleanupFuture: concurrent.Futu

[GitHub] [spark] HyukjinKwon closed pull request #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
HyukjinKwon closed pull request #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases URL: https://github.com/apache/spark/pull/40594

[GitHub] [spark] HyukjinKwon commented on pull request #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
HyukjinKwon commented on PR #40594: URL: https://github.com/apache/spark/pull/40594#issuecomment-1489520920 Merged to master and branch-3.4.

[GitHub] [spark] github-actions[bot] commented on pull request #39102: [SPARK-41555][SQL] Multi sparkSession should share single SQLAppStatusStore

2023-03-29 Thread via GitHub
github-actions[bot] commented on PR #39102: URL: https://github.com/apache/spark/pull/39102#issuecomment-1489512808 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] amaliujia commented on a diff in pull request #40581: [SPARK-42953][Connect] Typed filter, map, flatMap, mapPartitions

2023-03-29 Thread via GitHub
amaliujia commented on code in PR #40581: URL: https://github.com/apache/spark/pull/40581#discussion_r1152606377 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -482,27 +482,66 @@ class SparkConnectPlanner(val sess

[GitHub] [spark] srowen commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
srowen commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152590107 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] sadikovi commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disable use of Guava's `Files.createTempDir()`

2023-03-29 Thread via GitHub
sadikovi commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r1152583895 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,60 @@ public static byte[] bufferToArray(ByteBuffer buffer) {

[GitHub] [spark] sadikovi commented on a diff in pull request #36677: [SPARK-39296][CORE][SQL] Replace `Array.toString` with `Array.mkString`

2023-03-29 Thread via GitHub
sadikovi commented on code in PR #36677: URL: https://github.com/apache/spark/pull/36677#discussion_r1152581161 ## core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala: ## @@ -516,7 +516,8 @@ private[deploy] class Worker( val cleanupFuture: concurrent.Future[

[GitHub] [spark] ueshin opened a new pull request, #40594: [SPARK-42970][CONNECT][PYTHON][TESTS] Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread via GitHub
ueshin opened a new pull request, #40594: URL: https://github.com/apache/spark/pull/40594 ### What changes were proposed in this pull request? Reuses `pyspark.sql.tests.test_arrow` test cases. ### Why are the changes needed? `test_arrow` is also helpful because it contain
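Reusing a test module across a classic session and Spark Connect typically works by putting the shared test methods in a mixin that each backend subclasses. A minimal sketch of that pattern, with hypothetical class names (the real PR reuses `pyspark.sql.tests.test_arrow`):

```python
import unittest

class ArrowTestsMixin:
    """Shared test methods; subclasses supply the backend under test."""
    def test_roundtrip(self):
        data = [1, 2, 3]
        self.assertEqual(self.backend_roundtrip(data), data)

class ClassicSessionArrowTests(ArrowTestsMixin, unittest.TestCase):
    def backend_roundtrip(self, data):
        return list(data)  # stand-in for the classic SparkSession path

class ConnectSessionArrowTests(ArrowTestsMixin, unittest.TestCase):
    def backend_roundtrip(self, data):
        return list(data)  # stand-in for the Spark Connect path
```

Each test method is written once and runs against both backends, which is exactly why reusing `test_arrow` adds coverage for Spark Connect without duplicating tests.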

[GitHub] [spark] zhenlineo commented on a diff in pull request #40581: [SPARK-42953][Connect] Typed filter, map, flatMap, mapPartitions

2023-03-29 Thread via GitHub
zhenlineo commented on code in PR #40581: URL: https://github.com/apache/spark/pull/40581#discussion_r1152474793 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -482,27 +482,66 @@ class SparkConnectPlanner(val sess

[GitHub] [spark] zhenlineo commented on a diff in pull request #40581: [SPARK-42953][Connect] Typed filter, map, flatMap, mapPartitions

2023-03-29 Thread via GitHub
zhenlineo commented on code in PR #40581: URL: https://github.com/apache/spark/pull/40581#discussion_r1152469860 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -43,7 +45,25 @@ object IntegrationTestUtils

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152453603 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwrit

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152452932 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwrit

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152451343 ## python/pyspark/sql/connect/streaming/query.py: ## @@ -0,0 +1,173 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license a

[GitHub] [spark] amaliujia commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
amaliujia commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152441819 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwr

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152436270 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwrit

[GitHub] [spark] MaxGekk opened a new pull request, #40593: [WIP][SQL] Defined typed literal constructors as keywords

2023-03-29 Thread via GitHub
MaxGekk opened a new pull request, #40593: URL: https://github.com/apache/spark/pull/40593 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] hvanhovell closed pull request #40590: [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions

2023-03-29 Thread via GitHub
hvanhovell closed pull request #40590: [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions URL: https://github.com/apache/spark/pull/40590

[GitHub] [spark] hvanhovell commented on pull request #40590: [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions

2023-03-29 Thread via GitHub
hvanhovell commented on PR #40590: URL: https://github.com/apache/spark/pull/40590#issuecomment-1489181163 Merging.

[GitHub] [spark] amaliujia commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
amaliujia commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152388067 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwr

[GitHub] [spark] amaliujia commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
amaliujia commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152387534 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -177,3 +179,97 @@ message WriteOperationV2 { // (Optional) A condition for overwr

[GitHub] [spark] amaliujia commented on a diff in pull request #40581: [SPARK-42953][Connect] Typed filter, map, flatMap, mapPartitions

2023-03-29 Thread via GitHub
amaliujia commented on code in PR #40581: URL: https://github.com/apache/spark/pull/40581#discussion_r1152385742 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -482,27 +482,66 @@ class SparkConnectPlanner(val sess

[GitHub] [spark] amaliujia commented on pull request #40590: [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions

2023-03-29 Thread via GitHub
amaliujia commented on PR #40590: URL: https://github.com/apache/spark/pull/40590#issuecomment-1489129059 LGTM

[GitHub] [spark] jiangxb1987 opened a new pull request, #40592: [SPARK-42967] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled

2023-03-29 Thread via GitHub
jiangxb1987 opened a new pull request, #40592: URL: https://github.com/apache/spark/pull/40592 ### What changes were proposed in this pull request? The PR fixes a bug where SparkListenerTaskStart can have `stageAttemptId = -1` when a task is launched after the stage is cancelled. A

[GitHub] [spark] WweiL commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
WweiL commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152323338 ## python/pyspark/sql/connect/streaming/query.py: ## @@ -0,0 +1,173 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] rangadi commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
rangadi commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152315438 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1969,6 +2014,136 @@ class SparkConnectPlanner(val sess

[GitHub] [spark] WweiL commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
WweiL commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152307065 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1969,6 +2014,136 @@ class SparkConnectPlanner(val sessio

[GitHub] [spark] WweiL commented on a diff in pull request #40586: [SPARK-42939][SS][CONNECT] Core streaming Python API for Spark Connect

2023-03-29 Thread via GitHub
WweiL commented on code in PR #40586: URL: https://github.com/apache/spark/pull/40586#discussion_r1152296460 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1969,6 +2014,136 @@ class SparkConnectPlanner(val sessio
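
The discussion above concerns the client-side streaming API added in `python/pyspark/sql/connect/streaming/query.py`. As a hedged, self-contained sketch (the `FakeConnection` class below stands in for the Spark Connect RPC channel and is not a real PySpark API), a client-side query handle is essentially a thin object that remembers the query id and forwards commands over the connection:

```python
import uuid

class FakeConnection:
    """Illustrative stand-in for the Spark Connect RPC channel."""
    def __init__(self):
        self.commands = []

    def send(self, command: dict) -> None:
        self.commands.append(command)

class StreamingQuery:
    """Minimal client-side handle: holds ids, forwards commands."""
    def __init__(self, connection, query_id: str, name: str):
        self._connection = connection
        self.id = query_id
        self.name = name

    def stop(self) -> None:
        self._connection.send({"query_id": self.id, "command": "stop"})

conn = FakeConnection()
q = StreamingQuery(conn, str(uuid.uuid4()), "my_query")
q.stop()
print(conn.commands[0]["command"])  # → stop
```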

[GitHub] [spark] hvanhovell commented on pull request #40291: [WIP][SPARK-42578][CONNECT] Add JDBC to DataFrameWriter

2023-03-29 Thread via GitHub
hvanhovell commented on PR #40291: URL: https://github.com/apache/spark/pull/40291#issuecomment-1489025120 Is that https://github.com/apache/spark/pull/40415?

[GitHub] [spark] zhenlineo commented on a diff in pull request #40581: [SPARK-42953][Connect] Typed filter, map, flatMap, mapPartitions

2023-03-29 Thread via GitHub
zhenlineo commented on code in PR #40581: URL: https://github.com/apache/spark/pull/40581#discussion_r1152274921 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -43,7 +45,27 @@ object IntegrationTestUtils

[GitHub] [spark] paul-laffon-dd opened a new pull request, #40591: [SPARK-42950][CORE] Add exit code in SparkListenerApplicationEnd

2023-03-29 Thread via GitHub
paul-laffon-dd opened a new pull request, #40591: URL: https://github.com/apache/spark/pull/40591 ### What changes were proposed in this pull request? The exit code is already available in the `stop(exitCode: Int)` function of the SparkContext; it can only be propagated to
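
A minimal sketch of the idea, assuming the change simply threads the exit code from `stop()` through to the application-end listener event (class names below are illustrative, not Spark's):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ApplicationEnd:
    time: int
    exit_code: Optional[int] = None  # the newly propagated field

class Listener:
    def __init__(self):
        self.events: List[ApplicationEnd] = []

    def on_application_end(self, event: ApplicationEnd) -> None:
        self.events.append(event)

class Context:
    def __init__(self, listener: Listener):
        self._listener = listener

    def stop(self, exit_code: int = 0) -> None:
        # Forward the exit code into the event instead of dropping it.
        self._listener.on_application_end(
            ApplicationEnd(time=0, exit_code=exit_code))

listener = Listener()
Context(listener).stop(exit_code=1)
print(listener.events[0].exit_code)  # → 1
```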

[GitHub] [spark] sunchao commented on a diff in pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-29 Thread via GitHub
sunchao commented on code in PR #39950: URL: https://github.com/apache/spark/pull/39950#discussion_r1152231701 ## sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetFooterReader.java: ## @@ -17,23 +17,53 @@ package org.apache.spark.sql.execution.
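
A hedged sketch of the "read the footer once" idea from this discussion: read the file metadata a single time and reuse it for both schema discovery and row-group pruning, instead of opening the file twice. The reader below is a stand-in, not the actual `ParquetFooterReader` API.

```python
class FooterReader:
    def __init__(self):
        self.reads = 0       # counts real I/O; cache hits are free
        self._cache = {}

    def read_footer(self, path: str) -> dict:
        if path not in self._cache:
            self.reads += 1
            # Placeholder footer contents for illustration.
            self._cache[path] = {"path": path, "schema": ["a", "b"]}
        return self._cache[path]

reader = FooterReader()
footer1 = reader.read_footer("/data/part-0.parquet")  # first read: I/O
footer2 = reader.read_footer("/data/part-0.parquet")  # reuse: no I/O
print(reader.reads)  # → 1
```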

[GitHub] [spark] tomvanbussel opened a new pull request, #40590: [SPARK-42631][CONNECT][FOLLOW-UP] Expose Column.expr to extensions

2023-03-29 Thread via GitHub
tomvanbussel opened a new pull request, #40590: URL: https://github.com/apache/spark/pull/40590 ### What changes were proposed in this pull request? This PR is a follow-up to https://github.com/apache/spark/pull/40234, which makes it possible for extensions to create custom `Dataset`s and

[GitHub] [spark] zhenlineo commented on a diff in pull request #40564: [SPARK-42519] [Test] [Connect] Add More WriteTo Tests In Spark Connect Client

2023-03-29 Thread via GitHub
zhenlineo commented on code in PR #40564: URL: https://github.com/apache/spark/pull/40564#discussion_r1152212099 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/IntegrationTestUtils.scala: ## @@ -57,6 +57,12 @@ object IntegrationTestUtils

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40564: [SPARK-42519] [Test] [Connect] Add More WriteTo Tests In Spark Connect Client

2023-03-29 Thread via GitHub
Hisoka-X commented on code in PR #40564: URL: https://github.com/apache/spark/pull/40564#discussion_r1152172912 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -58,10 +58,12 @@ object SparkConnectServerUtils

[GitHub] [spark] MaxGekk closed pull request #40565: [SPARK-42873][SQL] Define Spark SQL types as keywords

2023-03-29 Thread via GitHub
MaxGekk closed pull request #40565: [SPARK-42873][SQL] Define Spark SQL types as keywords URL: https://github.com/apache/spark/pull/40565

[GitHub] [spark] MaxGekk commented on pull request #40565: [SPARK-42873][SQL] Define Spark SQL types as keywords

2023-03-29 Thread via GitHub
MaxGekk commented on PR #40565: URL: https://github.com/apache/spark/pull/40565#issuecomment-1488877352 Merging to master. Thank you, @cloud-fan for review.

[GitHub] [spark] MaxGekk commented on pull request #40565: [SPARK-42873][SQL] Define Spark SQL types as keywords

2023-03-29 Thread via GitHub
MaxGekk commented on PR #40565: URL: https://github.com/apache/spark/pull/40565#issuecomment-1488876848 Highly likely, the GA `continuous-integration/appveyor/pr` is not related to my changes. I am going to merge this PR.

[GitHub] [spark] zhenlineo commented on a diff in pull request #40564: [SPARK-42519] [Test] [Connect] Add More WriteTo Tests In Spark Connect Client

2023-03-29 Thread via GitHub
zhenlineo commented on code in PR #40564: URL: https://github.com/apache/spark/pull/40564#discussion_r1152081937 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/util/RemoteSparkSession.scala: ## @@ -58,10 +58,12 @@ object SparkConnectServerUtil

[GitHub] [spark] infoankitp commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
infoankitp commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1151970771 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala: ## @@ -1855,50 +1855,6 @@ class CollectionExpressionsSuite

[GitHub] [spark] infoankitp commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
infoankitp commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1151952607 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -1400,120 +1400,24 @@ case class ArrayContains(left: Expre

[GitHub] [spark] VindhyaG commented on a diff in pull request #40553: [SPARK-39722] [SQL] getString API for Dataset

2023-03-29 Thread via GitHub
VindhyaG commented on code in PR #40553: URL: https://github.com/apache/spark/pull/40553#discussion_r1151950076 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -883,6 +883,129 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, verti

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

2023-03-29 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40300: URL: https://github.com/apache/spark/pull/40300#discussion_r1151946878 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsMetadataColumns.java: ## @@ -48,11 +47,22 @@ public interface SupportsMetad
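
The PR title describes automatically renaming metadata columns that collide with data columns. As an illustrative sketch only (the underscore-prefixing scheme below is an assumption, not necessarily the rule the PR implements), conflict resolution amounts to deriving a fresh name instead of failing the query:

```python
def dedup_metadata_name(name: str, data_columns: set) -> str:
    """Return a metadata column name that does not clash with data columns."""
    candidate = name
    while candidate in data_columns:
        candidate = "_" + candidate  # keep prefixing until the name is free
    return candidate

print(dedup_metadata_name("_metadata", {"id", "value"}))      # → _metadata
print(dedup_metadata_name("_metadata", {"_metadata", "id"}))  # → __metadata
```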

[GitHub] [spark] VindhyaG commented on a diff in pull request #40553: [SPARK-39722] [SQL] getString API for Dataset

2023-03-29 Thread via GitHub
VindhyaG commented on code in PR #40553: URL: https://github.com/apache/spark/pull/40553#discussion_r1151938443 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -535,6 +535,159 @@ class Dataset[T] private[sql] ( } } + /** + *

[GitHub] [spark] VindhyaG commented on a diff in pull request #40553: [SPARK-39722] [SQL] getString API for Dataset

2023-03-29 Thread via GitHub
VindhyaG commented on code in PR #40553: URL: https://github.com/apache/spark/pull/40553#discussion_r1151936705 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -883,6 +883,129 @@ class Dataset[T] private[sql]( println(showString(numRows, truncate, verti
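
The PR above proposes a `getString`-style API for `Dataset`: the same ASCII table that `show()` prints, returned as a string instead of written to stdout. A minimal sketch of what such rendering could look like (column widths and truncation rules here are illustrative, not Spark's exact `showString` logic):

```python
from typing import List, Sequence

def get_string(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Render header + rows as an ASCII table, show()-style."""
    cells = [list(map(str, header))] + [list(map(str, r)) for r in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(header))]
    sep = "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(row):
        return "|" + "|".join(c.ljust(w) for c, w in zip(row, widths)) + "|"

    lines = [sep, fmt(cells[0]), sep] + [fmt(r) for r in cells[1:]] + [sep]
    return "\n".join(lines)

table = get_string(["id", "name"], [[1, "a"], [2, "bb"]])
print(table)
```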

[GitHub] [spark] yabola closed pull request #40495: test reading footer within file range

2023-03-29 Thread via GitHub
yabola closed pull request #40495: test reading footer within file range URL: https://github.com/apache/spark/pull/40495

[GitHub] [spark] Kwafoor commented on a diff in pull request #40294: [SPARK-40610][SQL] Support unwrap date type to string type

2023-03-29 Thread via GitHub
Kwafoor commented on code in PR #40294: URL: https://github.com/apache/spark/pull/40294#discussion_r1151924388 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala: ## @@ -133,6 +133,11 @@ object UnwrapCastInBinaryComparison e
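
The unwrap-cast optimization this PR extends rewrites a predicate like `cast(dateCol as string) = '2023-01-01'` so the string literal is converted once and the comparison runs on the date values directly, avoiding a per-row cast. A hedged sketch of the semantics, assuming ISO-formatted literals (`parse_date` is a stand-in for the analyzer-side literal conversion):

```python
from datetime import date

def parse_date(literal: str):
    try:
        return date.fromisoformat(literal)
    except ValueError:
        return None  # literal is not a valid date: the rewrite does not apply

def unwrap_compare(column_value: date, literal: str) -> bool:
    lit = parse_date(literal)
    if lit is None:
        # Fall back to the original semantics: cast the column to string.
        return str(column_value) == literal
    return column_value == lit

print(unwrap_compare(date(2023, 1, 1), "2023-01-01"))  # → True
```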

[GitHub] [spark] ulysses-you commented on pull request #40589: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage optimizer

2023-03-29 Thread via GitHub
ulysses-you commented on PR #40589: URL: https://github.com/apache/spark/pull/40589#issuecomment-1488464221 cc @cloud-fan @dongjoon-hyun @yaooqinn

[GitHub] [spark] ulysses-you opened a new pull request, #40589: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE query stage optimizer

2023-03-29 Thread via GitHub
ulysses-you opened a new pull request, #40589: URL: https://github.com/apache/spark/pull/40589 ### What changes were proposed in this pull request? Add `injectQueryStageOptimizerRule` public method in `SparkSessionExtensions` ### Why are the changes needed? Provid
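
The extension-point pattern proposed here (`injectQueryStageOptimizerRule` on `SparkSessionExtensions`) can be sketched generically: user code registers rules, and the optimizer folds every injected rule over the plan. The classes below are illustrative stand-ins, with plans modeled as plain strings rather than Catalyst trees:

```python
from typing import Callable, List

Rule = Callable[[str], str]

class SessionExtensions:
    def __init__(self):
        self._query_stage_rules: List[Rule] = []

    def inject_query_stage_optimizer_rule(self, rule: Rule) -> None:
        self._query_stage_rules.append(rule)

    def optimize_stage(self, plan: str) -> str:
        # Apply each injected rule in registration order.
        for rule in self._query_stage_rules:
            plan = rule(plan)
        return plan

ext = SessionExtensions()
ext.inject_query_stage_optimizer_rule(
    lambda p: p.replace("Exchange", "ReusedExchange"))
print(ext.optimize_stage("Exchange -> Scan"))  # → ReusedExchange -> Scan
```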

[GitHub] [spark] yaooqinn opened a new pull request, #40588: [SPARK-42964][SQL] PostgresDialect '42P07' also means table already exists

2023-03-29 Thread via GitHub
yaooqinn opened a new pull request, #40588: URL: https://github.com/apache/spark/pull/40588 ### What changes were proposed in this pull request? This PR redirects the '42P07' SQL state to table already exists according to the doc - https://www.postgresql.org/docs/14/errcodes-ap
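
Per the PostgreSQL error-code appendix, SQLState `42P07` is `duplicate_table` and `42P01` is `undefined_table`. A minimal sketch of the classification a JDBC dialect performs (the mapping table below is illustrative, not Spark's actual `PostgresDialect` code):

```python
ALREADY_EXISTS_STATES = {"42P07"}  # duplicate_table
NOT_FOUND_STATES = {"42P01"}       # undefined_table

def classify(sql_state: str) -> str:
    """Map a PostgreSQL SQLState to a coarse error category."""
    if sql_state in ALREADY_EXISTS_STATES:
        return "TABLE_ALREADY_EXISTS"
    if sql_state in NOT_FOUND_STATES:
        return "TABLE_NOT_FOUND"
    return "UNCLASSIFIED"

print(classify("42P07"))  # → TABLE_ALREADY_EXISTS
```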

[GitHub] [spark] tamama commented on pull request #37206: [SPARK-39696][CORE] Ensure Concurrent r/w `TaskMetrics` not throw Exception

2023-03-29 Thread via GitHub
tamama commented on PR #37206: URL: https://github.com/apache/spark/pull/37206#issuecomment-1488285148 > > > > We intend to fallback to Spark-3.3.1 Scala-2.12 (instead of Scala 2.13) > > > > > > > > > @tamama Using Scala 2.12 can a

[GitHub] [spark] beliefer commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
beliefer commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1151672500 ## connector/connect/common/src/test/resources/query-tests/explain-results/function_array_append.explain: ## @@ -1,2 +1,2 @@ -Project [array_append(e#0, 1) AS array_ap

[GitHub] [spark] beliefer commented on a diff in pull request #40563: [SPARK-41232][SPARK-41233][FOLLOWUP] Refactor `array_append` and `array_prepend` with `RuntimeReplaceable`

2023-03-29 Thread via GitHub
beliefer commented on code in PR #40563: URL: https://github.com/apache/spark/pull/40563#discussion_r1151668147 ## connector/connect/common/src/test/resources/query-tests/explain-results/function_array_append.explain: ## @@ -1,2 +1,2 @@ -Project [array_append(e#0, 1) AS array_ap
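
The `RuntimeReplaceable` refactor discussed here means `array_append` no longer needs its own evaluation/codegen path: at analysis time the expression is rewritten into existing primitives. As a hedged Python sketch of the equivalent semantics, `array_append(arr, e)` can be modeled as `concat(arr, array(e))` (the null handling below is an assumption for illustration, not necessarily the exact semantics the PR settles on):

```python
from typing import List, Optional

def array_append(arr: Optional[List], elem) -> Optional[List]:
    """Model array_append(arr, elem) as concat(arr, array(elem))."""
    if arr is None:
        return None           # null array yields null
    return arr + [elem]       # concat(arr, array(elem))

print(array_append([1, 2], 3))  # → [1, 2, 3]
print(array_append(None, 3))    # → None
```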

[GitHub] [spark] zhengruifeng commented on pull request #40582: [SPARK-42954][PYTHON][CONNECT] Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client

2023-03-29 Thread via GitHub
zhengruifeng commented on PR #40582: URL: https://github.com/apache/spark/pull/40582#issuecomment-1488273071 thank you for reviews, merged into master

[GitHub] [spark] zhengruifeng closed pull request #40582: [SPARK-42954][PYTHON][CONNECT] Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client

2023-03-29 Thread via GitHub
zhengruifeng closed pull request #40582: [SPARK-42954][PYTHON][CONNECT] Add `YearMonthIntervalType` to PySpark and Spark Connect Python Client URL: https://github.com/apache/spark/pull/40582

[GitHub] [spark] MaxGekk commented on a diff in pull request #40565: [SPARK-42873][SQL] Define Spark SQL types as keywords

2023-03-29 Thread via GitHub
MaxGekk commented on code in PR #40565: URL: https://github.com/apache/spark/pull/40565#discussion_r1151655717 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -993,14 +993,34 @@ colPosition : position=FIRST | position=AFTER after

[GitHub] [spark] peter-toth commented on a diff in pull request #40268: [SPARK-42500][SQL] ConstantPropagation support more cases

2023-03-29 Thread via GitHub
peter-toth commented on code in PR #40268: URL: https://github.com/apache/spark/pull/40268#discussion_r1151559781 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -113,15 +114,13 @@ object ConstantPropagation extends Rule[LogicalPla
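
The `ConstantPropagation` rule under discussion substitutes equalities of the form `attr = literal`, collected from a conjunction, into the remaining predicates, e.g. `a = 1 AND a + b > 2` becomes `a = 1 AND 1 + b > 2`. A toy sketch of the substitution step only (string-based "predicates" purely for illustration; the real rule operates on Catalyst expression trees):

```python
def constant_propagate(equalities: dict, predicate: str) -> str:
    """Substitute known attr -> literal bindings into a predicate string."""
    # Substitute longer names first so e.g. 'ab' is not corrupted by 'a'.
    for attr in sorted(equalities, key=len, reverse=True):
        predicate = predicate.replace(attr, str(equalities[attr]))
    return predicate

print(constant_propagate({"a": 1}, "a + b > 2"))  # → 1 + b > 2
```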
