[GitHub] [spark] cloud-fan commented on pull request #41412: [SPARK-43030][SQL][FOLLOWUP] Fix DeduplicateRelations with duplicate aliases

2023-05-31 Thread via GitHub
cloud-fan commented on PR #41412: URL: https://github.com/apache/spark/pull/41412#issuecomment-1571468877 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [spark] MaxGekk commented on pull request #41411: [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors

2023-05-31 Thread via GitHub
MaxGekk commented on PR #41411: URL: https://github.com/apache/spark/pull/41411#issuecomment-1571449691 cc @bersprockets @cloud-fan

[GitHub] [spark] MaxGekk commented on pull request #41368: [SPARK-43867][SQL] Improve suggested candidates for unresolved attribute

2023-05-31 Thread via GitHub
MaxGekk commented on PR #41368: URL: https://github.com/apache/spark/pull/41368#issuecomment-1571449066 > Someone might follow the suggestion and use __auto_generated_subquery_name.c1 in their query, only to have a later update to Spark change the internal name. Not sure if that's in scope
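
The concern quoted here, together with the title of PR #41411, is that error suggestions can leak the internal `__auto_generated_subquery_name` qualifier to users. A minimal sketch of the idea, assuming candidates are plain dotted strings and are ranked by edit distance: the qualifier is stripped before ranking, so the user only sees `c1`. `strip_auto_generated`, `levenshtein`, and `suggest` are hypothetical names for illustration only; this is not Spark's implementation, which is written in Scala and quotes identifiers with backticks.

```python
from typing import List

# Internal qualifier from the discussion; treated here as a plain dotted prefix (assumption).
AUTO_GENERATED_QUALIFIER = "__auto_generated_subquery_name."


def strip_auto_generated(name: str) -> str:
    """Remove the internal subquery qualifier so it never reaches the user."""
    if name.startswith(AUTO_GENERATED_QUALIFIER):
        return name[len(AUTO_GENERATED_QUALIFIER):]
    return name


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def suggest(unresolved: str, candidates: List[str], limit: int = 5) -> List[str]:
    """Rank cleaned, de-duplicated candidates by edit distance to the unresolved name."""
    cleaned = list(dict.fromkeys(strip_auto_generated(c) for c in candidates))
    return sorted(cleaned, key=lambda c: (levenshtein(unresolved, c), c))[:limit]


if __name__ == "__main__":
    cols = ["__auto_generated_subquery_name.c1", "__auto_generated_subquery_name.col2"]
    print(suggest("c2", cols))  # prints ['c1', 'col2']
```

Stripping before ranking, rather than after, also keeps the internal prefix from inflating the edit distance of every candidate.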

[GitHub] [spark-connect-go] HyukjinKwon closed pull request #4: [SPARK-43909] Adding PR Template

2023-05-31 Thread via GitHub
HyukjinKwon closed pull request #4: [SPARK-43909] Adding PR Template URL: https://github.com/apache/spark-connect-go/pull/4

[GitHub] [spark-connect-go] HyukjinKwon commented on pull request #4: [SPARK-43909] Adding PR Template

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #4: URL: https://github.com/apache/spark-connect-go/pull/4#issuecomment-1571445525 Merged to master.

[GitHub] [spark-docker] Yikun commented on pull request #43: [SPARK-43370] Switch spark user only when run driver and executor

2023-05-31 Thread via GitHub
Yikun commented on PR #43: URL: https://github.com/apache/spark-docker/pull/43#issuecomment-1571445035 @HyukjinKwon @yosifkit Thanks for review. Merged.

[GitHub] [spark-docker] Yikun closed pull request #44: [SPARK-43370] Switch spark user only when run driver and executor and set root as default

2023-05-31 Thread via GitHub
Yikun closed pull request #44: [SPARK-43370] Switch spark user only when run driver and executor and set root as default URL: https://github.com/apache/spark-docker/pull/44

[GitHub] [spark-docker] Yikun closed pull request #43: [SPARK-43370] Switch spark user only when run driver and executor

2023-05-31 Thread via GitHub
Yikun closed pull request #43: [SPARK-43370] Switch spark user only when run driver and executor URL: https://github.com/apache/spark-docker/pull/43

[GitHub] [spark-connect-go] grundprinzip opened a new pull request, #4: [SPARK-43909] Adding PR Template

2023-05-31 Thread via GitHub
grundprinzip opened a new pull request, #4: URL: https://github.com/apache/spark-connect-go/pull/4 This patch adds the PR template to the repository.

[GitHub] [spark] grundprinzip commented on pull request #41036: [SPARK-43351] [CONNECT] Add Spark Connect Go prototype code and example

2023-05-31 Thread via GitHub
grundprinzip commented on PR #41036: URL: https://github.com/apache/spark/pull/41036#issuecomment-1571432439 Hi @hiboyang, We've prepared the repository at https://github.com/apache/spark-connect-go so that you mostly just need to drop your files in there. I've already addressed the

[GitHub] [spark] YannByron opened a new pull request, #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter

2023-05-31 Thread via GitHub
YannByron opened a new pull request, #41417: URL: https://github.com/apache/spark/pull/41417 ### What changes were proposed in this pull request? Optimize InjectRuntimeFilter by using the proper `rowCount` to initialize `BloomFilterAggregate`, if it exists. ### Why are the ch
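
The PR description is cut off, but the title says the bigger `rowCount` should be used to initialize the Bloom filter. As a rough illustration of why the estimate matters, the sketch applies the textbook Bloom filter formulas: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes, and false-positive rate of roughly (1 - e^(-kn/m))^k once n keys are inserted. `choose_row_count` is a hypothetical helper standing in for "choose the bigger rowCount"; none of these functions are Spark's `InjectRuntimeFilter` or `BloomFilterAggregate` code, and the row counts are made-up numbers.

```python
import math
from typing import Optional


def optimal_num_bits(expected_items: int, fpp: float) -> int:
    """Textbook Bloom filter size: m = -n * ln(p) / (ln 2)^2."""
    return int(-expected_items * math.log(fpp) / (math.log(2) ** 2))


def optimal_num_hashes(expected_items: int, num_bits: int) -> int:
    """Textbook hash count: k = (m / n) * ln 2, at least 1."""
    return max(1, round(num_bits / expected_items * math.log(2)))


def false_positive_rate(inserted_items: int, num_bits: int, num_hashes: int) -> float:
    """Approximate false-positive rate after inserting `inserted_items` keys."""
    return (1 - math.exp(-num_hashes * inserted_items / num_bits)) ** num_hashes


def choose_row_count(*estimates: Optional[int]) -> Optional[int]:
    """Hypothetical helper: size for the larger of the available row-count estimates."""
    known = [e for e in estimates if e is not None]
    return max(known) if known else None


if __name__ == "__main__":
    actual_rows = 1_000_000  # keys that actually end up in the filter (made-up)
    fpp = 0.03
    for label, n in [("underestimate", 100_000),
                     ("bigger estimate", choose_row_count(100_000, None, actual_rows))]:
        m = optimal_num_bits(n, fpp)
        k = optimal_num_hashes(n, m)
        rate = false_positive_rate(actual_rows, m, k)
        print(f"{label}: bits={m}, hashes={k}, fpp with {actual_rows} keys ~ {rate:.3f}")
```

With these numbers the filter sized for 100,000 rows is nearly saturated (false-positive rate above 0.99) once a million keys arrive, so it filters almost nothing, while the filter sized from the bigger estimate stays close to the 3% target.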

[GitHub] [spark] beliefer commented on pull request #41328: [WIP][SPARK-40586][CONNECT] Decouple plan transformation and validation on server side

2023-05-31 Thread via GitHub
beliefer commented on PR #41328: URL: https://github.com/apache/spark/pull/41328#issuecomment-1571423760 @grundprinzip Thank you for your review. I agree with your opinion on the three issues you listed. This PR provides a proposal that references `Analyzer` and `CheckAnalysis`. As you said, `Analyzer`

[GitHub] [spark] allisonwang-db commented on pull request #41412: [SPARK-43030][SQL][FOLLOWUP] Fix DeduplicateRelations with duplicate aliases

2023-05-31 Thread via GitHub
allisonwang-db commented on PR #41412: URL: https://github.com/apache/spark/pull/41412#issuecomment-1571406003 cc @cloud-fan

[GitHub] [spark] HeartSaVioR commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
HeartSaVioR commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1571392102 Yeah... it would be awesome if we could do some manual comparison between attempt #1 and #2 by looking into the log file and tracking the elapsed time for the test suite. We actually saw the timeo

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571392124 > Rebalance(groupingExpressions + constant) achieves the same effect. It can make OptimizeSkewInRebalancePartitions effective, but will introduce additional shuffle when there is no
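
`OptimizeSkewInRebalancePartitions`, the AQE rule named in this comment, is what makes the extra rebalance shuffle worthwhile. My understanding, stated as an assumption rather than a description of Spark's code, is that a skewed reduce partition is made of blocks written by each map task, so it can be read as several smaller splits by grouping consecutive map-output blocks toward a target size. The simplified sketch below shows only that grouping step; `split_by_target_size`, its parameters, and the block sizes are hypothetical, and skew detection and Spark's actual thresholds are omitted.

```python
from typing import List


def split_by_target_size(block_sizes: List[int], target_size: int) -> List[int]:
    """Group consecutive map-output blocks of one skewed reduce partition.

    Returns the index of the first block in each split. A new split starts
    whenever adding the next block would push the current split past
    `target_size`. In this sketch a single block larger than the target still
    forms its own split, so skew concentrated in one block is not divided further.
    """
    if not block_sizes:
        return []
    starts = [0]
    current = 0
    for i, size in enumerate(block_sizes):
        if current > 0 and current + size > target_size:
            starts.append(i)  # close the current split before block i
            current = 0
        current += size
    return starts


if __name__ == "__main__":
    # Hypothetical block sizes (MB) written by map tasks for one reduce partition.
    print(split_by_target_size([40, 40, 40, 40], target_size=100))  # prints [0, 2]
    print(split_by_target_size([300, 10, 10], target_size=100))     # prints [0, 1]
```

The second call shows the limitation that motivates the discussion: splitting works on map-side blocks, so data for one key that lands in one large block stays together regardless of the target size.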

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571389874

[GitHub] [spark] HyukjinKwon commented on pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #41357: URL: https://github.com/apache/spark/pull/41357#issuecomment-1571388121 Looks fine to me but I would leave it to @hvanhovell and @vicennial to check and merge.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212603254 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,34 @@ class SparkConnectArtif

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571387439 > I think the simpler way is to inject a repartition hint for this case, or do rebalance on a different key with group by.

[GitHub] [spark] dongjoon-hyun commented on pull request #41414: [SPARK-43904][BUILD] Upgrade jackson to 2.15.2

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41414: URL: https://github.com/apache/spark/pull/41414#issuecomment-1571386876 Could you rebase this PR, @panbingkun ?

[GitHub] [spark] chaoqin-li1123 commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
chaoqin-li1123 commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1571386536 I suspect 5.5 hours is an accident on the test runner side (it has never happened before), but sure, we can do some investigation.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
WeichenXu123 commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212596759 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,34 @@ class SparkConnectArti

[GitHub] [spark] dongjoon-hyun commented on pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41416: URL: https://github.com/apache/spark/pull/41416#issuecomment-1571382616 Thank you. Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
dongjoon-hyun closed pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x URL: https://github.com/apache/spark/pull/41416

[GitHub] [spark] HeartSaVioR closed pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
HeartSaVioR closed pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider URL: https://github.com/apache/spark/pull/41099

[GitHub] [spark] HeartSaVioR commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
HeartSaVioR commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1571375934 Thanks! Merging to master.

[GitHub] [spark] HeartSaVioR commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
HeartSaVioR commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1571375720 @chaoqin-li1123 I see you've made another attempt and it passed. Thanks for doing that! I'll merge to unblock, but let's file a JIRA ticket and investigate to ensure that this c

[GitHub] [spark] advancedxy commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-31 Thread via GitHub
advancedxy commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1212578282 ## resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh: ## @@ -75,6 +75,9 @@ elif ! [ -z ${SPARK_HOME+x} ]; then SPARK_CLASSPATH="$SPARK_

[GitHub] [spark] cloud-fan commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
cloud-fan commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1212568294 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +154,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] cloud-fan commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
cloud-fan commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1212566624 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -57,6 +57,8 @@ case class ShuffledHashJoinExec( override def ou

[GitHub] [spark] cloud-fan commented on pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
cloud-fan commented on PR #41398: URL: https://github.com/apache/spark/pull/41398#issuecomment-1571331319 cc @maryannxue

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
WeichenXu123 commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212562805 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala: ## @@ -100,7 +100,15 @@ class SparkConnectAd

[GitHub] [spark-connect-go] grundprinzip commented on pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
grundprinzip commented on PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#issuecomment-1571327620 Thanks for merging

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
WeichenXu123 commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212561246 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,34 @@ class SparkConnectArti

[GitHub] [spark-connect-go] grundprinzip commented on a diff in pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
grundprinzip commented on code in PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#discussion_r1212561827 ## Makefile: ## @@ -0,0 +1,105 @@ +# Review Comment: I wrote this.

[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-31 Thread via GitHub
turboFei commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1571327090 thanks all !!!

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
WeichenXu123 commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212554762 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -187,6 +188,19 @@ def _parse_artifacts(self, path_or_uri: str, pyfile: bool, archive: bool) -> Lis

[GitHub] [spark] dongjoon-hyun commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1571318994 Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] dongjoon-hyun closed pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-31 Thread via GitHub
dongjoon-hyun closed pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod URL: https://github.com/apache/spark/pull/41181

[GitHub] [spark] wangyum commented on a diff in pull request #40744: [SPARK-24497][SQL] Support recursive SQL

2023-05-31 Thread via GitHub
wangyum commented on code in PR #40744: URL: https://github.com/apache/spark/pull/40744#discussion_r1212539620 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,121 @@ case class UnionExec(children: Seq[SparkPlan]) extends

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571297406 `Rebalance(groupingExpressions + constant)` achieves the same effect.
```
SELECT c1, count(1) FROM (SELECT /*+ REBALANCE(c1, n) */ c1, 1 as n FROM v) t group by c1
```

[GitHub] [spark] wangyum commented on a diff in pull request #40744: [SPARK-24497][SQL] Support recursive SQL

2023-05-31 Thread via GitHub
wangyum commented on code in PR #40744: URL: https://github.com/apache/spark/pull/40744#discussion_r1212534490 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,121 @@ case class UnionExec(children: Seq[SparkPlan]) extends

[GitHub] [spark] LuciferYang commented on pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41416: URL: https://github.com/apache/spark/pull/41416#issuecomment-1571292176 > We may keep them in a separate directory or different file names later. But, yes, we are unable to overwrite them for now.. Sorry for that. Got it, Let's wait ~

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212527926 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -187,6 +188,19 @@ def _parse_artifacts(self, path_or_uri: str, pyfile: bool, archive: bool) -> Lis

[GitHub] [spark-docker] Yikun commented on pull request #43: [SPARK-43370] Switch spark user only when run driver and executor

2023-05-31 Thread via GitHub
Yikun commented on PR #43: URL: https://github.com/apache/spark-docker/pull/43#issuecomment-1571280119 According to the suggestion in https://github.com/docker-library/official-images/pull/13089#issuecomment-1570733215 > I recommend that if the entrypoint needs to do setup work as root b

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212527198 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,34 @@ class SparkConnectArtif

[GitHub] [spark] dongjoon-hyun commented on pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41416: URL: https://github.com/apache/spark/pull/41416#issuecomment-1571278690 We may keep them in a separate directory or under different file names later. But, yes, we are unable to overwrite them for now. Sorry for that.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212526198 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectAddArtifactsHandler.scala: ## @@ -100,7 +100,15 @@ class SparkConnectAdd

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-05-31 Thread via GitHub
Hisoka-X commented on code in PR #40953: URL: https://github.com/apache/spark/pull/40953#discussion_r1212524603 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala: ## @@ -93,13 +93,15 @@ private object PostgresDialect extends JdbcDialect with SQLConfHelp

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212522454 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -157,10 +159,34 @@ class SparkConnectArtif

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212522100 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -187,6 +188,19 @@ def _parse_artifacts(self, path_or_uri: str, pyfile: bool, archive: bool) -> Lis

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212521469 ## python/pyspark/sql/connect/session.py: ## @@ -625,6 +625,26 @@ def addArtifacts(self, *path: str, pyfile: bool = False, archive: bool = False) addArtifac

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41357: [SPARK-43790][PYTHON][CONNECT][ML] Add `copyFromLocalToFs` API

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #41357: URL: https://github.com/apache/spark/pull/41357#discussion_r1212521231 ## connector/connect/server/src/test/scala/org/apache/spark/sql/connect/artifact/ArtifactManagerSuite.scala: ## @@ -145,4 +145,23 @@ class ArtifactManagerSuite exte

[GitHub] [spark] LuciferYang commented on pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41416: URL: https://github.com/apache/spark/pull/41416#issuecomment-1571267866 So we don't need to update the benchmark results for Scala 2.13 for now, right? I'll close [SPARK-43857](https://issues.apache.org/jira/browse/SPARK-43857)

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1571265943 - 8: https://github.com/LuciferYang/spark/actions/runs/5139902307 - 11: https://github.com/LuciferYang/spark/actions/runs/5139903469 - 17: https://github.com/LuciferYang/spark/act

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1571260670 > * 8: https://github.com/LuciferYang/spark/actions/runs/5138855322 > * 11: https://github.com/LuciferYang/spark/actions/runs/5138856913 > * 17: https://github.com/LuciferYang/spa

[GitHub] [spark] dongjoon-hyun commented on pull request #41399: [SPARK-43894][PYTHON] Fix bug in df.cache()

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41399: URL: https://github.com/apache/spark/pull/41399#issuecomment-1571247978 Thank you!

[GitHub] [spark] dongjoon-hyun commented on pull request #41416: Revert [SPARK-43836][SPARK-43845][SPARK-43858][SPARK-43858] to keep Scala 2.12 for Spark 3.x

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41416: URL: https://github.com/apache/spark/pull/41416#issuecomment-1571244211 Thank you, @HyukjinKwon . Also, cc @srowen , @LuciferYang .

[GitHub] [spark] WeichenXu123 commented on pull request #41404: [SPARK-41593][FOLLOW-UP][ML] Torch distributor log streaming server: Avoid duplicated log to stdout redirection

2023-05-31 Thread via GitHub
WeichenXu123 commented on PR #41404: URL: https://github.com/apache/spark/pull/41404#issuecomment-1571237500 @HyukjinKwon Yes there's some > I reran the tests but I think the failure is related. Yes test failed because the test checks the log server redirection output but test

[GitHub] [spark] dongjoon-hyun opened a new pull request, #41416: Revert 2.13

2023-05-31 Thread via GitHub
dongjoon-hyun opened a new pull request, #41416: URL: https://github.com/apache/spark/pull/41416 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

[GitHub] [spark] HeartSaVioR commented on pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-31 Thread via GitHub
HeartSaVioR commented on PR #41099: URL: https://github.com/apache/spark/pull/41099#issuecomment-1571233768 @chaoqin-li1123 Could you please look at how long the module `[Run / Build modules: sql - other tests](https://github.com/chaoqin-li1123/spark/actions/runs/5135977794/jobs/9242562

[GitHub] [spark] HyukjinKwon commented on pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #41415: URL: https://github.com/apache/spark/pull/41415#issuecomment-1571229800 cc @hvanhovell @vicennial, mind taking a look please?

[GitHub] [spark] HyukjinKwon opened a new pull request, #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-05-31 Thread via GitHub
HyukjinKwon opened a new pull request, #41415: URL: https://github.com/apache/spark/pull/41415 ### What changes were proposed in this pull request? This PR proposes to add support for regular files in `SparkSession.addArtifacts`. ### Why are the changes needed? So

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571228458 > I think the simpler way is to inject a repartition hint for this case, There seems to be data skew, although relatively minor. > or do rebalance on a different key with

[GitHub] [spark] dongjoon-hyun closed pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
dongjoon-hyun closed pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13 URL: https://github.com/apache/spark/pull/41402

[GitHub] [spark] dongjoon-hyun commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
dongjoon-hyun commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1571227390 Thank you! Merged to master.

[GitHub] [spark] wForget commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
wForget commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571217222 > I think the simpler way is to inject a repartition hint for this case, or do rebalance on a different key with group by. Optimize skew cannot handle some cases in which keys have hig

[GitHub] [spark] beliefer commented on pull request #41366: [SPARK-43852][SPARK-43853][SPARK-43854][SPARK-43855][SPARK-43856] Assign names to the error class _LEGACY_ERROR_TEMP_2418-2425

2023-05-31 Thread via GitHub
beliefer commented on PR #41366: URL: https://github.com/apache/spark/pull/41366#issuecomment-1571207939 @MaxGekk Thank you!

[GitHub] [spark] ulysses-you commented on pull request #41407: [SPARK-43900][SQL] Support optimize skewed partitions even if introduce extra shuffle

2023-05-31 Thread via GitHub
ulysses-you commented on PR #41407: URL: https://github.com/apache/spark/pull/41407#issuecomment-1571197826 I think the more simpler way is to inject a repartition hint for this case, or do rebalance on a different key with gourp by. Optimize skew can not handle some cases which keys have h

[GitHub] [spark] ulysses-you commented on a diff in pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-05-31 Thread via GitHub
ulysses-you commented on code in PR #40953: URL: https://github.com/apache/spark/pull/40953#discussion_r1212467515 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala: ## @@ -93,13 +93,15 @@ private object PostgresDialect extends JdbcDialect with SQLConfH

[GitHub] [spark] HyukjinKwon closed pull request #41405: [SPARK-43516][ML][FOLLOW-UP] Make `pyspark.mlv2` module supports python < 3.9

2023-05-31 Thread via GitHub
HyukjinKwon closed pull request #41405: [SPARK-43516][ML][FOLLOW-UP] Make `pyspark.mlv2` module supports python < 3.9 URL: https://github.com/apache/spark/pull/41405

[GitHub] [spark] HyukjinKwon commented on pull request #41405: [SPARK-43516][ML][FOLLOW-UP] Make `pyspark.mlv2` module supports python < 3.9

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #41405: URL: https://github.com/apache/spark/pull/41405#issuecomment-1571186928 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #41404: [SPARK-41593][FOLLOW-UP][ML] Torch distributor log streaming server: Avoid duplicated log to stdout redirection

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #41404: URL: https://github.com/apache/spark/pull/41404#issuecomment-1571186699 I reran the tests but I think the failure is related.

[GitHub] [spark] panbingkun opened a new pull request, #41414: [SPARK-43904][BUILD] Upgrade jackson to 2.15.2

2023-05-31 Thread via GitHub
panbingkun opened a new pull request, #41414: URL: https://github.com/apache/spark/pull/41414 ### What changes were proposed in this pull request? Upgrade FasterXML jackson from 2.15.1 to 2.15.2 ### Why are the changes needed? New version that fix some bugs, release notes as fol

[GitHub] [spark] huaxingao commented on pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
huaxingao commented on PR #41398: URL: https://github.com/apache/spark/pull/41398#issuecomment-1571166360 @szehon-ho Thanks for the PR! The change looks reasonable to me. I have left a few minor comments.

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1212455882 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -83,8 +85,10 @@ case class ShuffledHashJoinExec( iter,

[GitHub] [spark] HyukjinKwon closed pull request #41400: [SPARK-43896][TESTS][PS][CONNECT] Enable `test_iterrows` and `test_itertuples` on Connect

2023-05-31 Thread via GitHub
HyukjinKwon closed pull request #41400: [SPARK-43896][TESTS][PS][CONNECT] Enable `test_iterrows` and `test_itertuples` on Connect URL: https://github.com/apache/spark/pull/41400

[GitHub] [spark] HyukjinKwon commented on pull request #41400: [SPARK-43896][TESTS][PS][CONNECT] Enable `test_iterrows` and `test_itertuples` on Connect

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #41400: URL: https://github.com/apache/spark/pull/41400#issuecomment-1571157663 Merged to master.

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1212452822 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -622,28 +632,23 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with Adapt

[GitHub] [spark] puchengy commented on pull request #41332: [SPARK-43801][SQL] Support unwrap date type to string type in UnwrapCastInBinaryComparison

2023-05-31 Thread via GitHub
puchengy commented on PR #41332: URL: https://github.com/apache/spark/pull/41332#issuecomment-1571137648 @wangyum Hi, sorry I did not get a chance to look into this. Will take a look as soon as possible.

[GitHub] [spark] github-actions[bot] commented on pull request #39967: [SPARK-42395][K8S]The code logic of the configmap max size validation lacks extra content

2023-05-31 Thread via GitHub
github-actions[bot] commented on PR #39967: URL: https://github.com/apache/spark/pull/39967#issuecomment-1571137183 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39908: [SPARK-42360][SQL] Transform LeftOuter join with IsNull filter on right side to Anti join

2023-05-31 Thread via GitHub
github-actions[bot] closed pull request #39908: [SPARK-42360][SQL] Transform LeftOuter join with IsNull filter on right side to Anti join URL: https://github.com/apache/spark/pull/39908

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-05-31 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1212437084 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -507,8 +507,6 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with Adaptiv

[GitHub] [spark-connect-go] HyukjinKwon commented on pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
HyukjinKwon commented on PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#issuecomment-1571127467 Merged to master.

[GitHub] [spark-connect-go] HyukjinKwon closed pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
HyukjinKwon closed pull request #3: [SPARK-43895] Basic Repository Layout URL: https://github.com/apache/spark-connect-go/pull/3

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1571126773 - 8: https://github.com/LuciferYang/spark/actions/runs/5138855322 - 11: https://github.com/LuciferYang/spark/actions/runs/5138856913 - 17: https://github.com/LuciferYang/spark/act

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-05-31 Thread via GitHub
LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1571121079 > Is this the last missed class? What I mean is the benchmark suite passes with this patch? Let me double check again with this one.

[GitHub] [spark-connect-go] HyukjinKwon commented on a diff in pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#discussion_r1212431887 ## Makefile: ## @@ -0,0 +1,105 @@ +# Review Comment: qq, did you write this Makefile? or get it from somewhere?

[GitHub] [spark-connect-go] HyukjinKwon commented on a diff in pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#discussion_r1212431632 ## go.mod: ## @@ -0,0 +1,53 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file di

[GitHub] [spark-connect-go] HyukjinKwon commented on a diff in pull request #3: [SPARK-43895] Basic Repository Layout

2023-05-31 Thread via GitHub
HyukjinKwon commented on code in PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#discussion_r1212430527 ## Makefile: ## @@ -0,0 +1,105 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE fil

[GitHub] [spark] henrymai opened a new pull request, #41413: [SPARK-PendingAccountCreation] Refactor code to consolidate BlockId handling

2023-05-31 Thread via GitHub
henrymai opened a new pull request, #41413: URL: https://github.com/apache/spark/pull/41413 ### What changes were proposed in this pull request? Consolidating BlockId parsing and handling helps to cut down on errors arising from parsing the BlockId and also eliminates the need to manu

[GitHub] [spark-connect-go] grundprinzip commented on pull request #3: [Spark 43895] Basic Repository Layout

2023-05-31 Thread via GitHub
grundprinzip commented on PR #3: URL: https://github.com/apache/spark-connect-go/pull/3#issuecomment-1571097971 @zhengruifeng @HyukjinKwon this PR should be good now and the last PR before we can bring in the changes of Bo for the base client. The next step would be adding the workflow inte

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-05-31 Thread via GitHub
zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1212410836 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala: ## @@ -117,6 +119,10 @@ private[sql] class AvroDeserializer( val incompatibleMs

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-05-31 Thread via GitHub
zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1212410513 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3510,4 +3510,34 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] anishshri-db commented on pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-05-31 Thread via GitHub
anishshri-db commented on PR #41410: URL: https://github.com/apache/spark/pull/41410#issuecomment-1571086374 @siying - could you please take a look too ? From the comment it seems that keyMayExist is a lighter check than get. Is there more nuance here ? https://github.com/facebook/ro

[GitHub] [spark] rangadi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-05-31 Thread via GitHub
rangadi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1212391568 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] rangadi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-05-31 Thread via GitHub
rangadi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1212390633 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as CREATE/REPLACE TABLE OPTIONS values

2023-05-31 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1212368577 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (A
