[GitHub] [spark] dongjoon-hyun closed pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
dongjoon-hyun closed pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255 URL: https://github.com/apache/spark/pull/41209 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HyukjinKwon closed pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local URL: https://github.com/apache/spark/pull/41206

[GitHub] [spark] HyukjinKwon commented on pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41206: URL: https://github.com/apache/spark/pull/41206#issuecomment-1552430701 Merged to master.

[GitHub] [spark] cloud-fan commented on pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
cloud-fan commented on PR #41162: URL: https://github.com/apache/spark/pull/41162#issuecomment-1552411237 I think this is indeed an issue, but it seems a bit weird to special-case the 1-element-in-list case. Thoughts? @gengliangwang @srielau

[GitHub] [spark] rangadi commented on pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
rangadi commented on PR #41192: URL: https://github.com/apache/spark/pull/41192#issuecomment-1552404914 @gengliangwang PTAL when you get a chance.

[GitHub] [spark] HyukjinKwon commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552395877 Thanks for clarification. Lgtm2

[GitHub] [spark] cloud-fan commented on a diff in pull request #41007: [SPARK-43205] IDENTIFIER clause

2023-05-17 Thread via GitHub
cloud-fan commented on code in PR #41007: URL: https://github.com/apache/spark/pull/41007#discussion_r1197357196 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -368,7 +369,7 @@ class AstBuilder extends

[GitHub] [spark] otterc commented on a diff in pull request #41071: [SPARK-43391][CORE] Idle connection should be kept when closeIdleConnection is disabled

2023-05-17 Thread via GitHub
otterc commented on code in PR #41071: URL: https://github.com/apache/spark/pull/41071#discussion_r1197131313 ## common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java: ## @@ -163,14 +163,11 @@ public void

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #41099: [SPARK-43421][SS] Implement Changelog based Checkpointing for RocksDB State Store Provider

2023-05-17 Thread via GitHub
chaoqin-li1123 commented on code in PR #41099: URL: https://github.com/apache/spark/pull/41099#discussion_r1197345730 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala: ## @@ -705,7 +795,9 @@ object RocksDBConf { RocksDBConf(

[GitHub] [spark] LuciferYang commented on pull request #41198: [SPARK-43537][INFA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1552371620 Thanks @srowen @dongjoon-hyun ~

[GitHub] [spark] HyukjinKwon commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552366068 Hm, we currently build Spark w/ Hadoop 3.3.0 by default, so it might be fine, but I would also ask for some more looks, e.g., @srowen @mridulm @tgravescs @dongjoon-hyun

[GitHub] [spark] HyukjinKwon commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552365417 Hmm .. does that mean Hadoop 3.2.0 won't work with this?

[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-17 Thread via GitHub
turboFei commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1552365072 Gentle ping @dongjoon-hyun, would you like to review again? Thanks

[GitHub] [spark] rangadi commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1197326408 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala: ## @@ -26,14 +26,14 @@ import org.apache.spark.sql.types.{BinaryType,

[GitHub] [spark] grundprinzip commented on pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
grundprinzip commented on PR #41206: URL: https://github.com/apache/spark/pull/41206#issuecomment-1552363238 Thanks!

[GitHub] [spark] advancedxy commented on a diff in pull request #41196: [SPARK-43505][K8S] support env variables substitution and executor library path

2023-05-17 Thread via GitHub
advancedxy commented on code in PR #41196: URL: https://github.com/apache/spark/pull/41196#discussion_r1197323037 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala: ## @@ -71,7 +71,9 @@ private[spark]

[GitHub] [spark] panbingkun commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
panbingkun commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552341297 (screenshot attachment: https://github.com/apache/spark/assets/15246973/6da74b5d-4e71-440e-bb47-d17ba7f7de1e)

[GitHub] [spark] HyukjinKwon closed pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs URL: https://github.com/apache/spark/pull/41208

[GitHub] [spark] HyukjinKwon commented on pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41208: URL: https://github.com/apache/spark/pull/41208#issuecomment-1552340648 Merged to branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #41207: [3.4][SPARK-42826][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark.

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41207: [3.4][SPARK-42826][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark. URL: https://github.com/apache/spark/pull/41207

[GitHub] [spark] HyukjinKwon commented on pull request #41207: [3.4][SPARK-42826][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark.

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41207: URL: https://github.com/apache/spark/pull/41207#issuecomment-1552339005 Merged to branch-3.4.

[GitHub] [spark] pralabhkumar commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pralabhkumar commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1552336382 LGTM.

[GitHub] [spark] LuciferYang commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552334558 cc @attilapiros @viirya @sunchao @pan3793 FYI

[GitHub] [spark] panbingkun opened a new pull request, #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
panbingkun opened a new pull request, #41209: URL: https://github.com/apache/spark/pull/41209 ### What changes were proposed in this pull request? This PR aims to remove the workaround for HADOOP-16255. ### Why are the changes needed? - Because HADOOP-16255 has been fixed after hadoop

[GitHub] [spark] advancedxy commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
advancedxy commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1197287073 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala: ## @@ -26,14 +26,14 @@ import

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1197276979 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -216,6 +216,7 @@ message WriteStreamOperationStart { message StreamingForeachWriter

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1197275368 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/foreachWriterPacket.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1197276153 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2386,10 +2393,26 @@ class SparkConnectPlanner(val

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] LuciferYang commented on pull request #40654: [SPARK-43022][CONNECT] Support protobuf functions for Scala client

2023-05-17 Thread via GitHub
LuciferYang commented on PR #40654: URL: https://github.com/apache/spark/pull/40654#issuecomment-1552308089 Merged to master. Thanks @hvanhovell @HyukjinKwon @rangadi

[GitHub] [spark] LuciferYang closed pull request #40654: [SPARK-43022][CONNECT] Support protobuf functions for Scala client

2023-05-17 Thread via GitHub
LuciferYang closed pull request #40654: [SPARK-43022][CONNECT] Support protobuf functions for Scala client URL: https://github.com/apache/spark/pull/40654

[GitHub] [spark] rangadi commented on pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
rangadi commented on PR #41192: URL: https://github.com/apache/spark/pull/41192#issuecomment-1552305443 @advancedxy broadcast is an interesting idea. Let's continue the discussion in a code comment here: https://github.com/apache/spark/pull/41192#discussion_r1197264386

[GitHub] [spark] rangadi commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1197260139 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -148,8 +212,38 @@ object functions { messageName: String,

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197261326 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] liukuijian8040 commented on a diff in pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
liukuijian8040 commented on code in PR #41162: URL: https://github.com/apache/spark/pull/41162#discussion_r1197258991 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -509,16 +509,25 @@ case class In(value: Expression, list:

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-17 Thread via GitHub
xinrong-meng commented on code in PR #41147: URL: https://github.com/apache/spark/pull/41147#discussion_r1196924307 ## python/pyspark/sql/pandas/serializers.py: ## @@ -317,66 +320,6 @@ def arrow_to_pandas(self, arrow_column): s =

[GitHub] [spark] wzhfy commented on a diff in pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
wzhfy commented on code in PR #41162: URL: https://github.com/apache/spark/pull/41162#discussion_r1197253055 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -509,16 +509,25 @@ case class In(value: Expression, list:

[GitHub] [spark] wzhfy commented on pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
wzhfy commented on PR #41162: URL: https://github.com/apache/spark/pull/41162#issuecomment-1552291841 I also think that the different results between 0 in ('00') and 0 = '00' are confusing, and it seems Hive has already fixed this problem. Could you also take a look? @cloud-fan @MaxGekk
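
The inconsistency described in this thread can be sketched with a toy Python model. This is only an illustration of how two different implicit-cast choices produce different answers for the same operands; it is not Spark source code, and the `equal_to`/`in_list` helpers are simplified stand-ins for the analyzer's actual type-coercion rules for comparisons and IN:

```python
# Toy model (NOT Spark source) of why `0 = '00'` and `0 IN ('00')` can
# disagree: here equality widens a numeric/string pair to numbers, while
# IN widens the value and the list elements to a common string type.

def equal_to(left, right):
    """Simplified EqualTo: compare a numeric/string pair as numbers."""
    if isinstance(left, (int, float)) and isinstance(right, str):
        return float(left) == float(right)
    return left == right

def in_list(value, items):
    """Simplified In: widen everything to strings, then compare."""
    return any(str(value) == str(item) for item in items)

print(equal_to(0, '00'))   # True  (0.0 == 0.0)
print(in_list(0, ['00']))  # False ('0' != '00')
```

Under this model the single-element IN list disagrees with plain equality, which is the surprise the PR discussion is about.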

[GitHub] [spark] itholic opened a new pull request, #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
itholic opened a new pull request, #41208: URL: https://github.com/apache/spark/pull/41208 ### What changes were proposed in this pull request? This PR proposes to fix [Supported pandas

[GitHub] [spark] gerashegalov commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
gerashegalov commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1197205234 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpectsInputTypes.scala: ## @@ -74,3 +74,44 @@ object ExpectsInputTypes extends

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon commented on code in PR #41206: URL: https://github.com/apache/spark/pull/41206#discussion_r1197216790 ## python/pyspark/shell.py: ## @@ -100,10 +100,11 @@ % (platform.python_version(), platform.python_build()[0], platform.python_build()[1]) ) if

[GitHub] [spark] Kimahriman commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-17 Thread via GitHub
Kimahriman commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1552245794 Maybe similar reason I made https://github.com/apache/spark/pull/37694 a while ago? Basically Spark logging setup assumes log4j2, but with hadoop provided you get 1.x from Hadoop. So

[GitHub] [spark] itholic opened a new pull request, #41207: [SPARK-42826][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark.

2023-05-17 Thread via GitHub
itholic opened a new pull request, #41207: URL: https://github.com/apache/spark/pull/41207 ### What changes were proposed in this pull request? This is follow-up for https://github.com/apache/spark/pull/40459 to fix the incorrect information and to elaborate more detailed changes.

[GitHub] [spark] github-actions[bot] closed pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168

2023-05-17 Thread via GitHub
github-actions[bot] closed pull request #38861: [SPARK-41294][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1203 / 1168 URL: https://github.com/apache/spark/pull/38861

[GitHub] [spark] github-actions[bot] commented on pull request #39515: [SPARK-38743][SQL][TEST] Test the error class: MISSING_STATIC_PARTITION_COLUMN

2023-05-17 Thread via GitHub
github-actions[bot] commented on PR #39515: URL: https://github.com/apache/spark/pull/39515#issuecomment-1552242062 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #38875: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value

2023-05-17 Thread via GitHub
github-actions[bot] commented on PR #38875: URL: https://github.com/apache/spark/pull/38875#issuecomment-1552242081 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon commented on code in PR #41206: URL: https://github.com/apache/spark/pull/41206#discussion_r1197200251 ## python/pyspark/shell.py: ## @@ -100,10 +100,9 @@ % (platform.python_version(), platform.python_build()[0], platform.python_build()[1]) ) if is_remote():

[GitHub] [spark] HyukjinKwon commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552233690 https://github.com/apache/spark/pull/41206

[GitHub] [spark] HyukjinKwon opened a new pull request, #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon opened a new pull request, #41206: URL: https://github.com/apache/spark/pull/41206 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/41013 that sets `SPARK_CONNECT_MODE_ENABLED` when running PySpark shell

[GitHub] [spark] HyukjinKwon commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552230867 creating a followup now

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197147035 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -1618,6 +1618,24 @@ class SparkSubmitSuite conf.get(k) should be (v) } } +

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197146855 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] ueshin commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
ueshin commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552174756 Hi, `./bin/pyspark --remote local` shows the following error after this commit. ```py % ./bin/pyspark --remote local ... Traceback (most recent call last): File

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41071: [SPARK-43391][CORE] Idle connection should be kept when closeIdleConnection is disabled

2023-05-17 Thread via GitHub
warrenzhu25 commented on code in PR #41071: URL: https://github.com/apache/spark/pull/41071#discussion_r1197093878 ## common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java: ## @@ -163,14 +163,11 @@ public void

[GitHub] [spark] warrenzhu25 commented on pull request #41083: [SPARK-43399][CORE] Add config to control threshold of unregister map ouput when fetch failed

2023-05-17 Thread via GitHub
warrenzhu25 commented on PR #41083: URL: https://github.com/apache/spark/pull/41083#issuecomment-1552141121 > These looks like things which can be handled by appropriate configuration tuning ? The PR itself requires a bit more work if that is not a feasible direction (efficient cleanup,

[GitHub] [spark] robreeves commented on pull request #40812: [SPARK-43157][SQL] Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects

2023-05-17 Thread via GitHub
robreeves commented on PR #40812: URL: https://github.com/apache/spark/pull/40812#issuecomment-1552124399 > @cloud-fan Cloning the cachedPlan is also problematic because it contains state (accumulators in private fields) when it includes a `CollectMetricsExec` operator.

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1197029651 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] dongjoon-hyun closed pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1

2023-05-17 Thread via GitHub
dongjoon-hyun closed pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1 URL: https://github.com/apache/spark/pull/41122

[GitHub] [spark] dongjoon-hyun commented on pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41122: URL: https://github.com/apache/spark/pull/41122#issuecomment-1552036061 Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196991836 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551964588 The new trait looks good. In the future we can think about reusing it.

[GitHub] [spark] srowen commented on pull request #41198: [SPARK-43537][INFA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
srowen commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551953229 OK yeah it was fine, false alarm. Oops.

[GitHub] [spark] RyanBerti commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551953337 @dtenedor I just pushed a commit that tries to generalize the foldable check, as I'm seeing duplicate code in the datasketches functions as well as others (see

[GitHub] [spark] MaxGekk commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
MaxGekk commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196979589 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196972501 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41147: [WIP] Standardize nested non-atomic input type support in Pandas UDF

2023-05-17 Thread via GitHub
xinrong-meng commented on code in PR #41147: URL: https://github.com/apache/spark/pull/41147#discussion_r1196924307 ## python/pyspark/sql/pandas/serializers.py: ## @@ -317,66 +320,6 @@ def arrow_to_pandas(self, arrow_column): s =

[GitHub] [spark] ericm-db opened a new pull request, #41205: [WIP] [SC-130782] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with

2023-05-17 Thread via GitHub
ericm-db opened a new pull request, #41205: URL: https://github.com/apache/spark/pull/41205 ### What changes were proposed in this pull request? We are migrating to a new error framework in order to surface errors in a friendlier way to customers. This PR defines a new error

[GitHub] [spark] zhenlineo commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
zhenlineo commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1196867820 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -216,6 +216,7 @@ message WriteStreamOperationStart { message

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196885982 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] holdenk commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
holdenk commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1551827663 +1 looks reasonable modulo the existing suggestions (clean up the logging + tighten the test). Thanks for making this PR :) -- This is an automated message from the Apache Git

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL][FOLLOWUP] Show a directional message in ListQuery nullability assertion

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196870894 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] dtenedor commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1196864263 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -265,6 +288,26 @@ case class HllUnionAgg(

[GitHub] [spark] jchen5 commented on pull request #41202: [SPARK-43413][SQL][FOLLOWUP] Show a directional message in ListQuery nullability assertion

2023-05-17 Thread via GitHub
jchen5 commented on PR #41202: URL: https://github.com/apache/spark/pull/41202#issuecomment-1551806149 Thanks for comments, updated

[GitHub] [spark] RyanBerti commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1196857575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -265,6 +288,26 @@ case class HllUnionAgg(

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196850103 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196848013 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] sweisdb commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-17 Thread via GitHub
sweisdb commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1551795385 @MaxGekk I am planning to do the user-facing SQL expression changes in a followup to make each change simpler. I want to land this first.

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1196819312 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/foreachWriterPacket.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1196822211 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] dtenedor commented on a diff in pull request #41062: [SPARK-43313][SQL][FOLLOWUP] Improvement for DSv2 API SupportsCustomSchemaWrite

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41062: URL: https://github.com/apache/spark/pull/41062#discussion_r1196820360 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -27,12 +28,12 @@ * @since 3.4.1 */ @Evolving -public

[GitHub] [spark] dtenedor commented on a diff in pull request #41062: [SPARK-43313][SQL][FOLLOWUP] Improvement for DSv2 API SupportsCustomSchemaWrite

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41062: URL: https://github.com/apache/spark/pull/41062#discussion_r1196819921 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -27,12 +28,12 @@ * @since 3.4.1 */ @Evolving -public

[GitHub] [spark] MaxGekk opened a new pull request, #41204: [WIP][SQL] Fix resolving of `Filter` output

2023-05-17 Thread via GitHub
MaxGekk opened a new pull request, #41204: URL: https://github.com/apache/spark/pull/41204 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] LuciferYang commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551749676 If there are any issues, please revert and I will resubmit one :)

[GitHub] [spark] dtenedor commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1196808900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -265,6 +288,26 @@ case class HllUnionAgg(

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196805611 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551745541 @RyanBerti thanks for the update!

[GitHub] [spark] dongjoon-hyun commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551729535 No worry, @srowen ~ I'll monitor together.

[GitHub] [spark] RyanBerti commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551726374 @bersprockets here are the changes to handle non-foldable input args, based on our conversation in https://github.com/apache/spark/pull/40615. cc @dtenedor @mkaravel

[GitHub] [spark] srowen commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
srowen commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551703450 Oh shoot, I was looking at the wrong PR - I'm not sure that tests passed before I merged. Let me watch the result.

[GitHub] [spark] srowen closed pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
srowen closed pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4 URL: https://github.com/apache/spark/pull/41198

[GitHub] [spark] srowen commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-17 Thread via GitHub
srowen commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1551701267 It seems weird that log4j 2 config works if you add log4j 1.x. Maybe so; just trying to figure out if this is really what's going on and if we have to let log4j 1.x back in? Because then

[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

2023-05-17 Thread via GitHub
jchen5 commented on code in PR #41094: URL: https://github.com/apache/spark/pull/41094#discussion_r1196750077 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4199,6 +4199,16 @@ object SQLConf { .booleanConf

[GitHub] [spark] jchen5 opened a new pull request, #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
jchen5 opened a new pull request, #41202: URL: https://github.com/apache/spark/pull/41202 ### What changes were proposed in this pull request? In case the assert for the call to ListQuery.nullable is hit, mention in the assert error message the conf flag that can be used to disable the

[GitHub] [spark] dongjoon-hyun commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1551615074 cc @pralabhkumar and @holdenk from #37417

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196696629 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -1618,6 +1618,24 @@ class SparkSubmitSuite conf.get(k) should be (v) }
