[PR] [SPARK-46237][SQL][TESTS] Fix failed test of `HiveDDLSuite` [spark]

2023-12-04 Thread via GitHub
LuciferYang opened a new pull request, #44153: URL: https://github.com/apache/spark/pull/44153 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46237][SQL][TESTS] Fix failed test of `HiveDDLSuite` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #44153: URL: https://github.com/apache/spark/pull/44153#discussion_r1413534980 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3270,7 +3270,10 @@ class HiveDDLSuite val jarName = "TestUDTF.jar"

[PR] [SPARK-46092][SQL][3.5] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
johanl-db opened a new pull request, #44154: URL: https://github.com/apache/spark/pull/44154 This is a cherry-pick from https://github.com/apache/spark/pull/44006 to spark 3.5 ### What changes were proposed in this pull request? This change adds a check for overflows when creating
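
The general idea behind this family of backports can be sketched outside of Spark: a comparison literal that does not fit the column's narrower physical type must not be pushed down as a Parquet row-group filter, otherwise row groups containing matching rows could be pruned. A minimal, hedged sketch of such a guard (the names below are illustrative, not Spark's internal ParquetFilters API):

```scala
// Hedged sketch: reject pushdown when a literal overflows the column's
// narrower physical type (e.g. an INT32 column logically storing bytes).
object OverflowSafePushdown {
  def fitsInByte(v: Long): Boolean = v >= Byte.MinValue && v <= Byte.MaxValue
  def fitsInShort(v: Long): Boolean = v >= Short.MinValue && v <= Short.MaxValue
  def fitsInInt(v: Long): Boolean = v >= Int.MinValue && v <= Int.MaxValue

  /** Returns true only when the literal is representable in the column type. */
  def canPushDown(columnType: String, value: Long): Boolean = columnType match {
    case "byte"  => fitsInByte(value)
    case "short" => fitsInShort(value)
    case "int"   => fitsInInt(value)
    case _       => true // wider types: nothing to check in this sketch
  }
}

// Example: a filter `col = 128` on a byte column must not be pushed down,
// since 128 overflows Byte and would silently wrap to -128.
// OverflowSafePushdown.canPushDown("byte", 128L)  // false
```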

[PR] [SPARK-46092][SQL][3.4] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
johanl-db opened a new pull request, #44155: URL: https://github.com/apache/spark/pull/44155 This is a cherry-pick from https://github.com/apache/spark/pull/44006 to spark 3.4 ### What changes were proposed in this pull request? This change adds a check for overflows when creating

[PR] [SPARK-46092][SQL][3.3] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
johanl-db opened a new pull request, #44156: URL: https://github.com/apache/spark/pull/44156 This is a cherry-pick from https://github.com/apache/spark/pull/44006 to spark 3.3 ### What changes were proposed in this pull request? This change adds a check for overflows when creating

Re: [PR] [SPARK-46092][SQL] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
johanl-db commented on PR #44006: URL: https://github.com/apache/spark/pull/44006#issuecomment-1838089353 @dongjoon-hyun I created backport PRs for the following branches: - 3.5: https://github.com/apache/spark/pull/44154 - 3.4: https://github.com/apache/spark/pull/44155 - 3.3: h

Re: [PR] [SPARK-46209] Add java 11 only yml for version before 3.5 [spark-docker]

2023-12-04 Thread via GitHub
zhengruifeng commented on PR #58: URL: https://github.com/apache/spark-docker/pull/58#issuecomment-1838094399 late LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
dbatomic commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413561777 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,28 @@ trait SparkDateTimeUtils { (segment == 0 && digi
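
The optimization under discussion can be sketched with plain Java strings rather than Spark's UTF8String (so the helper name and details here are illustrative only): skip leading and trailing whitespace with index arithmetic while parsing, instead of allocating a trimmed copy first.

```scala
// Hedged sketch of the "skip trimAll" idea; Spark's real SparkDateTimeUtils
// code differs in detail and ignores overflow handling shown here as well.
def parseDigitsWithoutTrim(s: String): Option[Int] = {
  var start = 0
  var end = s.length
  // Advance past leading/trailing whitespace instead of calling trim()
  while (start < end && Character.isWhitespace(s.charAt(start))) start += 1
  while (end > start && Character.isWhitespace(s.charAt(end - 1))) end -= 1
  if (start == end) return None
  var i = start
  var value = 0
  while (i < end) {
    val c = s.charAt(i)
    if (c < '0' || c > '9') return None
    value = value * 10 + (c - '0')
    i += 1
  }
  Some(value)
}

// parseDigitsWithoutTrim("  2023 ")  // Some(2023), no intermediate String allocated
```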

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
MaxGekk commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413574114 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,28 @@ trait SparkDateTimeUtils { (segment == 0 && digit

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
MaxGekk commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413582904 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,28 @@ trait SparkDateTimeUtils { (segment == 0 && digit

[PR] [WIP][SQL][DOCS] Describe arguments of `decode()` [spark]

2023-12-04 Thread via GitHub
MaxGekk opened a new pull request, #44157: URL: https://github.com/apache/spark/pull/44157 ### What changes were proposed in this pull request? In the PR, I propose to update the description of the `StringDecode` expression and apparently the `decode()` function by describing the argument
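
For context, the string `decode(bin, charset)` function converts a binary value back to a string using the given charset. A small round-trip usage example with the public Scala API (the local SparkSession here is only for the demo):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{decode, encode}

// Round-trip example for the string encode()/decode() pair.
val spark = SparkSession.builder().master("local[1]").appName("decode-demo").getOrCreate()
import spark.implicits._

Seq("café").toDF("s")
  .select(decode(encode($"s", "UTF-8"), "UTF-8").alias("roundtrip"))
  .show()
// +---------+
// |roundtrip|
// +---------+
// |     café|
// +---------+
```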

[PR] [SPARK-46239][CORE] Hide the version information of Spark Jetty [spark]

2023-12-04 Thread via GitHub
chenyu-opensource opened a new pull request, #44158: URL: https://github.com/apache/spark/pull/44158 **What changes were proposed in this pull request?** The PR sets parameters to hide the Jetty version in Spark. **Why are the changes needed?** It can avoid obtaining remote W
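
Jetty itself exposes switches for this. A minimal standalone sketch (plain embedded Jetty, not the actual Spark patch) of suppressing the version headers via HttpConfiguration:

```scala
import org.eclipse.jetty.server.{HttpConfiguration, HttpConnectionFactory, Server, ServerConnector}

// Hedged sketch: embedded Jetty with the Server/X-Powered-By headers
// suppressed, which is the effect the PR aims for in Spark's UI server.
val httpConfig = new HttpConfiguration()
httpConfig.setSendServerVersion(false) // drop "Server: Jetty(x.y.z)" header
httpConfig.setSendXPoweredBy(false)    // drop "X-Powered-By" header

val server = new Server()
val connector = new ServerConnector(server, new HttpConnectionFactory(httpConfig))
connector.setPort(8080)
server.addConnector(connector)
// server.start() would now serve responses without the Jetty version header.
```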

Re: [PR] [SPARK-46186][CONNECT] Fix illegal state transition when ExecuteThreadRunner interrupted before started [spark]

2023-12-04 Thread via GitHub
juliuszsompolski commented on PR #44095: URL: https://github.com/apache/spark/pull/44095#issuecomment-1838172064 gentle ping @hvanhovell @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-12-04 Thread via GitHub
juliuszsompolski commented on PR #43985: URL: https://github.com/apache/spark/pull/43985#issuecomment-1838172192 gentle ping @hvanhovell @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] [SPARK-46239][CORE] Hide the version information of Spark Jetty [spark]

2023-12-04 Thread via GitHub
chenyu-opensource commented on PR #44158: URL: https://github.com/apache/spark/pull/44158#issuecomment-1838174465 ![image](https://github.com/apache/spark/assets/119398199/0f995883-268e-4eba-87b1-9603ee6d3995) For example, we need to hide the version of jetty to prevent information l

Re: [PR] [SPARK-46239][CORE] Hide the version information of Spark Jetty [spark]

2023-12-04 Thread via GitHub
chenyu-opensource commented on PR #44158: URL: https://github.com/apache/spark/pull/44158#issuecomment-1838177078 ![image](https://github.com/apache/spark/assets/119398199/b7c0a780-0ad2-4ae7-945e-9d60050b4d6b) After that, we can hide the information. -- This is an automated message fro

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
cdkrot commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413615409 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecursio
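
The change itself is in the Python client, but the underlying pattern is a plain re-entrancy guard: if error handling raises, do not re-enter the handler. A language-agnostic sketch of that pattern (written in Scala purely for illustration; the names are hypothetical and are not Spark Connect APIs):

```scala
// Hypothetical re-entrancy guard: a nested attempt to run the error handler
// fails fast instead of recursing.
object ErrorHandlerGuard {
  private val handling = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  def handleOnce[T](body: => T): T = {
    if (handling.get()) {
      throw new IllegalStateException("Recursive error handling is forbidden")
    }
    handling.set(true)
    try body finally handling.set(false)
  }
}
```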

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
heyihong commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413618273 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecurs

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
cdkrot commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413626422 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecursio

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
heyihong commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413627399 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecurs

Re: [PR] [SPARK-TBD][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
cdkrot commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413634624 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecursio

Re: [PR] [SPARK-46186][CONNECT] Fix illegal state transition when ExecuteThreadRunner interrupted before started [spark]

2023-12-04 Thread via GitHub
HyukjinKwon closed pull request #44095: [SPARK-46186][CONNECT] Fix illegal state transition when ExecuteThreadRunner interrupted before started URL: https://github.com/apache/spark/pull/44095 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] [SPARK-46186][CONNECT] Fix illegal state transition when ExecuteThreadRunner interrupted before started [spark]

2023-12-04 Thread via GitHub
HyukjinKwon commented on PR #44095: URL: https://github.com/apache/spark/pull/44095#issuecomment-1838227946 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-46241][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
heyihong commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413650634 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecurs

Re: [PR] [SPARK-46241][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
cdkrot commented on code in PR #44144: URL: https://github.com/apache/spark/pull/44144#discussion_r1413653582 ## python/pyspark/sql/connect/client/core.py: ## @@ -544,6 +544,25 @@ def fromProto(cls, pb: pb2.ConfigResponse) -> "ConfigResult": ) +class ForbidRecursio

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
dbatomic commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413654244 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,28 @@ trait SparkDateTimeUtils { (segment == 0 && digi

Re: [PR] [SPARK-32246][BUILD][INFRA] Enable Kinesis tests in GitHub Actions [spark]

2023-12-04 Thread via GitHub
junyuc25 commented on PR #43736: URL: https://github.com/apache/spark/pull/43736#issuecomment-1838246822 > requires interaction with Amazon Kinesis service which would incur billing costs to users > > > > Note that currently there are totally 57 tests in the Kinesis-asl modul

Re: [PR] [SPARK-46241][PYTHON][CONNECT] Forbid Recursive Error handling [spark]

2023-12-04 Thread via GitHub
cdkrot commented on PR #44144: URL: https://github.com/apache/spark/pull/44144#issuecomment-1838249380 Changed to @heyihong's suggestion to always print a stacktrace if we got one (that makes sense). I checked, and there seem to be no other recursion problems currently. (Original proposal was ht

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

2023-12-04 Thread via GitHub
HyukjinKwon commented on code in PR #44153: URL: https://github.com/apache/spark/pull/44153#discussion_r1413660735 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3270,7 +3270,10 @@ class HiveDDLSuite val jarName = "TestUDTF.jar"

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
dbatomic commented on PR #44110: URL: https://github.com/apache/spark/pull/44110#issuecomment-1838267647 > > The change also includes a small unit benchmark for this particular case. > > I wonder of other benchmarks. Do you observe perf regressions? I am asking just in case. I

Re: [PR] [SPARK-46234][PYTHON] Introduce `PySparkKeyError` for PySpark error framework [spark]

2023-12-04 Thread via GitHub
zhengruifeng closed pull request #44151: [SPARK-46234][PYTHON] Introduce `PySparkKeyError` for PySpark error framework URL: https://github.com/apache/spark/pull/44151 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] [SPARK-46234][PYTHON] Introduce `PySparkKeyError` for PySpark error framework [spark]

2023-12-04 Thread via GitHub
zhengruifeng commented on PR #44151: URL: https://github.com/apache/spark/pull/44151#issuecomment-1838270153 LGTM, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #44153: URL: https://github.com/apache/spark/pull/44153#discussion_r1413692187 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala: ## @@ -3270,7 +3270,10 @@ class HiveDDLSuite val jarName = "TestUDTF.jar"

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
MaxGekk commented on PR #44110: URL: https://github.com/apache/spark/pull/44110#issuecomment-1838402901 > Looking for feedback if we want to keep the benchmark, given that this is a rather esoteric edge case. I am ok to exclude the benchmark for the pretty specific case from the PR.

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
dbatomic commented on PR #44110: URL: https://github.com/apache/spark/pull/44110#issuecomment-1838423819 > > Looking for feedback if we want to keep the benchmark, given that this is a rather esoteric edge case. > > I am ok to exclude the benchmark for the pretty specific case from th

[PR] [SPARK-46244][SQL] INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME [spark]

2023-12-04 Thread via GitHub
cloud-fan opened a new pull request, #44159: URL: https://github.com/apache/spark/pull/44159 ### What changes were proposed in this pull request? This is to fix the MERGE INSERT/UPDATE * behavior to make it easier to use, and also to make it consistent with the INSERT BY NAME
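
The by-name semantics targeted here are easiest to see on a concrete statement. Below is only the shape of such a query captured as a Scala string (table and column names are placeholders; running it requires a catalog/table format that supports MERGE):

```scala
// Hedged sketch: under INSERT BY NAME semantics, `INSERT *` matches source
// columns to target columns by name rather than by position.
val mergeByName: String =
  """MERGE INTO target t
    |USING source s
    |ON t.id = s.id
    |WHEN MATCHED THEN UPDATE SET *
    |WHEN NOT MATCHED THEN INSERT *
    |""".stripMargin

// If target is (id, name, amount) and source is (amount, id, name),
// by-name resolution sends s.amount to t.amount despite the different order.
```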

Re: [PR] [SPARK-46244][SQL] INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on PR #44159: URL: https://github.com/apache/spark/pull/44159#issuecomment-1838428215 cc @viirya @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413737755 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && digi

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413744707 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && dig

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413746879 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && dig

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413748213 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && dig

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413750468 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && dig

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1413751001 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1860,6 +1860,11 @@ "message" : [ "WITHIN GROUP is required for inverse distrib

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1413757547 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1860,6 +1860,11 @@ "message" : [ "WITHIN GROUP is required for inverse distribu

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413760894 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && digi

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
dbatomic commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413761230 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && digi

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1413764128 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && digi

Re: [PR] [SPARK-46243][SQL][DOCS] Describe arguments of `decode()` [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #44157: URL: https://github.com/apache/spark/pull/44157#discussion_r1413780878 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2594,6 +2594,11 @@ object Decode { the corresponding re

Re: [PR] [SPARK-46243][SQL][DOCS] Describe arguments of `decode()` [spark]

2023-12-04 Thread via GitHub
MaxGekk commented on code in PR #44157: URL: https://github.com/apache/spark/pull/44157#discussion_r1413790776 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2594,6 +2594,11 @@ object Decode { the corresponding res

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-12-04 Thread via GitHub
Ngone51 commented on PR #43954: URL: https://github.com/apache/spark/pull/43954#issuecomment-1838517185 @mridulm Thanks for the detailed comment. > Additional call to `suspend` for existing `killAllTaskAttempts` Note that we always call `markStageAsFinished` after the call to `

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-04 Thread via GitHub
beliefer commented on PR #43910: URL: https://github.com/apache/spark/pull/43910#issuecomment-1838565793 The GA failure is unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-46243][SQL][DOCS] Describe arguments of `decode()` [spark]

2023-12-04 Thread via GitHub
beliefer commented on code in PR #44157: URL: https://github.com/apache/spark/pull/44157#discussion_r1413832030 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2594,6 +2594,11 @@ object Decode { the corresponding re

[PR] [SPARK-46245][CORE][SQL][YARN][K8S] Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` [spark]

2023-12-04 Thread via GitHub
LuciferYang opened a new pull request, #44160: URL: https://github.com/apache/spark/pull/44160 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-46009][SQL][CONNECT] Merge the parse rule of PercentileCont and PercentileDisc into functionCall [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #43910: URL: https://github.com/apache/spark/pull/43910#discussion_r1413839440 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1860,6 +1860,11 @@ "message" : [ "WITHIN GROUP is required for inverse distrib

Re: [PR] [SPARK-46245][CORE][SQL][SS][YARN][K8S][UI] Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #44160: URL: https://github.com/apache/spark/pull/44160#discussion_r1413845939 ## core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala: ## @@ -254,8 +254,8 @@ private[spark] class StandaloneSchedulerBackend(
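
A standalone before/after sketch of the replacement pattern being applied across the codebase: `view.filterKeys` yields a lazy MapView that usually gets materialized with `.toMap`, while `filter` on the Map does the same work strictly in one pass.

```scala
// Before: lazy view, typically followed by .toMap to materialize it.
val conf = Map("spark.a" -> "1", "other.b" -> "2", "spark.c" -> "3")
val sparkOnlyOld = conf.view.filterKeys(_.startsWith("spark.")).toMap

// After: filter on the Map directly; strict result, same contents.
val sparkOnlyNew = conf.filter { case (k, _) => k.startsWith("spark.") }

assert(sparkOnlyOld == sparkOnlyNew)
```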

Re: [PR] EXECUTE IMMEDIATE SQL support [spark]

2023-12-04 Thread via GitHub
milastdbx commented on code in PR #44093: URL: https://github.com/apache/spark/pull/44093#discussion_r1413975389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3909,6 +3943,11 @@ class AstBuilder extends DataTypeAstBuilder with SQLC

Re: [PR] [SPARK-46244][SQL] INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME [spark]

2023-12-04 Thread via GitHub
cloud-fan closed pull request #44159: [SPARK-46244][SQL] INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME URL: https://github.com/apache/spark/pull/44159 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] [SPARK-46244][SQL] INSERT/UPDATE * in MERGE should follow the same semantic of INSERT BY NAME [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on PR #44159: URL: https://github.com/apache/spark/pull/44159#issuecomment-1838845564 After a second thought, this means we can always do UPDATE/INSERT * with completely different column sets in the target and source tables, such as `a, b, c` in the target and `x, y, z` in the source.

Re: [PR] [SPARK-46137] update janino to version 3.1.11 [spark]

2023-12-04 Thread via GitHub
igreenfield commented on PR #44053: URL: https://github.com/apache/spark/pull/44053#issuecomment-1838869496 @LuciferYang I ran the benchmark action 3 times on master and it never finished... how can I compare if even master can't finish it? -- This is an automated message from the

Re: [PR] [SPARK-32246][BUILD][INFRA] Enable Kinesis tests in GitHub Actions [spark]

2023-12-04 Thread via GitHub
junyuc25 commented on PR #43736: URL: https://github.com/apache/spark/pull/43736#issuecomment-1838922134 > The code looks working. I manually verified `KinesisCheckpointerSuite` and other tests from the CI logs of this PR. > > ``` > 2023-11-20T06:29:03.8869208Z [info

Re: [PR] [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps` [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #43575: URL: https://github.com/apache/spark/pull/43575#discussion_r1414163363 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try { q

Re: [PR] [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #43575: URL: https://github.com/apache/spark/pull/43575#discussion_r1414168092 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try {

Re: [PR] [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #43575: URL: https://github.com/apache/spark/pull/43575#discussion_r1414176075 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try {

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun closed pull request #44153: [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable URL: https://github.com/apache/spark/pull/44153 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun commented on PR #44153: URL: https://github.com/apache/spark/pull/44153#issuecomment-1839065283 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] [SPARK-46237][SQL][TESTS] Make `HiveDDLSuite` independently testable [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on PR #44153: URL: https://github.com/apache/spark/pull/44153#issuecomment-1839065769 Thanks @dongjoon-hyun @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-46231][PYTHON] Migrate all remaining `NotImplementedError` & `TypeError` into PySpark error framework [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun closed pull request #44148: [SPARK-46231][PYTHON] Migrate all remaining `NotImplementedError` & `TypeError` into PySpark error framework URL: https://github.com/apache/spark/pull/44148 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [SPARK-46092][SQL][3.5] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun closed pull request #44154: [SPARK-46092][SQL][3.5] Don't push down Parquet row group filters that overflow URL: https://github.com/apache/spark/pull/44154 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-46092][SQL][3.5] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun commented on PR #44154: URL: https://github.com/apache/spark/pull/44154#issuecomment-1839076127 Merged to branch-3.5 for Apache Spark 3.5.1. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] [SPARK-46092][SQL][3.4] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun commented on PR #44155: URL: https://github.com/apache/spark/pull/44155#issuecomment-1839079060 Merged to branch-3.4 for Apache Spark 3.4.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] [SPARK-46092][SQL][3.4] Don't push down Parquet row group filters that overflow [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun closed pull request #44155: [SPARK-46092][SQL][3.4] Don't push down Parquet row group filters that overflow URL: https://github.com/apache/spark/pull/44155 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-46245][CORE][SQL][SS][YARN][K8S][UI] Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on PR #44160: URL: https://github.com/apache/spark/pull/44160#issuecomment-1839107996 > Are we going to remove all .view.? Yes, for `MapOps.view`, this is the last PR; the reason for making this PR can be found at https://github.com/apache/spark/pull/43445#discussion

Re: [PR] [SPARK-46186][CONNECT] Fix illegal state transition when ExecuteThreadRunner interrupted before started [spark]

2023-12-04 Thread via GitHub
juliuszsompolski commented on PR #44095: URL: https://github.com/apache/spark/pull/44095#issuecomment-1839107831 I am aware of some flakiness in the added tests; I will follow up tomorrow to make them stable. -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [PR] [SPARK-46243][SQL][DOCS] Describe arguments of `decode()` [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44157: URL: https://github.com/apache/spark/pull/44157#discussion_r1414251756 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2594,6 +2594,11 @@ object Decode { the corresponding r

Re: [PR] [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps` [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #43575: URL: https://github.com/apache/spark/pull/43575#discussion_r1414252458 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try { q

Re: [PR] [SPARK-46245][CORE][SQL][SS][YARN][K8S][UI] Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44160: URL: https://github.com/apache/spark/pull/44160#discussion_r1414253928 ## core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala: ## @@ -254,8 +254,8 @@ private[spark] class StandaloneSchedulerBackend(

Re: [PR] [SPARK-45701][SPARK-45684][SPARK-45692][CORE][SQL][SS][ML][K8S] Clean up the deprecated API usage related to `mutable.SetOps/c.SeqOps/Iterator/Iterable/IterableOps` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #43575: URL: https://github.com/apache/spark/pull/43575#discussion_r1414253765 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try {

Re: [PR] [SPARK-46173][SQL] Skipping trimAll call during date parsing [spark]

2023-12-04 Thread via GitHub
cloud-fan commented on code in PR #44110: URL: https://github.com/apache/spark/pull/44110#discussion_r1414260156 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala: ## @@ -305,21 +305,32 @@ trait SparkDateTimeUtils { (segment == 0 && dig

Re: [PR] [SPARK-46246] EXECUTE IMMEDIATE SQL support [spark]

2023-12-04 Thread via GitHub
milastdbx commented on code in PR #44093: URL: https://github.com/apache/spark/pull/44093#discussion_r1414266503 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.g4: ## @@ -217,6 +217,7 @@ HOURS: 'HOURS'; IDENTIFIER_KW: 'IDENTIFIER'; IF: 'IF'; IGNO

[PR] [SPARK-45684][SQL][TESTS][FOLLOWUP] Use `++` instead of `s.c.SeqOps#concat` [spark]

2023-12-04 Thread via GitHub
LuciferYang opened a new pull request, #44161: URL: https://github.com/apache/spark/pull/44161 ### What changes were proposed in this pull request? This pr use `++` instead of `s.c.SeqOps#concat` to address comments: https://github.com/apache/spark/pull/43575#discussion_r1414163363
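
For reference, the two spellings are equivalent on Scala 2.13 collections; `++` is simply the conventional alias for `concat`:

```scala
val a = Seq(1, 2)
val b = Seq(3, 4)

// `++` is the idiomatic way to concatenate; `concat` produces the same result.
val viaAlias  = a ++ b
val viaConcat = a.concat(b)

assert(viaAlias == Seq(1, 2, 3, 4) && viaAlias == viaConcat)
```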

Re: [PR] [SPARK-45684][SQL][SS][TESTS][FOLLOWUP] Use `++` instead of `s.c.SeqOps#concat` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on code in PR #44161: URL: https://github.com/apache/spark/pull/44161#discussion_r1414272493 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -174,7 +174,7 @@ class StreamSuite extends StreamTest { try {

[PR] [SPARK-46225][CONNECT] Collapse withColumns calls [spark]

2023-12-04 Thread via GitHub
hvanhovell opened a new pull request, #44162: URL: https://github.com/apache/spark/pull/44162 ### What changes were proposed in this pull request? This PR makes the following two changes: - The `withColumns` message is now stackable. This means that stacked `withColumns` calls can now
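
The user-visible pattern being optimized: chained `withColumn` calls that, once stackable, collapse into the equivalent of a single `withColumns` call. A small sketch using the public Scala API (the local SparkSession is only for the demo):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").appName("withColumns-demo").getOrCreate()

// Each withColumn normally adds its own plan node; collapsing them is
// equivalent to the single withColumns call below.
val df = spark.range(5).toDF("id")

val chained = df
  .withColumn("doubled", col("id") * 2)
  .withColumn("squared", col("id") * col("id"))

val collapsed = df.withColumns(Map(
  "doubled" -> (col("id") * 2),
  "squared" -> (col("id") * col("id"))
))

// Both produce the same schema and rows: id, doubled, squared.
```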

Re: [PR] [SPARK-46225][CONNECT] Collapse withColumns calls [spark]

2023-12-04 Thread via GitHub
hvanhovell commented on code in PR #44162: URL: https://github.com/apache/spark/pull/44162#discussion_r1414292523 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -800,6 +804,9 @@ message WithColumns { // // An exception is thrown when dup

Re: [PR] [SPARK-46225][CONNECT] Collapse withColumns calls [spark]

2023-12-04 Thread via GitHub
hvanhovell commented on PR #44162: URL: https://github.com/apache/spark/pull/44162#issuecomment-1839188769 For the reviewers, I still need to add this to the python client. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [PR] [SPARK-39800][SQL][WIP] DataSourceV2: View Support [spark]

2023-12-04 Thread via GitHub
jzhuge commented on PR #39796: URL: https://github.com/apache/spark/pull/39796#issuecomment-1839214419 Current status: - Incorporate all @MaxGekk comments (took care of some) - Add unit tests Follow up: - Support user specified column names - Support viewSQLConfigs - Con

Re: [PR] [SPARK-46245][CORE][SQL][SS][YARN][K8S][UI] Replace `s.c.MapOps.view.filterKeys` with `s.c.MapOps.filter` [spark]

2023-12-04 Thread via GitHub
LuciferYang commented on PR #44160: URL: https://github.com/apache/spark/pull/44160#issuecomment-1839216786 Some flaky test failed, I will re-trigger them later -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

Re: [PR] [SPARK-46094] Add support for code profiling executors [spark]

2023-12-04 Thread via GitHub
parthchandra commented on code in PR #44021: URL: https://github.com/apache/spark/pull/44021#discussion_r1410100843 ## connector/profiler/README.md: ## @@ -0,0 +1,86 @@ +# Spark Code Profiler Plugin + +## Build + +To build +``` + ./build/mvn clean package -P code-profiler +``

Re: [PR] [SPARK-46094] Add support for code profiling executors [spark]

2023-12-04 Thread via GitHub
parthchandra commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1839224900 @dongjoon-hyun addressed your comments. (Sorry it took a while to address the changes) @HyukjinKwon, I've added an explicit usage example to the README -- This is an automat

Re: [PR] [SPARK-46094] Add support for code profiling executors [spark]

2023-12-04 Thread via GitHub
dongjoon-hyun commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1839226493 Thank you for updates, @parthchandra . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] [SPARK-46040][SQL][Python] Update UDTF API for 'analyze' partitioning/ordering columns to support general expressions [spark]

2023-12-04 Thread via GitHub
dtenedor commented on code in PR #43946: URL: https://github.com/apache/spark/pull/43946#discussion_r1414329642 ## sql/core/src/test/resources/sql-tests/results/udtf/udtf.sql.out: ## @@ -335,6 +335,37 @@ org.apache.spark.sql.AnalysisException } +-- !query +SELECT * FROM UDT

[PR] [SPARK-46248] XML: Support for ignoreCorruptFiles and ignoreMissingFiles options [spark]

2023-12-04 Thread via GitHub
shujingyang-db opened a new pull request, #44163: URL: https://github.com/apache/spark/pull/44163 ### What changes were proposed in this pull request? This PR corrects the handling of corrupt or missing multiline XML files by respecting the user-specified options. ### Why ar
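
How a user would exercise these options (a hedged sketch only: the option names mirror the PR title, the `rowTag` value and input path are placeholders, and the exact spelling should be checked against the final docs; a SparkSession named `spark` is assumed):

```scala
// Sketch: reading multiline XML while tolerating corrupt or missing files.
val df = spark.read
  .format("xml")
  .option("rowTag", "record")
  .option("ignoreCorruptFiles", "true")
  .option("ignoreMissingFiles", "true")
  .load("/data/xml/input")
```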

Re: [PR] [SPARK-46229][PYTHON][CONNECT] Add applyInArrow to groupBy and cogroup in Spark Connect [spark]

2023-12-04 Thread via GitHub
ueshin commented on code in PR #44146: URL: https://github.com/apache/spark/pull/44146#discussion_r1414336068 ## python/pyspark/sql/connect/_typing.py: ## @@ -14,14 +14,14 @@ # See the License for the specific language governing permissions and # limitations under the License.

[PR] [SPARK-45940][FOLLOWUP][TESTS] Only test Python data source when Python and PySpark environments are available [spark]

2023-12-04 Thread via GitHub
allisonwang-db opened a new pull request, #44164: URL: https://github.com/apache/spark/pull/44164 ### What changes were proposed in this pull request? This is a test-only follow-up PR for https://github.com/apache/spark/pull/44085 to make Python data source tests depend on th
