[GitHub] [spark] Yikun commented on a diff in pull request #38789: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-11-30 Thread GitBox
Yikun commented on code in PR #38789: URL: https://github.com/apache/spark/pull/38789#discussion_r1036787447 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala: ## @@ -435,10 +464,11 @@

[GitHub] [spark] martin-g commented on a diff in pull request #38789: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-11-30 Thread GitBox
martin-g commented on code in PR #38789: URL: https://github.com/apache/spark/pull/38789#discussion_r1036763619 ## resource-managers/kubernetes/integration-tests/README.md: ## @@ -283,6 +283,14 @@ to the wrapper scripts and using the wrapper scripts will simply set these appro

[GitHub] [spark] Yikun commented on pull request #38789: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-11-30 Thread GitBox
Yikun commented on PR #38789: URL: https://github.com/apache/spark/pull/38789#issuecomment-113201 @holdenk Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang commented on pull request #38860: [SPARK-41348][SQL][TESTS] Refactor `UnsafeArrayWriterSuite` to check error class

2022-11-30 Thread GitBox
LuciferYang commented on PR #38860: URL: https://github.com/apache/spark/pull/38860#issuecomment-105337 cc @MaxGekk a minor refactor of an existing test to cover `TOO_MANY_ARRAY_ELEMENTS` -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] LuciferYang opened a new pull request, #38860: [SPARK-41348][SQL][TESTS] Refactor `UnsafeArrayWriterSuite` to check error class

2022-11-30 Thread GitBox
LuciferYang opened a new pull request, #38860: URL: https://github.com/apache/spark/pull/38860 ### What changes were proposed in this pull request? This PR aims to refactor `UnsafeArrayWriterSuite` to check `TOO_MANY_ARRAY_ELEMENTS`. ### Why are the changes needed? Test

[GitHub] [spark] yabola commented on pull request #38560: [SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-30 Thread GitBox
yabola commented on PR #38560: URL: https://github.com/apache/spark/pull/38560#issuecomment-102610 @mridulm If you are back and have time, please review my PR; I think the functionality is almost done. Let me know if there is anything inappropriate and I will fix it soon, thanks! --

[GitHub] [spark] HeartSaVioR commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333291415 (@Kimahriman Looks like your git config is not set correctly in your dev environment. I manually changed the author of the commit based on your mail address on the dev@ mailing list.) -- This is an

[GitHub] [spark] HeartSaVioR closed pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR closed pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing URL: https://github.com/apache/spark/pull/38853 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HeartSaVioR commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333287478 Thanks! Merging to master/3.3. ([SPARK-38277](https://issues.apache.org/jira/browse/SPARK-38277) can/should be handled separately.) -- This is an automated message from the Apache

[GitHub] [spark] cloud-fan closed pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan closed pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch URL: https://github.com/apache/spark/pull/38851 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] cloud-fan commented on pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan commented on PR #38851: URL: https://github.com/apache/spark/pull/38851#issuecomment-1333269909 thanks for review, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] MaxGekk closed pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-30 Thread GitBox
MaxGekk closed pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`. URL: https://github.com/apache/spark/pull/38769 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] MaxGekk commented on pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-30 Thread GitBox
MaxGekk commented on PR #38769: URL: https://github.com/apache/spark/pull/38769#issuecomment-1333268350 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] anchovYu commented on a diff in pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
anchovYu commented on code in PR #38851: URL: https://github.com/apache/spark/pull/38851#discussion_r1036726329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2109,6 +2110,39 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] MaxGekk closed pull request #38772: [SPARK-41237][SQL] Reuse the error class `UNSUPPORTED_DATATYPE` for `_LEGACY_ERROR_TEMP_0030`

2022-11-30 Thread GitBox
MaxGekk closed pull request #38772: [SPARK-41237][SQL] Reuse the error class `UNSUPPORTED_DATATYPE` for `_LEGACY_ERROR_TEMP_0030` URL: https://github.com/apache/spark/pull/38772 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk commented on pull request #38772: [SPARK-41237][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_0030`

2022-11-30 Thread GitBox
MaxGekk commented on PR #38772: URL: https://github.com/apache/spark/pull/38772#issuecomment-1333262758 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-30 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1036705541 ## sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala: ## @@ -964,14 +964,16 @@ class SubquerySuite extends QueryTest | WHERE

[GitHub] [spark] cloud-fan closed pull request #38854: [SPARK-41343][CONNECT] Move FunctionName parsing to server side

2022-11-30 Thread GitBox
cloud-fan closed pull request #38854: [SPARK-41343][CONNECT] Move FunctionName parsing to server side URL: https://github.com/apache/spark/pull/38854 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] cloud-fan commented on pull request #38854: [SPARK-41343][CONNECT] Move FunctionName parsing to server side

2022-11-30 Thread GitBox
cloud-fan commented on PR #38854: URL: https://github.com/apache/spark/pull/38854#issuecomment-1333212962 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] shrprasa commented on a diff in pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2022-11-30 Thread GitBox
shrprasa commented on code in PR #37880: URL: https://github.com/apache/spark/pull/37880#discussion_r1036703611 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -157,24 +157,32 @@ private[spark] class SparkSubmit extends Logging { def doRunMain():

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

2022-11-30 Thread GitBox
zhengruifeng commented on code in PR #38858: URL: https://github.com/apache/spark/pull/38858#discussion_r1036702656 ## python/pyspark/sql/connect/functions.py: ## @@ -16,14 +16,247 @@ # from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference -from

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

2022-11-30 Thread GitBox
zhengruifeng commented on code in PR #38858: URL: https://github.com/apache/spark/pull/38858#discussion_r1036701948 ## python/pyspark/sql/connect/column.py: ## @@ -84,7 +85,7 @@ def __init__(self) -> None: def to_plan(self, session: "SparkConnectClient") ->

[GitHub] [spark] amaliujia commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

2022-11-30 Thread GitBox
amaliujia commented on code in PR #38858: URL: https://github.com/apache/spark/pull/38858#discussion_r1036698430 ## python/pyspark/sql/connect/functions.py: ## @@ -16,14 +16,247 @@ # from pyspark.sql.connect.column import Column, LiteralExpression, ColumnReference -from

[GitHub] [spark] amaliujia commented on a diff in pull request #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

2022-11-30 Thread GitBox
amaliujia commented on code in PR #38858: URL: https://github.com/apache/spark/pull/38858#discussion_r1036698050 ## python/pyspark/sql/connect/column.py: ## @@ -701,11 +688,23 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: def alias(self, *alias:

[GitHub] [spark] amaliujia commented on pull request #38859: [SPARK-41347][CONNECT] Add Cast to Expression proto

2022-11-30 Thread GitBox
amaliujia commented on PR #38859: URL: https://github.com/apache/spark/pull/38859#issuecomment-1333184749 cc @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia opened a new pull request, #38859: [SPARK-41347][CONNECT] Add Cast to Expression proto

2022-11-30 Thread GitBox
amaliujia opened a new pull request, #38859: URL: https://github.com/apache/spark/pull/38859 ### What changes were proposed in this pull request? We need a dedicated Cast in proto because Cast takes a DataType (which is not a Expression in current proto design). This

[GitHub] [spark] zhengruifeng commented on pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement DataFrame cross join

2022-11-30 Thread GitBox
zhengruifeng commented on PR #38778: URL: https://github.com/apache/spark/pull/38778#issuecomment-1333181249 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement DataFrame cross join

2022-11-30 Thread GitBox
zhengruifeng closed pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement DataFrame cross join URL: https://github.com/apache/spark/pull/38778 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] itholic commented on a diff in pull request #38576: [SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORRELATED_REFERENCE`

2022-11-30 Thread GitBox
itholic commented on code in PR #38576: URL: https://github.com/apache/spark/pull/38576#discussion_r1036693778 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala: ## @@ -56,7 +56,8 @@ class ResolveSubquerySuite extends AnalysisTest

[GitHub] [spark] zhengruifeng opened a new pull request, #38858: [SPARK-41346][CONNECT][PYTHON] Implement `asc` and `desc` functions

2022-11-30 Thread GitBox
zhengruifeng opened a new pull request, #38858: URL: https://github.com/apache/spark/pull/38858 ### What changes were proposed in this pull request? 1, remove `asc` and `desc` from `Expression`; 2, add `asc_nulls_first`, `asc_nulls_last`, `desc_nulls_first`, `desc_nulls_last` `in
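(For reference, the Scala-side equivalents of these helpers already exist in `org.apache.spark.sql.functions`; the sketch below is not part of this PR and only illustrates the null-ordering behavior the Python Connect functions are expected to mirror.)

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc_nulls_first, desc_nulls_last}

object NullOrderingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("null-ordering").getOrCreate()
    import spark.implicits._

    val df = Seq(Some(3), None, Some(1)).toDF("v")
    df.sort(asc_nulls_first("v")).show()  // NULL row first, then 1, 3
    df.sort(desc_nulls_last("v")).show()  // 3, 1, then the NULL row last
    spark.stop()
  }
}
```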

[GitHub] [spark] HeartSaVioR commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333166532 Just to give you more context for your previous comment (https://github.com/apache/spark/pull/38853#issuecomment-1333073885)... We have two different sets of code paths, 1) two

[GitHub] [spark] cloud-fan commented on pull request #38765: [SPARK-35531][SQL][FOLLOWUP] Support alter table command with CASE_SENSITIVE is true

2022-11-30 Thread GitBox
cloud-fan commented on PR #38765: URL: https://github.com/apache/spark/pull/38765#issuecomment-1333159661 @wankunde thanks for the explanation! I think this is rather a Hive bug, but we need to work around it in Spark. Is this a long-standing issue or a new regression? You marked the PR as

[GitHub] [spark] amaliujia commented on pull request #38857: [SPARK-41345][CONNECT][PROTOBUF] Add Hint to Connect Proto

2022-11-30 Thread GitBox
amaliujia commented on PR #38857: URL: https://github.com/apache/spark/pull/38857#issuecomment-1333158051 cc @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia opened a new pull request, #38857: [SPARK-41345][CONNECT][PROTOBUF] Add Hint to Connect Proto

2022-11-30 Thread GitBox
amaliujia opened a new pull request, #38857: URL: https://github.com/apache/spark/pull/38857 ### What changes were proposed in this pull request? This PR adds `Hint` to the Connect proto. The Hint can technically accept `ANY` parameter; however, in this version of
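(Context only, not part of the PR: a Scala sketch of the existing DataFrame `hint` API whose hint name plus parameters the new proto message has to model.)

```scala
import org.apache.spark.sql.SparkSession

object HintSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("hint").getOrCreate()
    import spark.implicits._

    val small = Seq((1, "a")).toDF("id", "v")
    val big = Seq((1, "x"), (2, "y")).toDF("id", "w")
    // The hint name ("broadcast") and its parameters are what a Hint relation carries.
    big.join(small.hint("broadcast"), "id").explain()
    spark.stop()
  }
}
```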

[GitHub] [spark] cloud-fan commented on a diff in pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan commented on code in PR #38851: URL: https://github.com/apache/spark/pull/38851#discussion_r1036678320 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2109,6 +2110,51 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] LuciferYang commented on pull request #38856: [SPARK-41314][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1094`

2022-11-30 Thread GitBox
LuciferYang commented on PR #38856: URL: https://github.com/apache/spark/pull/38856#issuecomment-1333143698 cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] LuciferYang opened a new pull request, #38856: [SPARK-41314][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1094`

2022-11-30 Thread GitBox
LuciferYang opened a new pull request, #38856: URL: https://github.com/apache/spark/pull/38856 ### What changes were proposed in this pull request? This PR aims to rename the error class `_LEGACY_ERROR_TEMP_1094` to `INVALID_SCHEMA.NON_STRUCT_TYPE`. ### Why are the changes needed?

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-11-30 Thread GitBox
warrenzhu25 commented on code in PR #38852: URL: https://github.com/apache/spark/pull/38852#discussion_r1036663332 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -352,8 +353,20 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] viirya commented on a diff in pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
viirya commented on code in PR #38851: URL: https://github.com/apache/spark/pull/38851#discussion_r1036659792 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2109,6 +2110,51 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] Ngone51 commented on a diff in pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-11-30 Thread GitBox
Ngone51 commented on code in PR #38852: URL: https://github.com/apache/spark/pull/38852#discussion_r1036661551 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -352,8 +353,20 @@ private[spark] class CoarseGrainedExecutorBackend(

[GitHub] [spark] Ngone51 commented on a diff in pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-30 Thread GitBox
Ngone51 commented on code in PR #38668: URL: https://github.com/apache/spark/pull/38668#discussion_r1036658717 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -125,7 +126,10 @@ private[storage] class BlockManagerDecommissioner(

[GitHub] [spark] cloud-fan commented on a diff in pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-30 Thread GitBox
cloud-fan commented on code in PR #38777: URL: https://github.com/apache/spark/pull/38777#discussion_r1036655896 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -272,7 +273,7 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] cloud-fan commented on a diff in pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-30 Thread GitBox
cloud-fan commented on code in PR #38777: URL: https://github.com/apache/spark/pull/38777#discussion_r1036655709 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala: ## @@ -234,7 +234,8 @@ object FileSourceStrategy extends Strategy

[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-30 Thread GitBox
cloud-fan commented on PR #38750: URL: https://github.com/apache/spark/pull/38750#issuecomment-1333112105 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan closed pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-30 Thread GitBox
cloud-fan closed pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types URL: https://github.com/apache/spark/pull/38750 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] cloud-fan commented on a diff in pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan commented on code in PR #38851: URL: https://github.com/apache/spark/pull/38851#discussion_r1036654360 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2109,6 +2110,39 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] xiuzhu9527 commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-30 Thread GitBox
xiuzhu9527 commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1333100399 This problem is likely puzzling many people at present and is important for YARN Timeline compatibility. I think we should consider how to make the timeline work normally instead of closing it

[GitHub] [spark] xiuzhu9527 commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-30 Thread GitBox
xiuzhu9527 commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1333085766 Jersey 1 and Jersey 2 are two different packages. One is com.sun, and the other is org.glassfish.jersey. In addition, META-INF/services/ does not exist in the Jersey 2 client, and I

[GitHub] [spark] xiuzhu9527 commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-30 Thread GitBox
xiuzhu9527 commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1333085409 @tgravescs 1. I am using version 2.7.4 of YARN 2. https://issues.apache.org/jira/browse/YARN-5271 YARN-5271 does not fundamentally solve the problem, but it captures

[GitHub] [spark] zhengruifeng commented on pull request #38838: [SPARK-41321][CONNECT] Support target field for UnresolvedStar

2022-11-30 Thread GitBox
zhengruifeng commented on PR #38838: URL: https://github.com/apache/spark/pull/38838#issuecomment-1333078020 @dengziming thank you for working on this! merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HeartSaVioR commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333077999 It's not intended. It looks like we missed dealing with that ticket. Anyway, we clear the write batch before starting a new task for the next microbatch, but I see an issue when an executor

[GitHub] [spark] zhengruifeng closed pull request #38838: [SPARK-41321][CONNECT] Support target field for UnresolvedStar

2022-11-30 Thread GitBox
zhengruifeng closed pull request #38838: [SPARK-41321][CONNECT] Support target field for UnresolvedStar URL: https://github.com/apache/spark/pull/38838 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] Kimahriman commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
Kimahriman commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333073885 One question I did have: as I started digging into this, I thought the problem was simply related to https://issues.apache.org/jira/browse/SPARK-38277, but as I dug into it I realized

[GitHub] [spark] Kimahriman commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
Kimahriman commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333068984 > 1. Have you run the fix with your production workload for a while and see there is no longer the same memory issue? Yes I confirmed that today. I have executors with a 25 GiB heap

[GitHub] [spark] HeartSaVioR commented on pull request #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38853: URL: https://github.com/apache/spark/pull/38853#issuecomment-1333053315 Nice finding! Looks like your explanation makes sense. I've quickly googled and there is `reserve` which may shrink the memory as desired but doesn't seem to be guaranteed. (And it's

[GitHub] [spark] deepak-shivanandappa closed pull request #38855: Branch 3.2 switch jdk

2022-11-30 Thread GitBox
deepak-shivanandappa closed pull request #38855: Branch 3.2 switch jdk URL: https://github.com/apache/spark/pull/38855 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] deepak-shivanandappa opened a new pull request, #38855: Branch 3.2 switch jdk

2022-11-30 Thread GitBox
deepak-shivanandappa opened a new pull request, #38855: URL: https://github.com/apache/spark/pull/38855 Switch JDK from openjdk to eclipse temurin -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhouyejoe commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side read metrics

2022-11-30 Thread GitBox
zhouyejoe commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1036608522 ## sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala: ## @@ -44,6 +44,26 @@ class SQLShuffleReadMetricsReporter(

[GitHub] [spark] amaliujia opened a new pull request, #38854: [SPARK-41343][CONNECT] Move FunctionName parsing to server side

2022-11-30 Thread GitBox
amaliujia opened a new pull request, #38854: URL: https://github.com/apache/spark/pull/38854 ### What changes were proposed in this pull request? This PR proposes to change the name of `UnresolvedFunction` from a sequence of name parts to a single name string, which helps to
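(Illustration only, assuming Catalyst's SQL parser; the Connect plumbing is omitted and this is not the PR's code. "Parsing the function name on the server side" roughly means the client sends one string and the server splits it into name parts.)

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

object FunctionNameParsingSketch {
  def main(args: Array[String]): Unit = {
    // The client would send a single string; the server parses it into its parts.
    val nameParts: Seq[String] =
      CatalystSqlParser.parseMultipartIdentifier("my_catalog.my_db.my_func")
    println(nameParts)  // List(my_catalog, my_db, my_func)
  }
}
```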

[GitHub] [spark] github-actions[bot] commented on pull request #37571: [SPARK-21487][CORE][WEB UI] Change extension of mustache template files to .mustache

2022-11-30 Thread GitBox
github-actions[bot] commented on PR #37571: URL: https://github.com/apache/spark/pull/37571#issuecomment-1332933243 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37591: [SPARK-40158][SQL] Remove useless configuration & extract common code for parquet read

2022-11-30 Thread GitBox
github-actions[bot] commented on PR #37591: URL: https://github.com/apache/spark/pull/37591#issuecomment-1332933203 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37573: [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads

2022-11-30 Thread GitBox
github-actions[bot] commented on PR #37573: URL: https://github.com/apache/spark/pull/37573#issuecomment-1332933221 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37602: [SPARK-40164][SQL] The partitionSpec should be distinct keys after filter one row of row_number

2022-11-30 Thread GitBox
github-actions[bot] commented on PR #37602: URL: https://github.com/apache/spark/pull/37602#issuecomment-1332933187 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HeartSaVioR commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-30 Thread GitBox
HeartSaVioR commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1332915287 I don't have context for that, sorry. I'm OK either way if the nullability of column is guaranteed to not fluctuate. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] Kimahriman opened a new pull request, #38853: [SPARK-41339][SQL] Close and recreate RocksDB write batch instead of just clearing

2022-11-30 Thread GitBox
Kimahriman opened a new pull request, #38853: URL: https://github.com/apache/spark/pull/38853 ### What changes were proposed in this pull request? Instead of just calling `writeBatch.clear`, close the write batch and recreate it. ### Why are the changes needed?
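(A minimal sketch of the pattern described above, assuming the RocksDB Java API `org.rocksdb.WriteBatchWithIndex`; this is not the Spark state-store code itself.)

```scala
import org.rocksdb.WriteBatchWithIndex

class WriteBatchHolder {
  // overwriteKey = true: later writes to the same key replace earlier ones
  private var writeBatch = new WriteBatchWithIndex(true)

  // Old approach: clear() empties the batch but can leave its native buffer allocated.
  def resetByClearing(): Unit = writeBatch.clear()

  // Pattern described in the PR summary: close the native object and start from a
  // fresh batch, so memory held by a previously large batch can actually be released.
  def resetByRecreating(): Unit = {
    writeBatch.close()
    writeBatch = new WriteBatchWithIndex(true)
  }

  def batch: WriteBatchWithIndex = writeBatch
}
```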

[GitHub] [spark] AmplabJenkins commented on pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-11-30 Thread GitBox
AmplabJenkins commented on PR #38852: URL: https://github.com/apache/spark/pull/38852#issuecomment-1332905185 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] holdenk commented on a diff in pull request #38518: [SPARK-33349][K8S] Reset the executor pods watcher when we receive a version changed from k8s

2022-11-30 Thread GitBox
holdenk commented on code in PR #38518: URL: https://github.com/apache/spark/pull/38518#discussion_r1036510431 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala: ## @@ -86,8 +97,14 @@ class

[GitHub] [spark] holdenk commented on a diff in pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2022-11-30 Thread GitBox
holdenk commented on code in PR #37880: URL: https://github.com/apache/spark/pull/37880#discussion_r1036499087 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -157,24 +157,32 @@ private[spark] class SparkSubmit extends Logging { def doRunMain():

[GitHub] [spark] holdenk commented on pull request #37821: [SPARK-40379][K8S] Propagate decommission executor loss reason in K8s

2022-11-30 Thread GitBox
holdenk commented on PR #37821: URL: https://github.com/apache/spark/pull/37821#issuecomment-1332804844 cc @attilapiros are my responses ok? or still more concerns? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] holdenk commented on pull request #38574: [SPARK-41060][K8S] Fix generating driver and executor Config Maps

2022-11-30 Thread GitBox
holdenk commented on PR #38574: URL: https://github.com/apache/spark/pull/38574#issuecomment-1332802078 The CI failure is in Kube (e.g. `[info] 22/11/10 13:25:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are

[GitHub] [spark] warrenzhu25 commented on pull request #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-11-30 Thread GitBox
warrenzhu25 commented on PR #38852: URL: https://github.com/apache/spark/pull/38852#issuecomment-1332773585 @holdenk @dongjoon-hyun Help take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] holdenk commented on pull request #38789: [SPARK-41253][K8S][TESTS] Make Spark K8S volcano IT work in Github Action

2022-11-30 Thread GitBox
holdenk commented on PR #38789: URL: https://github.com/apache/spark/pull/38789#issuecomment-1332767162 I like this, supporting integration tests with downstream consumers and upstream platforms is +1. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] warrenzhu25 opened a new pull request, #38852: [SPARK-41341][CORE] Wait shuffle fetch to finish when decommission executor

2022-11-30 Thread GitBox
warrenzhu25 opened a new pull request, #38852: URL: https://github.com/apache/spark/pull/38852 ### What changes were proposed in this pull request? Wait for shuffle fetches to finish when decommissioning an executor by checking the number of open streams. ### Why are the changes needed? Avoid
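(A generic sketch of the approach described in the summary: poll a count of open shuffle-fetch streams and proceed with decommission only once it drains or a timeout elapses. `numOpenStreams` and the timeout are hypothetical stand-ins, not the PR's actual hooks.)

```scala
import java.util.concurrent.TimeUnit

object DrainShuffleFetchSketch {
  def waitForShuffleFetchesToDrain(
      numOpenStreams: () => Int,
      timeoutMs: Long,
      pollIntervalMs: Long = 500L): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    while (numOpenStreams() > 0 && System.nanoTime() < deadline) {
      Thread.sleep(pollIntervalMs)
    }
    numOpenStreams() == 0  // true => safe to continue decommissioning
  }
}
```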

[GitHub] [spark] dongjoon-hyun closed pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info

2022-11-30 Thread GitBox
dongjoon-hyun closed pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info URL: https://github.com/apache/spark/pull/38843 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info

2022-11-30 Thread GitBox
dongjoon-hyun commented on PR #38843: URL: https://github.com/apache/spark/pull/38843#issuecomment-1332656635 All tests passed. Merged to master/3.3/3.2. (screenshot of the passing checks attached)

[GitHub] [spark] dongjoon-hyun commented on pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info

2022-11-30 Thread GitBox
dongjoon-hyun commented on PR #38843: URL: https://github.com/apache/spark/pull/38843#issuecomment-1332655935 Thank you, @attilapiros ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] desmondcheongzx commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-30 Thread GitBox
desmondcheongzx commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1036314217 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] anchovYu commented on a diff in pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
anchovYu commented on code in PR #38851: URL: https://github.com/apache/spark/pull/38851#discussion_r1036272043 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2109,6 +2110,39 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] dongjoon-hyun commented on pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info

2022-11-30 Thread GitBox
dongjoon-hyun commented on PR #38843: URL: https://github.com/apache/spark/pull/38843#issuecomment-1332494831 Thank you, @mridulm, @Ngone51, @LuciferYang. @ylybest re-triggered the failed pipeline. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] xinglin commented on a diff in pull request #38832: [WIP] SPARK-41313 Combine fixes for SPARK-3900 and SPARK-21138

2022-11-30 Thread GitBox
xinglin commented on code in PR #38832: URL: https://github.com/apache/spark/pull/38832#discussion_r1036211437 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala: ## @@ -240,6 +240,9 @@ private[spark] class ApplicationMaster(

[GitHub] [spark] xkrogen commented on pull request #38712: [WIP][SPARK-41271][SQL] Parameterized SQL queries

2022-11-30 Thread GitBox
xkrogen commented on PR #38712: URL: https://github.com/apache/spark/pull/38712#issuecomment-1332434274 All 3 of the examples you provided use different syntax (`:id` vs `@id` vs `{{ id }}`). Can we provide more motivation for why we are picking the syntax used in this implementation?
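(To make the syntax question concrete, here is a toy string-substitution sketch of the `:name` marker style; it only shows the marker shape and says nothing about how the PR actually binds values, and the final syntax is exactly what is under discussion.)

```scala
object NamedParamSketch {
  // Toy illustration only: naive textual substitution of ":name" markers.
  def bind(sql: String, args: Map[String, String]): String =
    args.foldLeft(sql) { case (q, (name, value)) => q.replace(s":$name", value) }

  def main(args: Array[String]): Unit = {
    val bound = bind(
      "SELECT * FROM events WHERE id = :id AND kind = :kind",
      Map("id" -> "42", "kind" -> "'click'"))
    println(bound)  // SELECT * FROM events WHERE id = 42 AND kind = 'click'
  }
}
```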

[GitHub] [spark] cloud-fan commented on pull request #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan commented on PR #38851: URL: https://github.com/apache/spark/pull/38851#issuecomment-1332373661 cc @viirya @allisonwang-db -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan opened a new pull request, #38851: [SPARK-41338][SQL] Resolve outer references and normal columns in the same analyzer batch

2022-11-30 Thread GitBox
cloud-fan opened a new pull request, #38851: URL: https://github.com/apache/spark/pull/38851 ### What changes were proposed in this pull request? Today, the way we resolve outer references is very inefficient. It invokes the entire analyzer to resolve the subquery plan, then

[GitHub] [spark] srielau commented on a diff in pull request #38728: [SPARK-41204] [CONNECT] Migrate custom exceptions to use Spark exceptions

2022-11-30 Thread GitBox
srielau commented on code in PR #38728: URL: https://github.com/apache/spark/pull/38728#discussion_r1036118316 ## core/src/main/resources/error/error-classes.json: ## @@ -132,7 +132,97 @@ }, "INTERCEPTOR_RUNTIME_ERROR" : { "message" : [ - "Error

[GitHub] [spark] wankunde commented on pull request #38765: [SPARK-35531][SQL][FOLLOWUP] Support alter table command with CASE_SENSITIVE is true

2022-11-30 Thread GitBox
wankunde commented on PR #38765: URL: https://github.com/apache/spark/pull/38765#issuecomment-1332326017 For example: * We want to create a table called `tAb_I` * Hive metastore will check if the table name is valid by `MetaStoreUtils.validateName(tbl.getTableName())` * Hive will

[GitHub] [spark] kmozaid commented on pull request #31573: [SPARK-34444][SQL] Pushdown scalar-subquery filter to FileSourceScan

2022-11-30 Thread GitBox
kmozaid commented on PR #31573: URL: https://github.com/apache/spark/pull/31573#issuecomment-1332320279 @SaurabhChawla100 Hi Saurabh, I didn't get your statement `If Scalar Subquery completes first, than only scan of t1 starts before this change and after this PR also , Than push down of

[GitHub] [spark] roczei commented on pull request #38828: [SPARK-35084][CORE] Spark 3: supporting --packages in k8s cluster mode

2022-11-30 Thread GitBox
roczei commented on PR #38828: URL: https://github.com/apache/spark/pull/38828#issuecomment-1332308308 Hi @ocworld, Thanks a lot for this fix! I have tested it and it works for me as well. Do you plan to add unit tests? Have you found a solution for the problem that you have

[GitHub] [spark] tgravescs commented on pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2022-11-30 Thread GitBox
tgravescs commented on PR #38674: URL: https://github.com/apache/spark/pull/38674#issuecomment-1332225421 What version of YARN are you using? See issue:

[GitHub] [spark] LuciferYang commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

2022-11-30 Thread GitBox
LuciferYang commented on PR #38811: URL: https://github.com/apache/spark/pull/38811#issuecomment-1332223630 Thanks @srowen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] srowen closed pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

2022-11-30 Thread GitBox
srowen closed pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType` URL: https://github.com/apache/spark/pull/38811 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] srowen commented on pull request #38811: [SPARK-41276][SQL][ML][MLLIB][PROTOBUF][PYTHON][R][SS][AVRO] Optimize constructor use of `StructType`

2022-11-30 Thread GitBox
srowen commented on PR #38811: URL: https://github.com/apache/spark/pull/38811#issuecomment-133466 Merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] toujours33 commented on a diff in pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-30 Thread GitBox
toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1036003754 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -383,8 +383,8 @@ private[spark] class DAGScheduler( /** * Called by the

[GitHub] [spark] bjornjorgensen closed pull request #38773: [SPARK-41016][PS] Identical expressions should not be used on both sides of a binary operator

2022-11-30 Thread GitBox
bjornjorgensen closed pull request #38773: [SPARK-41016][PS] Identical expressions should not be used on both sides of a binary operator URL: https://github.com/apache/spark/pull/38773 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #38843: [SPARK-41327][CORE] Fix `SparkStatusTracker.getExecutorInfos` by switch On/OffHeapStorageMemory info

2022-11-30 Thread GitBox
LuciferYang commented on PR #38843: URL: https://github.com/apache/spark/pull/38843#issuecomment-1332180177 Please re-trigger the failed GA task @ylybest -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] Ngone51 commented on a diff in pull request #38668: [SPARK-41153][CORE] Log migrated shuffle data size and migration time

2022-11-30 Thread GitBox
Ngone51 commented on code in PR #38668: URL: https://github.com/apache/spark/pull/38668#discussion_r1035963605 ## core/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala: ## @@ -125,7 +126,10 @@ private[storage] class BlockManagerDecommissioner(
