[GitHub] [spark] williamhyun opened a new pull request, #38724: [SPARK-41202][BUILD] Update ORC to 1.7.7

2022-11-18 Thread GitBox
williamhyun opened a new pull request, #38724: URL: https://github.com/apache/spark/pull/38724 ### What changes were proposed in this pull request? This PR aims to update ORC to 1.7.7. ### Why are the changes needed? This will bring the latest bug fixes. ### Does this PR

[GitHub] [spark] AmplabJenkins commented on pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1320825239 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] grundprinzip commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-18 Thread GitBox
grundprinzip commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1027046810 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -271,8 +273,12 @@ class SparkConnectPlanner(session: Spar

[GitHub] [spark] MaxGekk closed pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk closed pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078 URL: https://github.com/apache/spark/pull/38696

[GitHub] [spark] MaxGekk commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320812183 +1, LGTM. Merging to master. Thank you, @panbingkun and @srielau for review.

[GitHub] [spark] amaliujia commented on pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread GitBox
amaliujia commented on PR #38723: URL: https://github.com/apache/spark/pull/38723#issuecomment-1320811915 @zhengruifeng @HyukjinKwon

[GitHub] [spark] amaliujia opened a new pull request, #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

2022-11-18 Thread GitBox
amaliujia opened a new pull request, #38723: URL: https://github.com/apache/spark/pull/38723 ### What changes were proposed in this pull request? Implement `DataFrame.SelectExpr` in Python client. `SelectExpr` also has a good amount of usage. ### Why are the changes nee

[GitHub] [spark] AmplabJenkins commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320808013 Can one of the admins verify this patch?

[GitHub] [spark] WangGuangxin opened a new pull request, #38722: [SPARK-41200][CORE] BytesToBytesMap's longArray size can be up to MAX_CAPACITY

2022-11-18 Thread GitBox
WangGuangxin opened a new pull request, #38722: URL: https://github.com/apache/spark/pull/38722 ### What changes were proposed in this pull request? In BytesToBytesMap, the longArray size can be up to `MAX_CAPACITY` instead of `MAX_CAPACITY/2` since `MAX_CAPACITY` already takes `two array ent

[GitHub] [spark] wangyum commented on pull request #38682: [SPARK-41167][SQL] Improve multi like performance by creating a balanced expression tree predicate

2022-11-18 Thread GitBox
wangyum commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1320804263 cc @cloud-fan

[GitHub] [spark] panbingkun opened a new pull request, #38721: [WIP][SPARK-41172][SQL] Migrate the ambiguous ref error to an error class

2022-11-18 Thread GitBox
panbingkun opened a new pull request, #38721: URL: https://github.com/apache/spark/pull/38721 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### How was this patch te

[GitHub] [spark] WeichenXu123 commented on pull request #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-18 Thread GitBox
WeichenXu123 commented on PR #38699: URL: https://github.com/apache/spark/pull/38699#issuecomment-1320790862 > If we are setting it in `SparkContext`, do we want to get rid of this from other places like `PythonRunner.compute`? I think we can remove the code in `PythonRunner.compute`.

[GitHub] [spark] viirya closed pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya closed pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch URL: https://github.com/apache/spark/pull/38716

[GitHub] [spark] Yikun commented on pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
Yikun commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1320737666 Merge to master, @HyukjinKwon @harupy Thanks!

[GitHub] [spark] Yikun closed pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
Yikun closed pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest URL: https://github.com/apache/spark/pull/38698

[GitHub] [spark] wangyum commented on pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on PR #38682: URL: https://github.com/apache/spark/pull/38682#issuecomment-1320735591 @wankunde Please fix the PR title and description.

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1027001153 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,29 @@ object ExprUtils extends QueryErrorsBase { Type

[GitHub] [spark] viirya commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
viirya commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026999005 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging { val a

[GitHub] [spark] HyukjinKwon commented on pull request #38698: [SPARK-41186][INFRA][PS][TESTS] Upgrade infra and replace `list_run_infos` with `search_runs` in mlflow doctest

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38698: URL: https://github.com/apache/spark/pull/38698#issuecomment-1320711448 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
HyukjinKwon closed pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python URL: https://github.com/apache/spark/pull/38718

[GitHub] [spark] HyukjinKwon commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1320707595 Merged to master.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026994040 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026993747 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging {

[GitHub] [spark] hvanhovell opened a new pull request, #38720: [SPARK-41165][SPARK-41184][CONNECT] Fix arrow collect (again) and reenable tests.

2022-11-18 Thread GitBox
hvanhovell opened a new pull request, #38720: URL: https://github.com/apache/spark/pull/38720 ### What changes were proposed in this pull request? The arrow collect code path for connect contains a bug where it would always fall back to JSON. This was caused by the assumption that `NonFat

[GitHub] [spark] viirya commented on a diff in pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
viirya commented on code in PR #38719: URL: https://github.com/apache/spark/pull/38719#discussion_r1026993007 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala: ## @@ -345,7 +345,14 @@ trait ProgressReporter extends Logging { val a

[GitHub] [spark] github-actions[bot] closed pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #36695: [SPARK-38474][CORE] Use error class in org.apache.spark.security URL: https://github.com/apache/spark/pull/36695

[GitHub] [spark] github-actions[bot] commented on pull request #36767: [SPARK-39363][K8S] Deprecate k8s memory overhead and make it optional

2022-11-18 Thread GitBox
github-actions[bot] commented on PR #36767: URL: https://github.com/apache/spark/pull/36767#issuecomment-1320692121 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #37359: [SPARK-25342][CORE][SQL]Support rolling back a result stage and rerunning all result tasks when writing files URL: https://github.com/apache/spark/pull/37359

[GitHub] [spark] github-actions[bot] closed pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-11-18 Thread GitBox
github-actions[bot] closed pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join URL: https://github.com/apache/spark/pull/37129

[GitHub] [spark] github-actions[bot] commented on pull request #37460: [WIP][SPARK-40031][SQL] Remove unnecessary TryEval in TryCast

2022-11-18 Thread GitBox
github-actions[bot] commented on PR #37460: URL: https://github.com/apache/spark/pull/37460#issuecomment-1320692056 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026987556 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,23 @@ object ExprUtils extends QueryErrorsBase { Type

[GitHub] [spark] viirya commented on pull request #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya commented on PR #38716: URL: https://github.com/apache/spark/pull/38716#issuecomment-1320636937 retest this please

[GitHub] [spark] liuzqt commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultCollec

2022-11-18 Thread GitBox
liuzqt commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1026966457 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with SharedSpa

[GitHub] [spark] AmplabJenkins commented on pull request #38702: [SPARK-41187][Core] LiveExecutor MemoryLeak in AppStatusListener when ExecutorLost happen

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38702: URL: https://github.com/apache/spark/pull/38702#issuecomment-1320614168 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38703: [SPARK-41191] [SQL] Cache Table is not working while nested caches exist

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38703: URL: https://github.com/apache/spark/pull/38703#issuecomment-1320614138 Can one of the admins verify this patch?

[GitHub] [spark] tedyu commented on pull request #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-18 Thread GitBox
tedyu commented on PR #38715: URL: https://github.com/apache/spark/pull/38715#issuecomment-1320568475 @HeartSaVioR Can you take a look?

[GitHub] [spark] HeartSaVioR commented on pull request #38719: [SPARK-41199][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR commented on PR #38719: URL: https://github.com/apache/spark/pull/38719#issuecomment-1320531883 cc. @zsxwing @viirya @xuanyuanking Please take a look. Thanks in advance!

[GitHub] [spark] HeartSaVioR commented on pull request #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-18 Thread GitBox
HeartSaVioR commented on PR #38717: URL: https://github.com/apache/spark/pull/38717#issuecomment-1320531472 cc. @zsxwing @cloud-fan @viirya Please take a look. Thanks in advance!

[GitHub] [spark] xkrogen commented on pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2022-11-18 Thread GitBox
xkrogen commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1320527156 @cloud-fan, any more concerns on this approach based on what @thejdeep shared?

[GitHub] [spark] HeartSaVioR opened a new pull request, #38719: [SPARK-41999][SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used

2022-11-18 Thread GitBox
HeartSaVioR opened a new pull request, #38719: URL: https://github.com/apache/spark/pull/38719 ### What changes were proposed in this pull request? This PR proposes to fix the metrics issue for streaming query when DSv1 streaming source and DSv2 streaming source are co-used. If the st

[GitHub] [spark] amaliujia commented on pull request #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
amaliujia commented on PR #38718: URL: https://github.com/apache/spark/pull/38718#issuecomment-1320510474 @zhengruifeng @HyukjinKwon

[GitHub] [spark] amaliujia opened a new pull request, #38718: [SPARK-41196][CONNECT][FOLLOW-UP] Fix out of sync generated files for Python

2022-11-18 Thread GitBox
amaliujia opened a new pull request, #38718: URL: https://github.com/apache/spark/pull/38718 ### What changes were proposed in this pull request? Fix out of sync generated files for Python. This happens in the rare case of a protobuf version change. There was something no

[GitHub] [spark] MaxGekk commented on pull request #38696: [SPARK-41175][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1078

2022-11-18 Thread GitBox
MaxGekk commented on PR #38696: URL: https://github.com/apache/spark/pull/38696#issuecomment-1320493195 cc @srielau

[GitHub] [spark] HeartSaVioR opened a new pull request, #38717: [SPARK-41198][SS] Fix metrics in streaming query having CTE and DSv1 streaming source

2022-11-18 Thread GitBox
HeartSaVioR opened a new pull request, #38717: URL: https://github.com/apache/spark/pull/38717 ### What changes were proposed in this pull request? This PR proposes to fix the broken metrics when the streaming query has CTE, via applying InlineCTE manually against analyzed plan when c

[GitHub] [spark] viirya opened a new pull request, #38716: [SPARK-XXXXX][SS] Use latestCommittedBatchId as currentBatchId when resuming late batch

2022-11-18 Thread GitBox
viirya opened a new pull request, #38716: URL: https://github.com/apache/spark/pull/38716 ### What changes were proposed in this pull request? This patch changes `currentBatchId` when `MicroBatchExecution` tries to resume from late batch from offset log. Previously it take

[GitHub] [spark] geofflangenderfer commented on pull request #4093: [SPARK-5307] SerializationDebugger to help debug NotSerializableException

2022-11-18 Thread GitBox
geofflangenderfer commented on PR #4093: URL: https://github.com/apache/spark/pull/4093#issuecomment-1320457789 Could someone give a simple example of how to read the graph? I'm not sure where to start.

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
ryan-johnson-databricks commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026804602 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -192,6 +192,23 @@ class SparkSessionExtensionSuite extends Spark

[GitHub] [spark] MaxGekk closed pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk closed pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions URL: https://github.com/apache/spark/pull/38705

[GitHub] [spark] MaxGekk commented on pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk commented on PR #38705: URL: https://github.com/apache/spark/pull/38705#issuecomment-1320429350 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] tedyu opened a new pull request, #38715: [SPARK-41197] Upgrade Kafka version to 3.3 release

2022-11-18 Thread GitBox
tedyu opened a new pull request, #38715: URL: https://github.com/apache/spark/pull/38715 ### What changes were proposed in this pull request? This PR upgrades Kafka to 3.3.0 release. ### Why are the changes needed? Kafka 3.3.0 release has new features along with bug fixes: https

[GitHub] [spark] ahshahid opened a new pull request, #38714: [WIP][SPARK-41141]. avoid introducing a new aggregate expression in the analysis phase when subquery is referencing it

2022-11-18 Thread GitBox
ahshahid opened a new pull request, #38714: URL: https://github.com/apache/spark/pull/38714 ### What changes were proposed in this pull request? This is a PR for improvement. When a subquery references the outer query's aggregate functions, in some cases, it ends up introducing extra a

[GitHub] [spark] amaliujia commented on pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
amaliujia commented on PR #38693: URL: https://github.com/apache/spark/pull/38693#issuecomment-1320392539 LGTM

[GitHub] [spark] hvanhovell closed pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version.

2022-11-18 Thread GitBox
hvanhovell closed pull request #38693: [SPARK-41196] [CONNECT] Homogenize the protobuf version across the Spark connect server to use the same major version. URL: https://github.com/apache/spark/pull/38693

[GitHub] [spark] otterc commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-18 Thread GitBox
otterc commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1320389162 Looks good to me

[GitHub] [spark] AmplabJenkins commented on pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38707: URL: https://github.com/apache/spark/pull/38707#issuecomment-1320346254 Can one of the admins verify this patch?

[GitHub] [spark] EnricoMi commented on a diff in pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on code in PR #38676: URL: https://github.com/apache/spark/pull/38676#discussion_r1026695873 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -1938,7 +1940,10 @@ case class LateralJoin( joinType

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1320300231 Problem is that `DeduplicateRelations` is only considering duplicates between left `output` and right `output`, and not duplicates between left `references` and right `output`. I have sk

[GitHub] [spark] mridulm commented on a diff in pull request #38704: [SPARK-41193][SQL][TESTS] Ignore `collect data with single partition larger than 2GB bytes array limit` in `DatasetLargeResultColle

2022-11-18 Thread GitBox
mridulm commented on code in PR #38704: URL: https://github.com/apache/spark/pull/38704#discussion_r1026679895 ## sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala: ## @@ -2251,7 +2251,11 @@ class DatasetLargeResultCollectingSuite extends QueryTest with SharedSp

[GitHub] [spark] mridulm commented on pull request #38699: [SPARK-41188][CORE][ML] Set executorEnv OMP_NUM_THREADS to be spark.task.cpus by default for spark executor JVM processes

2022-11-18 Thread GitBox
mridulm commented on PR #38699: URL: https://github.com/apache/spark/pull/38699#issuecomment-1320294062 If we are setting it in `SparkContext`, do we want to get rid of this from other places like `PythonRunner.compute`?

[GitHub] [spark] srielau commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
srielau commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1026672022 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -697,12 +697,12 @@ setQuantifier ; relation -: LATERAL? relatio

[GitHub] [spark] antonipp commented on a diff in pull request #38376: [SPARK-40817] [Kubernetes] Do not discard remote user-specified files when launching Spark jobs on Kubernetes

2022-11-18 Thread GitBox
antonipp commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1026638180 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -1609,6 +1609,16 @@ class TestFileSystem extends org.apache.hadoop.fs.LocalFileSystem {

[GitHub] [spark] AmplabJenkins commented on pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38710: URL: https://github.com/apache/spark/pull/38710#issuecomment-1320213406 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
AmplabJenkins commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1320213339 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026569958 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -217,6 +218,22 @@ class SparkSessionExtensions { checkRuleBuilders += builder

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
ryan-johnson-databricks commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026540525 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -217,6 +218,22 @@ class SparkSessionExtensions { checkRuleBuild

[GitHub] [spark] cloud-fan commented on a diff in pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38713: URL: https://github.com/apache/spark/pull/38713#discussion_r1026557894 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/UnpivotParserSuite.scala: ## @@ -192,4 +193,131 @@ class UnpivotParserSuite extends AnalysisTest {

[GitHub] [spark] cloud-fan commented on pull request #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan commented on PR #38713: URL: https://github.com/apache/spark/pull/38713#issuecomment-1320161201 cc @viirya @MaxGekk

[GitHub] [spark] cloud-fan opened a new pull request, #38713: [SPARK-41195][SQL] Support PIVOT/UNPIVOT with join children

2022-11-18 Thread GitBox
cloud-fan opened a new pull request, #38713: URL: https://github.com/apache/spark/pull/38713 ### What changes were proposed in this pull request? Today, our SQL parser only supports PIVOT/UNPIVOT at the end of the FROM clause. This is quite limited and it's better to allow PIV

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026534660 ## sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/LikeAnyBenchmark.scala: ## @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026531799 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala: ## @@ -207,11 +207,17 @@ class LikeSimplificationSuite extends Pla

[GitHub] [spark] LuciferYang commented on a diff in pull request #38075: [WIP][SPARK-40633][BUILD] Upgrade janino to 3.1.8

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38075: URL: https://github.com/apache/spark/pull/38075#discussion_r1026500589 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -1310,7 +1310,7 @@ case class CatalystToExternalMap private(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38075: [WIP][SPARK-40633][BUILD] Upgrade janino to 3.1.8

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38075: URL: https://github.com/apache/spark/pull/38075#discussion_r1026497690 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -1310,7 +1310,7 @@ case class CatalystToExternalMap private(

[GitHub] [spark] LuciferYang commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026475352 ## sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out: ## @@ -14,7 +29,7 @@ select format_string() struct<> -- !query output org.apache.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
LuciferYang commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026474420 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -1662,8 +1675,7 @@ case class StringRPad(str: Expression, le

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026470585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala: ## @@ -117,4 +117,23 @@ object ExprUtils extends QueryErrorsBase { Type

[GitHub] [spark] LuciferYang commented on pull request #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
LuciferYang commented on PR #38711: URL: https://github.com/apache/spark/pull/38711#issuecomment-1320030900 cc @mridulm @Ngone51 FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [spark] cloud-fan commented on a diff in pull request #38692: [SPARK-41183][SQL] Add an extension API to do plan normalization for caching

2022-11-18 Thread GitBox
cloud-fan commented on code in PR #38692: URL: https://github.com/apache/spark/pull/38692#discussion_r1026457875 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -192,6 +192,23 @@ class SparkSessionExtensionSuite extends SparkFunSuite with

[GitHub] [spark] cloud-fan closed pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
cloud-fan closed pull request #38497: [SPARK-40999] Hint propagation to subqueries URL: https://github.com/apache/spark/pull/38497 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] cloud-fan commented on pull request #38497: [SPARK-40999] Hint propagation to subqueries

2022-11-18 Thread GitBox
cloud-fan commented on PR #38497: URL: https://github.com/apache/spark/pull/38497#issuecomment-1320017014 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

[GitHub] [spark] cloud-fan commented on pull request #38687: [SPARK-41154][SQL] Incorrect relation caching for queries with time travel spec

2022-11-18 Thread GitBox
cloud-fan commented on PR #38687: URL: https://github.com/apache/spark/pull/38687#issuecomment-1320010391 there is another cache in `SessionCatalog.tableRelationCache`, shall we update it as well? -- This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38706: [TEST ONLY] Come back to collect.foreach(send)

2022-11-18 Thread GitBox
HyukjinKwon commented on code in PR #38706: URL: https://github.com/apache/spark/pull/38706#discussion_r1026369143 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -57,13 +55,7 @@ class SparkConnectStreamHandler(resp

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-18 Thread GitBox
HyukjinKwon commented on code in PR #38659: URL: https://github.com/apache/spark/pull/38659#discussion_r1026348675 ## sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala: ## @@ -21,24 +21,22 @@ import java.nio.charset.StandardCharsets import

[GitHub] [spark] HyukjinKwon closed pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-18 Thread GitBox
HyukjinKwon closed pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1 URL: https://github.com/apache/spark/pull/38675 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on pull request #38675: [SPARK-41161][BUILD] Upgrade scala-parser-combinators to 2.1.1

2022-11-18 Thread GitBox
HyukjinKwon commented on PR #38675: URL: https://github.com/apache/spark/pull/38675#issuecomment-1319891935 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026330362 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -743,6 +743,21 @@ object LikeSimplification extends Rule[LogicalPlan] {

[GitHub] [spark] MaxGekk commented on a diff in pull request #38705: [SPARK-41173][SQL] Move `require()` out from the constructors of string expressions

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38705: URL: https://github.com/apache/spark/pull/38705#discussion_r1026301316 ## sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out: ## @@ -14,7 +29,7 @@ select format_string() struct<> -- !query output org.apache.spar

[GitHub] [spark] EnricoMi commented on pull request #38676: [SPARK-41162][SQL] Do not push down anti-join predicates that become ambiguous

2022-11-18 Thread GitBox
EnricoMi commented on PR #38676: URL: https://github.com/apache/spark/pull/38676#issuecomment-1319839395 > Could we fix the `DeduplicateRelations`? Interesting, that sounds like a better solution. I'll look into it. -- This is an automated message from the Apache Git Service. To res

[GitHub] [spark] MaxGekk closed pull request #38688: [SPARK-41166][SQL][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread GitBox
MaxGekk closed pull request #38688: [SPARK-41166][SQL][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites URL: https://github.com/apache/spark/pull/38688 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] MaxGekk commented on pull request #38688: [SPARK-41166][TESTS] Check errorSubClass of DataTypeMismatch in *ExpressionSuites

2022-11-18 Thread GitBox
MaxGekk commented on PR #38688: URL: https://github.com/apache/spark/pull/38688#issuecomment-1319815373 +1, LGTM. Merging to master. Thank you, @panbingkun. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [spark] wangyum commented on a diff in pull request #38682: [SPARK-41167][SQL] Optimize LikeSimplification rule to improve multi like performance

2022-11-18 Thread GitBox
wangyum commented on code in PR #38682: URL: https://github.com/apache/spark/pull/38682#discussion_r1026264942 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -756,16 +771,16 @@ object LikeSimplification extends Rule[LogicalPlan] {

[GitHub] [spark] MaxGekk commented on a diff in pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38664: URL: https://github.com/apache/spark/pull/38664#discussion_r1026264084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala: ## @@ -146,7 +147,10 @@ object FunctionRegistryBase { .filter

[GitHub] [spark] MaxGekk commented on a diff in pull request #38650: [SPARK-41135][SQL] Rename `UNSUPPORTED_EMPTY_LOCATION` to `INVALID_EMPTY_LOCATION`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38650: URL: https://github.com/apache/spark/pull/38650#discussion_r1026258331 ## core/src/main/resources/error/error-classes.json: ## @@ -656,6 +656,11 @@ ], "sqlState" : "42000" }, + "INVALID_EMPTY_LOCATION" : { +"message" : [

[GitHub] [spark] MaxGekk closed pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk closed pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE` URL: https://github.com/apache/spark/pull/38644 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] MaxGekk commented on pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk commented on PR #38644: URL: https://github.com/apache/spark/pull/38644#issuecomment-1319797393 +1, LGTM. Merging to master. Thank you, @itholic. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] MaxGekk commented on a diff in pull request #38644: [SPARK-41130][SQL] Rename `OUT_OF_DECIMAL_TYPE_RANGE` to `NUMERIC_OUT_OF_SUPPORTED_RANGE`

2022-11-18 Thread GitBox
MaxGekk commented on code in PR #38644: URL: https://github.com/apache/spark/pull/38644#discussion_r1026251775 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastWithAnsiOnSuite.scala: ## @@ -244,7 +244,7 @@ class CastWithAnsiOnSuite extends CastSuiteBa

[GitHub] [spark] MaxGekk closed pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-18 Thread GitBox
MaxGekk closed pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure` URL: https://github.com/apache/spark/pull/38665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] MaxGekk commented on pull request #38665: [SPARK-41156][SQL] Remove the class `TypeCheckFailure`

2022-11-18 Thread GitBox
MaxGekk commented on PR #38665: URL: https://github.com/apache/spark/pull/38665#issuecomment-1319784931 > There are still some uses in spark-rapids. I haven't found other uses in other famous repositories ok. Let's leave `TypeCheckFailure` as is. -- This is an automated message fr

[GitHub] [spark] MaxGekk opened a new pull request, #38712: [WIP][SQL] Parameterized SQL queries

2022-11-18 Thread GitBox
MaxGekk opened a new pull request, #38712: URL: https://github.com/apache/spark/pull/38712 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] toujours33 opened a new pull request, #38711: [SPARK-41192][Core] Remove unscheduled speculative tasks when task finished to obtain better dynamic

2022-11-18 Thread GitBox
toujours33 opened a new pull request, #38711: URL: https://github.com/apache/spark/pull/38711 ### What changes were proposed in this pull request? ExecutorAllocationManager only records a count for speculative tasks: `stageAttemptToNumSpeculativeTasks` is incremented when a speculative task is submitted,
