[GitHub] [spark] dilipbiswal commented on a change in pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
dilipbiswal commented on a change in pull request #28953: URL: https://github.com/apache/spark/pull/28953#discussion_r447427847 ## File path: docs/sql-data-sources-jdbc.md ## @@ -156,6 +156,20 @@ the following case-insensitive options: + + preActions +

[GitHub] [spark] dilipbiswal commented on a change in pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
dilipbiswal commented on a change in pull request #28953: URL: https://github.com/apache/spark/pull/28953#discussion_r447427697 ## File path: docs/sql-data-sources-jdbc.md ## @@ -156,6 +156,20 @@ the following case-insensitive options: + + preActions +

[GitHub] [spark] dilipbiswal commented on a change in pull request #28951: [SPARK-32131][SQL] union and set operations have wrong exception infomation

2020-06-29 Thread GitBox
dilipbiswal commented on a change in pull request #28951: URL: https://github.com/apache/spark/pull/28951#discussion_r447426571 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ## @@ -831,4 +831,77 @@ class AnalysisSuite

[GitHub] [spark] dilipbiswal commented on a change in pull request #28951: [SPARK-32131][SQL] union and set operations have wrong exception infomation

2020-06-29 Thread GitBox
dilipbiswal commented on a change in pull request #28951: URL: https://github.com/apache/spark/pull/28951#discussion_r447426485 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ## @@ -831,4 +831,77 @@ class AnalysisSuite

[GitHub] [spark] dilipbiswal commented on a change in pull request #28951: [SPARK-32131][SQL] union and set operations have wrong exception infomation

2020-06-29 Thread GitBox
dilipbiswal commented on a change in pull request #28951: URL: https://github.com/apache/spark/pull/28951#discussion_r447426527 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ## @@ -831,4 +831,77 @@ class AnalysisSuite

[GitHub] [spark] liancheng commented on a change in pull request #28948: [SPARK-31935][SQL][FOLLOWUP] Hadoop file system config should be effective in data source options

2020-06-29 Thread GitBox
liancheng commented on a change in pull request #28948: URL: https://github.com/apache/spark/pull/28948#discussion_r447426398 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ## @@ -248,12 +248,17 @@ class CacheManager extends Logging

[GitHub] [spark] imback82 commented on a change in pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-06-29 Thread GitBox
imback82 commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r447418040 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala ## @@ -554,7 +554,7 @@ class

[GitHub] [spark] imback82 commented on a change in pull request #28676: [WIP][SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-06-29 Thread GitBox
imback82 commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r447411442 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -60,6 +62,92 @@ case class

[GitHub] [spark] cloud-fan commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-29 Thread GitBox
cloud-fan commented on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-651527077 I think the key problem is we skip `CoalesceShufflePartitions` when `ShuffleQueryStageExec#mapStats` is None. This can happen when the input RDD of the shuffle has 0

[GitHub] [spark] HyukjinKwon commented on pull request #28940: [SPARK-32121][SHUFFLE][TEST] Fix ExternalShuffleBlockResolverSuite failed on Windows

2020-06-29 Thread GitBox
HyukjinKwon commented on pull request #28940: URL: https://github.com/apache/spark/pull/28940#issuecomment-651517591 Build started: [CORE] `org.apache.spark.network.shuffle.ExternalShuffleBlockResolverSuite`

[GitHub] [spark] viirya commented on pull request #28952: [SPARK-32056][SQL][Follow-up] Coalesce partitions for repartiotion hint and sql when AQE is enabled

2020-06-29 Thread GitBox
viirya commented on pull request #28952: URL: https://github.com/apache/spark/pull/28952#issuecomment-651516656 ok to test This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] beliefer commented on a change in pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-29 Thread GitBox
beliefer commented on a change in pull request #28917: URL: https://github.com/apache/spark/pull/28917#discussion_r447395194 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -278,7 +280,26 @@ class DAGSchedulerSuite extends

[GitHub] [spark] beliefer commented on a change in pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-29 Thread GitBox
beliefer commented on a change in pull request #28917: URL: https://github.com/apache/spark/pull/28917#discussion_r447395077 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -278,7 +280,26 @@ class DAGSchedulerSuite extends

[GitHub] [spark] HyukjinKwon commented on pull request #28951: [SPARK-32131][SQL] union and set operations have wrong exception infomation

2020-06-29 Thread GitBox
HyukjinKwon commented on pull request #28951: URL: https://github.com/apache/spark/pull/28951#issuecomment-651512938 ok to test

[GitHub] [spark] HyukjinKwon commented on pull request #28950: [SPARK-32094][PYTHON] Update cloudpickle to v1.4.1

2020-06-29 Thread GitBox
HyukjinKwon commented on pull request #28950: URL: https://github.com/apache/spark/pull/28950#issuecomment-651512488 ok to test

[GitHub] [spark] HyukjinKwon commented on pull request #28950: [SPARK-32094][PYTHON] Update cloudpickle to v1.4.1

2020-06-29 Thread GitBox
HyukjinKwon commented on pull request #28950: URL: https://github.com/apache/spark/pull/28950#issuecomment-651511487 Yeah, to upgrade we should drop Python 2. I am targeting Spark 3.1 for dropping it. I will make a PR to officially drop it first.

[GitHub] [spark] kiszk commented on a change in pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
kiszk commented on a change in pull request #28953: URL: https://github.com/apache/spark/pull/28953#discussion_r447389935 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ## @@ -156,9 +157,16 @@ object JDBCRDD extends Logging

[GitHub] [spark] erenavsarogullari edited a comment on pull request #28865: [SPARK-32026][CORE][TEST] Add PrometheusServletSuite

2020-06-29 Thread GitBox
erenavsarogullari edited a comment on pull request #28865: URL: https://github.com/apache/spark/pull/28865#issuecomment-651498167 Thanks @dongjoon-hyun for the review. All comments are addressed. I think it is ready to go. Also, we plan to use Prometheus + Grafana with proposed format

[GitHub] [spark] turboFei commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsExceptio

2020-06-29 Thread GitBox
turboFei commented on pull request #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-651498450 Gentle ping @dongjoon-hyun @dbtsai

[GitHub] [spark] erenavsarogullari commented on pull request #28865: [SPARK-32026][CORE][TEST] Add PrometheusServletSuite

2020-06-29 Thread GitBox
erenavsarogullari commented on pull request #28865: URL: https://github.com/apache/spark/pull/28865#issuecomment-651498167 Thanks @dongjoon-hyun for the review. All comments are addressed. I think it is ready to go.

[GitHub] [spark] erenavsarogullari commented on a change in pull request #28865: [SPARK-32026][CORE][TEST] Add PrometheusServletSuite

2020-06-29 Thread GitBox
erenavsarogullari commented on a change in pull request #28865: URL: https://github.com/apache/spark/pull/28865#discussion_r447380991 ## File path: core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala ## @@ -0,0 +1,81 @@ +/* + * Licensed to the

[GitHub] [spark] LuciferYang commented on pull request #26339: [SPARK-27194][SPARK-29302][SQL] For dynamic partition overwrite operation, fix speculation task conflict issue and FileAlreadyExistsExcep

2020-06-29 Thread GitBox
LuciferYang commented on pull request #26339: URL: https://github.com/apache/spark/pull/26339#issuecomment-651497369 @dongjoon-hyun @turboFei Is this PR still being worked on? We are having similar issues in our production environment, and I found there are similar PRs try to solve this

[GitHub] [spark] HeartSaVioR edited a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
HeartSaVioR edited a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651484627 (IMHO it might be still good chance to leverage this PR to construct a good way for versioning properly - so that version 2 can be used as an interim with best

[GitHub] [spark] dongjoon-hyun commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-651487745 Retest this please.

[GitHub] [spark] dongjoon-hyun commented on pull request #28865: [SPARK-32026][CORE][TEST] Add PrometheusServletSuite

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28865: URL: https://github.com/apache/spark/pull/28865#issuecomment-651487054 Retest this please.

[GitHub] [spark] dongjoon-hyun commented on pull request #28950: [SPARK-32094][PYTHON] Update cloudpickle to v1.4.1

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28950: URL: https://github.com/apache/spark/pull/28950#issuecomment-651485642 +1 for @holdenk 's advice.

[GitHub] [spark] sarutak commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-06-29 Thread GitBox
sarutak commented on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-651485037 ok to test.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-06-29 Thread GitBox
AmplabJenkins removed a comment on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-650836727 Can one of the admins verify this patch?

[GitHub] [spark] sarutak commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-06-29 Thread GitBox
sarutak commented on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-651485158 cc: @squito

[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
HeartSaVioR commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651484627 (IMHO it might be still good to leverage this PR to be a chance to construct a good way for versioning properly - so that version 2 can be used as an interim with best

[GitHub] [spark] beliefer commented on a change in pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-06-29 Thread GitBox
beliefer commented on a change in pull request #28917: URL: https://github.com/apache/spark/pull/28917#discussion_r447372063 ## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ## @@ -278,7 +280,26 @@ class DAGSchedulerSuite extends

[GitHub] [spark] HeartSaVioR edited a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
HeartSaVioR edited a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651479578 @zsxwing Thanks a lot for your detailed comment! I think considering all of these would take me to redesign metadata log as well as file stream source

[GitHub] [spark] HeartSaVioR edited a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
HeartSaVioR edited a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651479578 @zsxwing Thanks a lot for your detailed comment! I think considering all of these would take me to redesign, which wasn't a goal actually. As I commented

[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
HeartSaVioR commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651479578 @zsxwing Thanks a lot for your detailed comment! I think considering all of these would take me to redesign, which wasn't a goal actually. As I commented

[GitHub] [spark] maropu commented on pull request #28863: [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector

2020-06-29 Thread GitBox
maropu commented on pull request #28863: URL: https://github.com/apache/spark/pull/28863#issuecomment-651478897

[GitHub] [spark] holdenk commented on a change in pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28864: URL: https://github.com/apache/spark/pull/28864#discussion_r447366211 ## File path: resource-managers/mesos/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala ## @@ -48,7 +46,7 @@ private[spark] class

[GitHub] [spark] maropu commented on pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
maropu commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-651469890 ok to test

[GitHub] [spark] maropu commented on a change in pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
maropu commented on a change in pull request #28953: URL: https://github.com/apache/spark/pull/28953#discussion_r447359906 ## File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ## @@ -574,6 +576,41 @@ class JDBCWriteSuite extends

[GitHub] [spark] moomindani edited a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
moomindani edited a comment on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-651467865 @dilipbiswal Sure I added it in this PR description.

[GitHub] [spark] LantaoJin commented on pull request #28935: [SPARK-20680][SQL] Adding HiveVoidType in Spark to be compatible with Hive

2020-06-29 Thread GitBox
LantaoJin commented on pull request #28935: URL: https://github.com/apache/spark/pull/28935#issuecomment-651469114 @cloud-fan I refactor some codes, now I think this PR could be no dependency.

[GitHub] [spark] moomindani commented on pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
moomindani commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-651467865 @dilipbiswal Sure I will add it in this PR description.

[GitHub] [spark] dilipbiswal commented on pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
dilipbiswal commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-651466295 @moomindani Thanks. Could we illustrate the usage of these two options via examples in the PR description? I think it will help the reviewers.

[GitHub] [spark] xianyinxin commented on pull request #28943: [SPARK-32127][SQL]: Check rules for MERGE INTO should use MergeAction.conditition other than MergeAction.children

2020-06-29 Thread GitBox
xianyinxin commented on pull request #28943: URL: https://github.com/apache/spark/pull/28943#issuecomment-651465152 Thanks @cloud-fan !

[GitHub] [spark] xianyinxin commented on pull request #28875: [SPARK-32030][SPARK-32127][SQL] Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO

2020-06-29 Thread GitBox
xianyinxin commented on pull request #28875: URL: https://github.com/apache/spark/pull/28875#issuecomment-651465114 Thanks @cloud-fan !

[GitHub] [spark] moomindani opened a new pull request #28953: [SPARK-32013][SQL] Support query execution before/after reading/writing DataFrame over JDBC

2020-06-29 Thread GitBox
moomindani opened a new pull request #28953: URL: https://github.com/apache/spark/pull/28953 ### What changes were proposed in this pull request? This pull request is to support query execution before/after reading/writing over JDBC. There are two new options; `preActions`

[GitHub] [spark] tharradine commented on pull request #28946: [SPARK-32123][PYSPARK] Setting `spark.sql.session.timeZone` only partially respected

2020-06-29 Thread GitBox
tharradine commented on pull request #28946: URL: https://github.com/apache/spark/pull/28946#issuecomment-651455471 This isn't exactly the SPARK-32123 fix I was expecting, I was expecting the behaviour mentioned in the

[GitHub] [spark] zsxwing commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-06-29 Thread GitBox
zsxwing commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-651454246 The numbers are pretty impressive. Thanks a lot for your work. My high level comments regarding the PR: - The compression codec should not be hardcoded. It's better

[GitHub] [spark] dongjoon-hyun commented on pull request #28863: [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28863: URL: https://github.com/apache/spark/pull/28863#issuecomment-651449133 Hi, @gaborgsomogyi . Is `OracleKrbIntegrationSuite` missing here?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28863: [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector

2020-06-29 Thread GitBox
AmplabJenkins removed a comment on pull request #28863: URL: https://github.com/apache/spark/pull/28863#issuecomment-651138861 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] dongjoon-hyun commented on pull request #28863: [SPARK-31336][SQL] Support Oracle Kerberos login in JDBC connector

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28863: URL: https://github.com/apache/spark/pull/28863#issuecomment-651442483 Retest this please.

[GitHub] [spark] rdblue commented on a change in pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-29 Thread GitBox
rdblue commented on a change in pull request #28864: URL: https://github.com/apache/spark/pull/28864#discussion_r447336956 ## File path: resource-managers/mesos/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala ## @@ -48,7 +46,7 @@ private[spark] class

[GitHub] [spark] rdblue commented on pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-29 Thread GitBox
rdblue commented on pull request #28864: URL: https://github.com/apache/spark/pull/28864#issuecomment-651442680 The updates look good. +1

[GitHub] [spark] github-actions[bot] commented on pull request #27963: [SPARK-31199]separate shuffle io connect timeout from idle timeout

2020-06-29 Thread GitBox
github-actions[bot] commented on pull request #27963: URL: https://github.com/apache/spark/pull/27963#issuecomment-651439774 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue

[GitHub] [spark] github-actions[bot] closed pull request #27971: [SPARK-31206][SQL] AQE should not use the same SubqueryExec when reuse is off

2020-06-29 Thread GitBox
github-actions[bot] closed pull request #27971: URL: https://github.com/apache/spark/pull/27971

[GitHub] [spark] github-actions[bot] closed pull request #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions

2020-06-29 Thread GitBox
github-actions[bot] closed pull request #24939: URL: https://github.com/apache/spark/pull/24939

[GitHub] [spark] dongjoon-hyun commented on pull request #28865: [SPARK-32026][CORE][TEST] Add PrometheusServletSuite

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28865: URL: https://github.com/apache/spark/pull/28865#issuecomment-651438818 Also, please update the PR description consistently together. Thanks!

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28865: [SPARK-32026][TEST] Add PrometheusServlet Unit Test coverage

2020-06-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #28865: URL: https://github.com/apache/spark/pull/28865#discussion_r447332799 ## File path: core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28865: [SPARK-32026][TEST] Add PrometheusServlet Unit Test coverage

2020-06-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #28865: URL: https://github.com/apache/spark/pull/28865#discussion_r447332927 ## File path: core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28865: [SPARK-32026][TEST] Add PrometheusServlet Unit Test coverage

2020-06-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #28865: URL: https://github.com/apache/spark/pull/28865#discussion_r447332338 ## File path: core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28865: [SPARK-32026][TEST] Add PrometheusServlet Unit Test coverage

2020-06-29 Thread GitBox
dongjoon-hyun commented on a change in pull request #28865: URL: https://github.com/apache/spark/pull/28865#discussion_r447331955 ## File path: core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala ## @@ -0,0 +1,81 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun commented on pull request #28865: [SPARK-32026][TEST] Add PrometheusServlet Unit Test coverage

2020-06-29 Thread GitBox
dongjoon-hyun commented on pull request #28865: URL: https://github.com/apache/spark/pull/28865#issuecomment-651435667 Retest this please

[GitHub] [spark] LantaoJin commented on pull request #28947: [SPARK-32129][SQL] Support AQE skew join with Union

2020-06-29 Thread GitBox
LantaoJin commented on pull request #28947: URL: https://github.com/apache/spark/pull/28947#issuecomment-651435088 retest this please

[GitHub] [spark] TJX2014 commented on a change in pull request #28882: [SPARK-31751][SQL]Serde property `path` overwrites hive table property location

2020-06-29 Thread GitBox
TJX2014 commented on a change in pull request #28882: URL: https://github.com/apache/spark/pull/28882#discussion_r447323119 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala ## @@ -218,4 +219,26 @@ class HiveExternalCatalogSuite

[GitHub] [spark] viirya commented on pull request #28952: [SPARK-32056][SQL][Follow-up] Coalesce partitions for repartiotion hint and sql when AQE is enabled

2020-06-29 Thread GitBox
viirya commented on pull request #28952: URL: https://github.com/apache/spark/pull/28952#issuecomment-651432458 I also think this might be worth a new JIRA ticket, but since we initially discussed it as a follow-up, I filed it as a follow-up first.

[GitHub] [spark] viirya opened a new pull request #28952: [SPARK-32056][SQL][Follow-up] Coalesce partitions for repartiotion hint and sql when AQE is enabled

2020-06-29 Thread GitBox
viirya opened a new pull request #28952: URL: https://github.com/apache/spark/pull/28952 ### What changes were proposed in this pull request? As the followup of #28900, this patch extends coalescing partitions to repartitioning using hints and SQL syntax without

[GitHub] [spark] rajatahujaatinmobi commented on a change in pull request #28880: [SPARK-29465][YARN][WEBUI] Adding Check to not to set UI port (spark.ui.port) property if mentioned explicitly

2020-06-29 Thread GitBox
rajatahujaatinmobi commented on a change in pull request #28880: URL: https://github.com/apache/spark/pull/28880#discussion_r446745968 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ## @@ -211,9 +211,11 @@

[GitHub] [spark] TJX2014 commented on pull request #28918: [SPARK-32068][WEBUI] Correct task lauchtime show issue due to timezone in stage tab

2020-06-29 Thread GitBox
TJX2014 commented on pull request #28918: URL: https://github.com/apache/spark/pull/28918#issuecomment-651425597 Thanks all for your suggestion and attention very much :-)

[GitHub] [spark] TJX2014 commented on a change in pull request #28926: [SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence

2020-06-29 Thread GitBox
TJX2014 commented on a change in pull request #28926: URL: https://github.com/apache/spark/pull/28926#discussion_r447317750 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2623,8 +2628,16 @@ object Sequence

[GitHub] [spark] TJX2014 commented on a change in pull request #28926: [SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence

2020-06-29 Thread GitBox
TJX2014 commented on a change in pull request #28926: URL: https://github.com/apache/spark/pull/28926#discussion_r447316906 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2612,6 +2614,9 @@ object Sequence

[GitHub] [spark] holdenk commented on pull request #28850: [SPARK-32015][Core]Remote inheritable thread local variables after spark context is stopped

2020-06-29 Thread GitBox
holdenk commented on pull request #28850: URL: https://github.com/apache/spark/pull/28850#issuecomment-651421291 I could see this being useful in testing using something like `spark-testing-base`, where you often want a fresh Spark context but not a whole fresh JVM.

[GitHub] [spark] holdenk commented on pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-29 Thread GitBox
holdenk commented on pull request #28864: URL: https://github.com/apache/spark/pull/28864#issuecomment-651420282 Let me know when you've had a chance @tgravescs :)

[GitHub] [spark] holdenk commented on pull request #28864: [SPARK-32004][ALL] Drop references to slave

2020-06-29 Thread GitBox
holdenk commented on pull request #28864: URL: https://github.com/apache/spark/pull/28864#issuecomment-651420495 > > The only other thing is that the use of the Mesos API stands out. We could address that as well. Types could be renamed when imported, or we could create subclasses and use

[GitHub] [spark] holdenk commented on a change in pull request #28911: [SPARK-32077][CORE] Support host-local shuffle data reading when external shuffle service is disabled

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28911: URL: https://github.com/apache/spark/pull/28911#discussion_r447311636 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -1391,10 +1391,11 @@ package object config { private[spark]

[GitHub] [spark] zhli1142015 commented on pull request #28949: [SPARK-32028][WEBUI][FOLLOWUP] fix app id link for multi attempts app in history summary page

2020-06-29 Thread GitBox
zhli1142015 commented on pull request #28949: URL: https://github.com/apache/spark/pull/28949#issuecomment-651418623 @srowen, thanks for taking care of this. This looks good to me.

[GitHub] [spark] holdenk commented on pull request #28951: [SPARK-32131][SQL] union and set operations have wrong exception infomation

2020-06-29 Thread GitBox
holdenk commented on pull request #28951: URL: https://github.com/apache/spark/pull/28951#issuecomment-651415959 Good catch. LGTM, but I'll leave it for a bit in case a SQL committer has any thoughts.

[GitHub] [spark] holdenk commented on pull request #28950: [SPARK-32094][PYTHON] Update cloudpickle to v1.4.1

2020-06-29 Thread GitBox
holdenk commented on pull request #28950: URL: https://github.com/apache/spark/pull/28950#issuecomment-651414728 Thanks for the ping @dongjoon-hyun & thanks for working on this PR @codesue, I've been meaning to take a look at cloudpickle's updates. @viirya I think backporting cloudpickle

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28946: [SPARK-32123][PYSPARK] Setting `spark.sql.session.timeZone` only partially respected

2020-06-29 Thread GitBox
AmplabJenkins removed a comment on pull request #28946: URL: https://github.com/apache/spark/pull/28946#issuecomment-651096568 Can one of the admins verify this patch?

[GitHub] [spark] holdenk commented on pull request #28946: [SPARK-32123][PYSPARK] Setting `spark.sql.session.timeZone` only partially respected

2020-06-29 Thread GitBox
holdenk commented on pull request #28946: URL: https://github.com/apache/spark/pull/28946#issuecomment-651413402 Jenkins ok to test cc @BryanCutler

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28944: [SPARK-32128][SQL]import SQLConf.PARTITION_OVERWRITE_VERIFY_PATH config

2020-06-29 Thread GitBox
AmplabJenkins removed a comment on pull request #28944: URL: https://github.com/apache/spark/pull/28944#issuecomment-650986142 Can one of the admins verify this patch?

[GitHub] [spark] holdenk commented on pull request #28944: [SPARK-32128][SQL]import SQLConf.PARTITION_OVERWRITE_VERIFY_PATH config

2020-06-29 Thread GitBox
holdenk commented on pull request #28944: URL: https://github.com/apache/spark/pull/28944#issuecomment-651413175 Jenkins ok to test

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28933: [SPARK-32104][SQL]Avoid full outer join OOM on skewed dataset

2020-06-29 Thread GitBox
AmplabJenkins removed a comment on pull request #28933: URL: https://github.com/apache/spark/pull/28933#issuecomment-650117903 Can one of the admins verify this patch?

[GitHub] [spark] holdenk commented on pull request #28933: [SPARK-32104][SQL]Avoid full outer join OOM on skewed dataset

2020-06-29 Thread GitBox
holdenk commented on pull request #28933: URL: https://github.com/apache/spark/pull/28933#issuecomment-651412836 Jenkins ok to test

[GitHub] [spark] holdenk commented on pull request #28924: [SPARK-32091][CORE] Ignore timeout error when remove blocks on the lost executor

2020-06-29 Thread GitBox
holdenk commented on pull request #28924: URL: https://github.com/apache/spark/pull/28924#issuecomment-651412453 Also, for the `user facing` change, maybe "less failures", which is good and which we should call out here so we can mention it in the release notes and encourage folks to upgrade.

[GitHub] [spark] holdenk commented on a change in pull request #28924: [SPARK-32091][CORE] Ignore timeout error when remove blocks on the lost executor

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28924: URL: https://github.com/apache/spark/pull/28924#discussion_r447303794 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala ## @@ -95,6 +97,13 @@ class BlockManagerMasterEndpoint(

[GitHub] [spark] manuzhang commented on pull request #28916: [SPARK-32083][SQL] Coalesce to one partition when all partitions are empty in AQE

2020-06-29 Thread GitBox
manuzhang commented on pull request #28916: URL: https://github.com/apache/spark/pull/28916#issuecomment-651407512 @viirya @cloud-fan I've updated the PR description with an example. This is more of an improvement I propose for certain cases. Please let me know whether it makes sense.

[GitHub] [spark] warrenzhu25 commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-06-29 Thread GitBox
warrenzhu25 commented on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-651399906 > Hi @warrenzhu25 , thank you for your contribution. > This PR seems to add a new feature so could you add a testcase for it? > You can find tests for the status API in

[GitHub] [spark] holdenk commented on pull request #28619: [SPARK-21040][CORE] Speculate tasks which are running on decommission executors

2020-06-29 Thread GitBox
holdenk commented on pull request #28619: URL: https://github.com/apache/spark/pull/28619#issuecomment-651388307 Took a quick look, thanks for working on this. I think having a timeout to kill the executors regardless (e.g. a max decommissioning time) and the speculation are both useful.

[GitHub] [spark] gengliangwang commented on a change in pull request #28948: [SPARK-31935][SQL][FOLLOWUP] Hadoop file system config should be effective in data source options

2020-06-29 Thread GitBox
gengliangwang commented on a change in pull request #28948: URL: https://github.com/apache/spark/pull/28948#discussion_r447272447 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ## @@ -248,12 +248,17 @@ class CacheManager extends

[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r447249716 ## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ## @@ -242,8 +244,7 @@ private[spark] class BlockManager( private var

[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r447247802 ## File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala ## @@ -40,6 +40,9 @@ sealed abstract class BlockId { def isRDD: Boolean =

[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r447247346 ## File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ## @@ -148,6 +170,82 @@ private[spark] class

[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-29 Thread GitBox
holdenk commented on a change in pull request #28708: URL: https://github.com/apache/spark/pull/28708#discussion_r447247000 ## File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ## @@ -55,6 +58,25 @@ private[spark] class

[GitHub] [spark] HeartSaVioR commented on pull request #27620: [SPARK-30866][SS] FileStreamSource: Cache fetched list of files beyond maxFilesPerTrigger as unread files

2020-06-29 Thread GitBox
HeartSaVioR commented on pull request #27620: URL: https://github.com/apache/spark/pull/27620#issuecomment-651313422 retest this, please
