[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-07 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 I'll spend some time today trying to sort out the relative merits of the fix options; but in the meantime, there's also no good reason for `TaskSchedulerImpl.rootPool` to be a `var` initialized

[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...

2017-02-06 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16813#discussion_r99686720 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala --- @@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder

[GitHub] spark pull request #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Sched...

2017-02-06 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16813#discussion_r99686323 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala --- @@ -69,19 +72,29 @@ private[spark] class FairSchedulableBuilder

[GitHub] spark issue #16813: [SPARK-19466][CORE][SCHEDULER] Improve Fair Scheduler Lo...

2017-02-06 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16813 Looks reasonable, but I'd prefer slightly different log messages. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16620#discussion_r97140817 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,7 +1193,15 @@ class DAGScheduler

[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16620#discussion_r97139832 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,7 +1193,15 @@ class DAGScheduler

[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-01-20 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16620#discussion_r97139668 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1193,7 +1193,15 @@ class DAGScheduler

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should handle stage's pending...

2017-01-17 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should handle stage's pending...

2017-01-17 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 Beyond the lack of new tests, this patch is causing a couple of existing DAGSchedulerSuite tests to fail for me locally: "run trivial shuffle with out-of-band failure and retry" and

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should handle stage's pending...

2017-01-17 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should handle stage's pending...

2017-01-17 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16620 Thanks for the work thus far, @jinxing64 , but this really needs updated test coverage before we can consider merging it. @squito --- If your project is set up for it, you can reply

[GitHub] spark issue #16437: [SPARK-19028] [SQL] Fixed non-thread-safe functions used...

2016-12-31 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16437 Please update the description in the JIRA ticket. What is there now is simply not adequate, meaning that if anyone has to come back and address this issue some time on the future, what

[GitHub] spark issue #16291: [SPARK-18838][CORE] Use separate executor service for ea...

2016-12-22 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16291 > Each listener should in theory could work independent of each other and we should only guarantee ordered processing of the events within a listener. If we were starting from noth

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-19 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/12775 Yeah, that's ok, @kayousterhout --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16291: [SPARK-18838][CORE] Use separate executor service for ea...

2016-12-19 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16291 > You are right, that's why the executor service is single threaded which guarantees ordered processing of the events per listener. But this is still substantially different f

[GitHub] spark issue #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure refactor...

2016-12-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15335 It's worth something, but not a lot. I think it's worth merging, but if someone thinks it's not, I'm not going to fight it. Read it and merge if you want to. --- If your project is set up

[GitHub] spark issue #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure refactor...

2016-12-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15335 @vanzin It should be committed if you think it adds enough additional clarity that it is worth the penalty of making future backporting or other debugging maintenance a little more difficult

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-12 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92005198 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -161,12 +163,7 @@ private[spark] class Executor( * @param

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-09 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91764815 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -84,6 +84,16 @@ private[spark] class Executor( // Start worker

[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-07 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91364564 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -161,12 +163,7 @@ private[spark] class Executor( * @param

[GitHub] spark issue #16065: [SPARK-18631][SQL] Changed ExchangeCoordinator re-partit...

2016-11-29 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/16065 @rxin fixed it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #16065: [SPARK-17064][SQL] Changed ExchangeCoordinator re...

2016-11-29 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/16065 [SPARK-17064][SQL] Changed ExchangeCoordinator re-partitioning to avoid additional data … ## What changes were proposed in this pull request? Re-partitioning logic

[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...

2016-11-28 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15986 Thanks, @JoshRosen --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...

2016-11-28 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15986 If Kay is happy with the last couple of changes, then I'm fine with this, too. The only tiny nit I've still got is a change from `runningTasksByExecutors()` to `runningTasksByExecutors

[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...

2016-11-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89423666 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -335,31 +337,31 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...

2016-11-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15986 The code looks pretty good @JoshRosen , but I still want to spend some time looking at your standalone end-to-end reproduction to get more familiar with the details. --- If your project

[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...

2016-11-23 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89411560 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -88,10 +88,12 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark issue #11888: [SPARK-14069][SQL] Improve SparkStatusTracker to also tr...

2016-11-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/11888 @rxin Do you mean the N/A in "How was this patch tested?" Some guy said that the lack of tests was ok. https://github.com/apache/spark/pull/11888#issuecomment-200162987 --- If yo

[GitHub] spark issue #15463: [SPARK-17894] [CORE] Ensure uniqueness of TaskSetManager...

2016-10-21 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15463 Whoa! Jenkins disrespects @kayousterhout -- bad Jenkins! Or did you actually fix something @shivaram ? --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark issue #15531: [SQL][STREAMING][TEST] Follow up to remove Option.contai...

2016-10-18 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15531 @rxin Nonsense; it's just as good an idea as are List.contains and Set.contains. The only problem with it is that it doesn't exist in Scala 2.10. --- If your project is set up for it, you can

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-17 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r83710471 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-10-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14158 @nblintao Got it; thanks. There may be distinct queries that will be entirely the same within the first 1000 characters, but that's just the nature of working with these very large queries

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-10-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14158 @nblintao I really haven't reviewed these changes closely enough to have a specific complaint or concern in mind, but I'm more concerned about what happens when you ask to see "more&

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-10-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14158 It's great if it is already addressed, but I just didn't see anything explicit in the discussion or examples that showed any query of the magnitude that I am talking about. Machine-generated

[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-10-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14158 Please be certain that this works well even for very large queries. They are not commonplace, but I know that Spark SQL does sometimes get asked to handle SQL queries that are hundreds

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-13 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r83289400 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-12 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15441#discussion_r83105534 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala --- @@ -35,4 +37,20 @@ private[ui] class JobsTab(parent: SparkUI) extends

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-12 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r83073697 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-12 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82944965 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15441#discussion_r82925875 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala --- @@ -35,4 +37,18 @@ private[ui] class JobsTab(parent: SparkUI) extends

[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15441#discussion_r82924909 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala --- @@ -35,4 +37,18 @@ private[ui] class JobsTab(parent: SparkUI) extends

[GitHub] spark pull request #15441: [SPARK-4411] [Web UI] Add "kill" link for jobs in...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15441#discussion_r82923313 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobsTab.scala --- @@ -35,4 +37,18 @@ private[ui] class JobsTab(parent: SparkUI) extends

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r82891721 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -98,6 +84,14 @@ private[spark] class TaskSetManager( var

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82869819 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82864060 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"l

[GitHub] spark pull request #12775: [SPARK-14958][Core] Failed task not handled when ...

2016-10-05 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/12775#discussion_r82079358 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala --- @@ -135,8 +135,9 @@ private[spark] class TaskResultGetter(sparkEnv

[GitHub] spark issue #15350: [SPARK-17778][Tests]Mock SparkContext to reduce memory u...

2016-10-05 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15350 @zsxwing `build/mvn -U -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Dpyspark -Dsparkr test` completely succeeded on one of my machines where it was previously failing

[GitHub] spark issue #15326: [SPARK-17759] [CORE] FairSchedulableBuilder should avoid...

2016-10-03 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15326 @erenavsarogullari Your concern was entirely legitimate, and is also why I called in @kayousterhout to double-check my claim that other duplicate Schedulables would also be a problem

[GitHub] spark issue #15326: [SPARK-17759] [CORE] FairSchedulableBuilder should avoid...

2016-10-03 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15326 ...and it would be 2.0.2 at this point. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #15326: [SPARK-17759] [CORE] FairSchedulableBuilder should avoid...

2016-10-03 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15326 @kayousterhout @rxin Ok, but if we're going to change the behavior, then we need to be sure that change at least makes it into the release notes. --- If your project is set up for it, you can

[GitHub] spark issue #15335: Some FetchFailure refactoring

2016-10-03 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15335 @squito the promised follow-up PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request #15335: Some FetchFailure refactoring

2016-10-03 Thread markhamstra
GitHub user markhamstra opened a pull request: https://github.com/apache/spark/pull/15335 Some FetchFailure refactoring ## What changes were proposed in this pull request? Readability rewrites. Changed order of `failedStage.failedOnFetchAndShouldAbort

[GitHub] spark pull request #15301: [SPARK-17717][SQL] Add exist/find methods to Cata...

2016-09-29 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15301#discussion_r81270990 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala --- @@ -102,6 +102,83 @@ abstract class Catalog { def listColumns

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81036771 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -592,34 +587,54 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81035029 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -478,8 +473,8 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81024744 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -421,7 +412,11 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81022074 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -266,19 +263,11 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81020674 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81020284 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-09-28 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81018983 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -50,22 +48,12 @@ import org.apache.spark.util.{AccumulatorV2, Clock

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-27 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 > Honestly, I think just getting the fix in is important enough that I'm fine w/ putting in the minimally invasive thing now. That's fine, @squito -- go ahead and merge when you

[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...

2016-09-27 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15221 I'm glad Jenkins finally got it together. LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-26 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 @scwf I understand that you were trying to make the least invasive fix possible to deal with the problem. That's usually a good thing to do, but even when that kind of fix is getting

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-26 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 The fix is logically correct; however, the prior code is needlessly complex and not as easy to understand as it should be, and the proposed fix doesn't improve on that. I'd like to take

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 Right, but `abortStage` occurs elsewhere. "When abort stage" seems to imply that this fix is necessary for all usages of `abortStage` when the actual problem is not in `abortStage`

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 @scwf That description would actually be at least as bad since there are multiple routes to `abortStage` and this issue of adding to `failedStages` only applies to these two. I'll take another

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 Ok, that makes better sense. The `disallowStageRetryForTest` case doesn't worry me too much since it is only used in tests. If we can fix this case, great; else if it remains possible

[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15213 This doesn't make sense to me. The DAGSchedulerEventProcessLoop runs on a single thread and processes a single event from its queue at a time. When the first CompletionEvent is run

[GitHub] spark issue #15084: [SPARK-17529][core] Implement BitSet.clearUntil and use ...

2016-09-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15084 @rxin @jegonzal looking for a third-party review --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #15084: [SPARK-17529][core] Implement BitSet.clearUntil and use ...

2016-09-13 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/15084 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so

[GitHub] spark pull request #14952: [SPARK-17110] Fix StreamCorruptionException in Bl...

2016-09-06 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14952#discussion_r77697794 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -520,10 +520,11 @@ private[spark] class BlockManager

[GitHub] spark issue #14958: [SPARK-17378] [BUILD] Upgrade snappy-java to 1.1.2.6

2016-09-06 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14958 @srowen The bug does manifest in Spark. I've been running this upgrade in production for a few weeks with no issues. This should be merged IMO. --- If your project is set up for it, you can

[GitHub] spark pull request #14952: [SPARK-17110] Fix StreamCorruptionException in Bl...

2016-09-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14952#discussion_r77459444 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -520,10 +520,11 @@ private[spark] class BlockManager

[GitHub] spark pull request #14737: [SPARK-17171][WEB UI] DAG will list all partition...

2016-08-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14737#discussion_r75602737 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -141,6 +141,7 @@ private[spark] object SparkUI { val DEFAULT_POOL_NAME

[GitHub] spark pull request #14737: [SPARK-17171][WEB UI] DAG will list all partition...

2016-08-21 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14737#discussion_r75602575 --- Diff: core/src/main/scala/org/apache/spark/ui/scope/RDDOperationGraph.scala --- @@ -119,18 +119,47 @@ private[ui] object RDDOperationGraph extends

[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...

2016-08-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14557 There are multiple issues with this PR. Some are at a more stylistic level, but some include deeper issues -- e.g. see SPARK-17064. Most fundamentally, this PR is the wrong solution at least

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-08-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/12436 See https://issues.apache.org/jira/browse/SPARK-17064 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #14534: [SPARK-16941]Add SynchronizedMap trait with Map i...

2016-08-08 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14534#discussion_r73908742 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala --- @@ -39,8 +39,10

[GitHub] spark issue #14533: [SPARK-16606] [CORE] isleading warning for SparkContext....

2016-08-08 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14533 PR title typo? Intended "misleading"? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have th

[GitHub] spark issue #14332: [SPARK-16694] [CORE] Use for/foreach rather than map for...

2016-07-25 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14332 The Scala API docs clearly specify that the most idiomatic usage of Option is to treat it as a collection or monad and use map, flatMap, filter, or foreach. Those docs also clearly specify

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-25 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14332#discussion_r72014373 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala --- @@ -54,14 +54,13 @@ object DataFrameExample

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14332#discussion_r7120 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala --- @@ -54,14 +54,13 @@ object DataFrameExample

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14332#discussion_r71993558 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala --- @@ -54,14 +54,13 @@ object DataFrameExample

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14332#discussion_r71993490 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala --- @@ -54,14 +54,13 @@ object DataFrameExample

[GitHub] spark pull request #14332: [SPARK-16694] [CORE] Use for/foreach rather than ...

2016-07-24 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/14332#discussion_r71993163 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DataFrameExample.scala --- @@ -54,14 +54,13 @@ object DataFrameExample

[GitHub] spark issue #14330: [SPARK-16693][SPARKR] Remove methods deprecated in 2.0.0...

2016-07-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14330 See JIRA comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-05 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14039 I haven't got anything more concrete to offer at this time than the descriptions in the relevant JIRA's, but I do have this running in production with 1.6, and it does work. Essentially, you

[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-04 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14039 Actually, they can be reused -- not in Spark as distributed, but it is an open question whether reusing shuffle files within Spark SQL is something that we should be doing and want to support

[GitHub] spark issue #13685: [SPARK-15963][CORE] Catch `TaskKilledException` correctl...

2016-06-23 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13685 LGTM. Good work! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #13677: [SPARK 15926] Improve readability of DAGScheduler stage ...

2016-06-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13677 LGTM, and runs without flakiness for me when rebased onto master with the https://github.com/apache/spark/pull/13688 HOTFIX. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #13688: [HOTFIX] [CORE] fix flaky BasicSchedulerIntegrationTest

2016-06-15 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13688 As a HOTIFX, LGTM; but I agree that there is room for a follow-up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark issue #13677: [SPARK 15926] Improve readability of DAGScheduler stage ...

2016-06-14 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13677 > One goal of this change it to make it clearer which functions may create new stages (as opposed to looking up stages that already exist). Something that I have been looking at of l

[GitHub] spark issue #13646: [SPARK-15927] Eliminate redundant DAGScheduler code.

2016-06-14 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13646 LGTM, but I agree with Imran's renaming suggestion, and his new test looks good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #13591: [Minor] Replace all occurrences of None: Option[X] with ...

2016-06-10 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/13591 Yeah, I wouldn't bother. The way this PR does it is arguably better because it can be done consistently across all collections: ```scala val l0 = List.empty[String] val s0

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-06-01 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/13280 Oh, wait... sorry, I just realized that @liancheng said he also merged to branch-2.0. +1 on reverting that. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.

2016-06-01 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/13280 @rxin Huh? The merge was to master, not branch-2.0. Doesn't that put it on the 2.1 track and not into 2.0.0? I think that is all that Yin was saying, that @rdblue was mistaken about

[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-27 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/12951#issuecomment-30643 Added my comments to the JIRA. In short, I think there is a legitimate use case for this, and there is a significant gap in our current fair-scheduling pool API

[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-24 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/8559#issuecomment-221414334 @squito Yeah, that's fine. I haven't gone through the new tests closely to make sure that they are doing what they say they are doing, but the changes to both non

[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-23 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/8559#issuecomment-221001489 @squito I don't feel really strongly about it, either. My only concern was for others adding tests needing a mock ExternalClusterManager in the future

[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...

2016-05-22 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/8559#discussion_r64170026 --- Diff: core/src/test/resources/META-INF/services/org.apache.spark.scheduler.ExternalClusterManager --- @@ -1 +1,2

<    1   2   3   4   5   6   7   >