[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-24 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-47051463 Thanks everybody :-)

2014-06-24 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-47013273 Thanks Rui. I've merged this in now.

2014-06-24 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/892

2014-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46891715 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16037/

2014-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46891709 Merged build finished. All automated tests passed.

2014-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46884580 Merged build triggered.

2014-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46884592 Merged build started.

2014-06-23 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46884029 Jenkins, this is ok to test

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14061614 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ …

2014-06-22 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14061551 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ …

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14059200 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ …

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46801884 Sorry about the code style, and thanks @mateiz for pointing it out. I've updated the patch.

2014-06-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46767721 Hey @lirui-intel, sorry for taking a while to get back to this, but it looks good to me. Iterating through only the TaskSet's valid locality levels sounds fine. I made some…

2014-06-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14052131 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -77,6 +77,10 @@ …

2014-06-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14052122 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,4 +738,21 @@ …

2014-06-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14052118 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ …

2014-06-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14052106 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -725,10 +723,12 @@ …

2014-06-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14052104 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -725,10 +723,12 @@ …

2014-06-12 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45890408 Thanks @mridulm, I've updated the patch accordingly.

2014-06-11 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45732793 Just wanted to drop a quick note (since I might not be able to get to this until late next week). I think the proposal should work, though I might be missing something…

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45696187 I've updated the patch. Currently, maxLocality starts from PROCESS_LOCAL (maxLocality <- TaskLocality.values). What if we make it start from the highest valid level of…
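The iteration described here (maxLocality <- TaskLocality.values) can be modeled with a small Python sketch. This is a hypothetical illustration of the offer loop, not Spark's actual API: `offer_rounds` and `start_level` are invented names, and the suggestion in the comment corresponds to passing a higher `start_level` so strictness levels no TaskSet can use are skipped.

```python
from enum import IntEnum

class TaskLocality(IntEnum):
    # Ordered from most local to least local, mirroring Spark's TaskLocality enum
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    RACK_LOCAL = 2
    ANY = 3

def offer_rounds(task_sets, start_level=TaskLocality.PROCESS_LOCAL):
    """Yield (max_locality, task_set) pairs for successive offer rounds.

    The default models the current behavior (always start at PROCESS_LOCAL);
    a higher start_level models the proposed change of beginning at the
    highest level any TaskSet actually considers valid.
    """
    for level in TaskLocality:          # iterates in definition order
        if level < start_level:
            continue                    # skip levels below the starting point
        for ts in task_sets:
            yield level, ts
```

With only NODE_LOCAL-and-below levels in play, the PROCESS_LOCAL round is never offered, which is the saving the comment is after.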

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45691507 Thanks for the explanation @mridulm, really appreciate it.

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13601836 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13601229 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13600111 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13599131 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45625310 Had to delete a comment and edit another: issues with reviewing the PR on my phone, apologies!

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13598180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13597675 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13596860 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -153,8 +153,8 @@ …

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13594625 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ …

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45592572 @kayousterhout - I've fixed how we compute valid locality levels and added some unit tests. Now computeValidLocalityLevels considers a level as valid only if some…
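The rule described here — a locality level only counts as valid when some currently alive executor or host actually has pending tasks at that level — can be sketched as a minimal Python model. Function and parameter names below are hypothetical simplifications of the Scala `computeValidLocalityLevels`, and RACK_LOCAL is omitted for brevity:

```python
def compute_valid_locality_levels(pending_for_executor, pending_for_host,
                                  alive_executors, alive_hosts):
    """Return the locality levels worth waiting for.

    pending_for_executor / pending_for_host map a location to its pending
    task queue. A level is valid only if at least one of its locations is
    currently alive; ANY is always valid as the final fallback.
    """
    levels = []
    if any(e in alive_executors for e in pending_for_executor):
        levels.append("PROCESS_LOCAL")
    if any(h in alive_hosts for h in pending_for_host):
        levels.append("NODE_LOCAL")
    levels.append("ANY")
    return levels
```

Under this rule, a TaskSet whose preferred hosts have not registered yet reports only ["ANY"], and the list grows once matching executors come up.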

2014-06-09 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45564434 Thanks @kayousterhout, I'll fix this ASAP.

2014-06-09 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45524023 @lirui-intel sorry, I came up with that example too quickly! But computeValidLocalityLevels() still isn't going to work correctly with your patch; see the unit test I…

2014-06-09 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45468581 Hi @kayousterhout, thanks for pointing this out. My understanding is that, when TaskScheduler calls TaskSetManager.resourceOffer, it passes the parameter "maxLocality"…

2014-06-09 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45466828 Thanks @lirui-intel for updating this! Can you please add unit tests to check that this works properly when no node-local executors for the job are immediately available…

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45398131 I've removed the delay for pendingTasksWithNoPrefs

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45397807 @mridulm maybe it's better to allow users to specify how many executors they need (which is not available with standalone mode, I believe)? So they can control things…

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45397685 @mateiz - I added this waiting time to every TaskSetManager because, with dynamically resizing clusters (as you suggested earlier), we may add new executors when new task…

2014-06-06 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45379198 @lirui-intel regarding enough executors: if the application can't get sufficient executors, waiting for them in the scheduler does not help :-) I don't think we currently expose…

2014-06-06 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45378774 @pwendell pendingTasksWithNoPrefs handles the case of executor failures, tasks which have no preferences, etc. Or maybe I misunderstood the question? As I mentioned…

2014-06-06 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45361221 Commented on the code -- I don't think a wait time in TaskSetManager makes sense because it will apply even to TaskSets much later in the application, so you might get this…
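For context on the wait time being debated: Spark's delay scheduling keeps a TaskSet at its current locality level until a configured wait elapses without a task launch, then falls back to a less local level. The Python class below is a hypothetical, heavily simplified model of that bookkeeping (loosely inspired by the currentLocalityIndex / lastLaunchTime fields in TaskSetManager), not the real implementation:

```python
import time

class DelayScheduler:
    """Stay at the current locality level until `locality_wait` seconds
    pass without a launch, then fall back to the next (less local) level."""

    def __init__(self, levels, locality_wait, clock=time.monotonic):
        self.levels = levels            # e.g. ["PROCESS_LOCAL", "NODE_LOCAL", "ANY"]
        self.locality_wait = locality_wait
        self.clock = clock              # injectable for testing
        self.index = 0
        self.last_launch = clock()

    def allowed_level(self):
        # Fall back one level per elapsed wait period, never past the last level.
        while (self.index < len(self.levels) - 1 and
               self.clock() - self.last_launch >= self.locality_wait):
            self.index += 1
            self.last_launch = self.clock()
        return self.levels[self.index]

    def record_launch(self):
        # Launching a task at the current level resets the wait timer.
        self.last_launch = self.clock()
```

Matei's objection maps onto this model: if the timer starts at TaskSetManager construction, a TaskSet created late in the application would re-pay the full wait even though the cluster's executors registered long ago.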

2014-06-06 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13500107 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -54,8 +54,15 @@ …

2014-06-05 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45299022 @mateiz - sorry, I meant remove the part where we add tasks to `pendingTasksWithNoPrefs` if `!hadAliveLocations` is false.

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45299018 Sure @kayousterhout. Can we use [SPARK-1937](https://issues.apache.org/jira/browse/SPARK-1937) for the discussion?

2014-06-05 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45298895 @mateiz - could we just remove the `pendingTasksWithNoPrefs` construct? I'm a bit confused about what purpose this serves. If a task has no executors that satisfy its constraints…

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45298475 @lirui-intel Let's move the discussion about NODE_LOCAL / PROCESS_LOCAL to a Jira -- since it's tangential to this change

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13474841 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ …

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13474668 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ …

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13474171 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ …

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13473499 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -54,8 +54,15 @@ …

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45293719 Hi @kayousterhout, let's consider a map stage whose tasks all have NODE_LOCAL preference. So pendingTasksForExecutor is empty and all tasks are added to pendingTasksF…
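The scenario in this comment — every task carries only host-level (NODE_LOCAL) preferences, as with an RDD read from HDFS — can be illustrated with a simplified addPendingTask. The Python shape below is a hypothetical model, not the Scala code: host-preferring tasks land only in the per-host queue, so the per-executor (PROCESS_LOCAL) queues stay empty.

```python
from collections import defaultdict

def add_pending_task(index, preferred_hosts, pending_for_host, pending_no_prefs):
    """Register task `index` in the pending queues.

    Tasks with host preferences go into pending_for_host; tasks with no
    preference at all go into pending_no_prefs. Nothing here ever touches
    a per-executor queue, so for an all-NODE_LOCAL stage PROCESS_LOCAL
    has no candidates.
    """
    if not preferred_hosts:
        pending_no_prefs.append(index)
        return
    for host in preferred_hosts:
        pending_for_host[host].append(index)
```

Running three tasks through this model shows why scheduling such a stage at maxLocality = PROCESS_LOCAL finds nothing to launch.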

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45247880 @lirui-intel I still don't understand the problem you mentioned: in TaskSetManager.findTask(), it tries to schedule process local, node local, and rack local tasks…

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13451805 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,4 +750,20 @@ …

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13451625 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ …

2014-06-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13451548 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -54,8 +54,15 @@ …

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45207806 I've revised the patch as @mateiz suggested: tasks will be added to the corresponding lists even when their preferred location is unavailable, in which case they'll also be added…

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45184290 That's true @kayousterhout, but my point is: what if all the tasks only specify NODE_LOCAL preference (a common case when the RDD is created from some HDFS file)?…

2014-06-04 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45183292 My understanding is that the problem mentioned only exists when a job has a combination of tasks with constraints and tasks without constraints. AFAIK this is not a…

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45176804 Hi @mateiz, I think we should distinguish between tasks that truly have no preference and tasks whose preference is unavailable when picking tasks from pendingTasksWithNoPrefs…

2014-06-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45175977 I see, we might be able to fix that as a separate issue. We could count pendingTasksWithNoPrefs as being at a "special" level that we try only after we try the real locality levels…

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45175085 Yes @mateiz, great idea. One quick question: tasks in pendingTasksWithNoPrefs are considered PROCESS_LOCAL. Suppose we have no tasks in pendingTasksForExecutor…

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13421524 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,4 +739,13 @@ …

2014-06-04 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45174261 Hey @lirui-intel, I talked about this a bit with Kay and we think there's a simpler and more general approach. * In addPendingTask, always add the task to the host, rack and…
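Assuming the approach being proposed here (always populate the real locality lists, and treat the no-prefs bucket as a level tried after them), a simplified findTask could look like the Python model below. The NO_PREF "special" level, the `pending` dict shape, and all names are illustrative assumptions, not the actual Scala implementation:

```python
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "NO_PREF", "ANY"]

def find_task(allowed, offer_executor, offer_host, pending):
    """Pop a task for an (executor, host) offer, trying the most local
    eligible queue first.

    `pending` maps:  "executor" -> {executor_id: queue},
                     "host"     -> {host: queue},
                     "no_prefs" -> queue, "any" -> queue.
    `allowed` caps how far down LEVELS the search may go.
    """
    for level in LEVELS[:LEVELS.index(allowed) + 1]:
        if level == "PROCESS_LOCAL":
            queue = pending["executor"].get(offer_executor, [])
        elif level == "NODE_LOCAL":
            queue = pending["host"].get(offer_host, [])
        elif level == "NO_PREF":
            queue = pending["no_prefs"]
        else:
            queue = pending["any"]
        if queue:
            return queue.pop(), level
    return None
```

The key property is the ordering: a node-local candidate wins over a no-prefs task, and no-prefs tasks are no longer mistaken for PROCESS_LOCAL ones.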

2014-06-04 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13414402 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -195,7 +196,7 @@ …

2014-06-04 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13413931 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -111,6 +111,8 @@ …

2014-06-04 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45156913 It looks like when you re-add tasks after an executor is added, tasks that newly have one of their preferred locations available are not removed from the pendingTask…

2014-06-04 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13410584 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,4 +739,13 @@ …

2014-05-28 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44394787 I've made some modifications; please take a look and see if this makes sense :) @mridulm @rxin @kayousterhout

2014-05-27 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44356508 If I understand correctly, the application cannot control how many executors to launch (at least with the standalone mode). New executors can be launched for the application when…

2014-05-27 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44345092 Yeah, I agree with @mridulm. It seems like, with this change, you're counting on the executors to finish registering before the delay threshold runs out or before…

2014-05-27 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44343061 Tagging @kayousterhout ...

2014-05-27 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44285670 This is better handled in user code and not in the scheduler. We need a way to surface how many executors have registered - but once that is available, spinning until some…

2014-05-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44246965 Can one of the admins verify this patch?

2014-05-27 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/892 SPARK-1937: fix issue with task locality Don't check executor/host availability when creating a TaskSetManager, because the executors may not have been registered yet when the TaskSetManager is created…
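The fix summarized in the PR description — build the pending-task lists without checking which executors are alive, and refresh the valid locality levels when executors register — can be sketched as a rough Python model. The class, its methods, and the scheduler callback below are hypothetical names chosen for illustration, not Spark's actual classes:

```python
from collections import defaultdict

class TaskSetManagerModel:
    """Pending lists are built unconditionally at construction time;
    valid locality levels are recomputed whenever the scheduler reports
    a newly registered executor."""

    def __init__(self, tasks_with_hosts, sched):
        self.pending_for_host = defaultdict(list)
        for idx, hosts in tasks_with_hosts:
            for h in hosts:
                # No liveness check here: the host may register later.
                self.pending_for_host[h].append(idx)
        self.sched = sched
        self.my_locality_levels = self._valid_locality_levels()

    def _valid_locality_levels(self):
        levels = []
        if any(self.sched.host_alive(h) for h in self.pending_for_host):
            levels.append("NODE_LOCAL")
        levels.append("ANY")
        return levels

    def executor_added(self):
        # Scheduler callback: a new executor registered, so levels that
        # were previously unusable may have become valid.
        self.my_locality_levels = self._valid_locality_levels()
```

In this model a TaskSet created before its preferred hosts register starts at ["ANY"], then picks up NODE_LOCAL once a matching executor appears, instead of permanently dropping those tasks into a no-preference bucket.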