Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-47051463
Thanks everybody :-)
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-47013273
Thanks Rui. I've merged this in now.
---
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/892
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46891715
All automated tests passed.
Refer to this link for build results:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16037/
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46891709
Merged build finished. All automated tests passed.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46884580
Merged build triggered.
---
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46884592
Merged build started.
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46884029
Jenkins, this is ok to test
---
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14061614
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var h
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14061551
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var hadAli
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14059200
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var h
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46801884
Sorry about the code style, and thanks @mateiz for pointing it out. I've updated the patch.
---
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-46767721
Hey @lirui-intel, sorry for taking a while to get back to this, but it
looks good to me. Iterating through only the TaskSet's valid locality levels
sounds fine. I made some
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14052131
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -77,6 +77,10 @@ class FakeTaskScheduler(sc: SparkContext, liveExecutor
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14052122
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -738,4 +738,21 @@ private[spark] class TaskSetManager(
logDebug("V
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14052118
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,16 +181,14 @@ private[spark] class TaskSetManager(
var hadAli
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14052106
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -725,10 +723,12 @@ private[spark] class TaskSetManager(
private def
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r14052104
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -725,10 +723,12 @@ private[spark] class TaskSetManager(
private def
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45890408
Thanks @mridulm , I've updated the patch accordingly.
---
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45732793
Just wanted to drop a quick note (since I might not be able to get to this
until late next week).
I think the proposal should work, though I might be missing something
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45696187
I've updated the patch.
Currently, maxLocality starts from PROCESS_LOCAL (maxLocality <-
TaskLocality.values). What if we make it start from the highest valid level of
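A minimal sketch of that idea, with assumed names (the enumeration and the per-task-set list of valid levels are illustrative, not the actual TaskSchedulerImpl code): the scheduler would iterate maxLocality over only the levels the task set can actually use, instead of over every TaskLocality value.

```scala
object LocalityIterationSketch {
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  }

  // Old behaviour: maxLocality walks every level, starting at PROCESS_LOCAL.
  def offerAllLevels(offer: TaskLocality.Value => Unit): Unit = {
    for (maxLocality <- TaskLocality.values) offer(maxLocality)
  }

  // Proposed: walk only the levels valid for this task set, starting from
  // its highest (most local) valid level.
  def offerValidLevels(validLevels: Seq[TaskLocality.Value],
                       offer: TaskLocality.Value => Unit): Unit = {
    for (maxLocality <- validLevels) offer(maxLocality)
  }

  def main(args: Array[String]): Unit = {
    // A task set whose tasks only have NODE_LOCAL preferences never needs a
    // PROCESS_LOCAL pass.
    offerValidLevels(
      Seq(TaskLocality.NODE_LOCAL, TaskLocality.RACK_LOCAL, TaskLocality.ANY),
      level => println(s"resourceOffer with maxLocality = $level"))
  }
}
```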
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45691507
Thanks for the explanation @mridulm , really appreciate it.
---
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13601836
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val curTi
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13601229
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val c
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13600111
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val c
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13599131
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val c
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45625310
Had to delete a comment and edit another: issues with reviewing the PR on my
phone, apologies!
---
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13598180
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val curTi
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13597675
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val curTi
Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13596860
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -153,8 +153,8 @@ private[spark] class TaskSetManager(
}
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13594625
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -388,7 +386,7 @@ private[spark] class TaskSetManager(
val c
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45592572
@kayousterhout - I've fixed how we compute valid locality levels and added
some unit tests.
Now computeValidLocalityLevels considers a level as valid only if some
e
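A simplified sketch of the behaviour described above; the method name mirrors the one under discussion, but the signature and helper predicates here are assumptions rather than the real TaskSetManager code. A level counts as valid only when at least one pending task's preferred executor, host, or rack is currently alive.

```scala
import scala.collection.mutable.ArrayBuffer

object ValidLocalityLevelsSketch {
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
  }

  def computeValidLocalityLevels(
      pendingTasksForExecutor: Map[String, ArrayBuffer[Int]],
      pendingTasksForHost: Map[String, ArrayBuffer[Int]],
      pendingTasksForRack: Map[String, ArrayBuffer[Int]],
      isExecutorAlive: String => Boolean,
      hasExecutorsAliveOnHost: String => Boolean,
      hasHostAliveOnRack: String => Boolean): Seq[TaskLocality.Value] = {
    val levels = new ArrayBuffer[TaskLocality.Value]
    // A level is valid only if at least one preferred location is alive right now.
    if (pendingTasksForExecutor.keySet.exists(isExecutorAlive)) {
      levels += TaskLocality.PROCESS_LOCAL
    }
    if (pendingTasksForHost.keySet.exists(hasExecutorsAliveOnHost)) {
      levels += TaskLocality.NODE_LOCAL
    }
    if (pendingTasksForRack.keySet.exists(hasHostAliveOnRack)) {
      levels += TaskLocality.RACK_LOCAL
    }
    levels += TaskLocality.ANY // ANY is always valid
    levels.toSeq
  }
}
```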
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45564434
Thanks @kayousterhout , I'll fix this ASAP.
---
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45524023
@lirui-intel sorry, I came up with that example too quickly! But
computeValidLocalityLevels() still isn't going to work correctly with your
patch; see the unit test I
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45468581
Hi @kayousterhout, thanks for pointing this out. My understanding is that,
when TaskScheduler calls TaskSetManager.resourceOffer, it passes the parameter
"maxLocality
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45466828
Thanks @lirui-intel for updating this! Can you please add unit tests to
check that this works properly when no node-local executors for the job are
immediately avai
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45398131
I've removed the delay for pendingTasksWithNoPrefs
---
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45397807
@mridulm maybe it's better to allow users to specify how many executors
they need (which is not available in standalone mode, I believe)? So they can
control things l
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45397685
@mateiz - I added this waiting time into every TaskSetManager because with
dynamically resizing clusters (as you suggested earlier), we may add new executors
when new task
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45379198
@lirui-intel regarding enough executors: if the application can't get
sufficient executors, waiting for them in the scheduler does not help :-) I don't
think we currently expose
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45378774
@pwendell pendingTasksWithNoPrefs handles the case of executor failures, tasks
that have no preferences, etc. Or maybe I misunderstood the question?
As I mentioned b
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45361221
Commented on the code -- I don't think a wait time in TaskSetManager makes
sense because it will apply even to TaskSets much later in the application, so
you might get this
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13500107
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
clock: Clock
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45299022
@mateiz - sorry, I meant remove the part where we add tasks to
`pendingTasksWithNoPrefs` if `!hadAliveLocations` is false.
---
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45299018
Sure @kayousterhout . Can we use
[SPARK-1937](https://issues.apache.org/jira/browse/SPARK-1937) for the
discussion?
---
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45298895
@mateiz - could we just remove the `pendingTasksWithNoPrefs` construct?
I'm a bit confused about what purpose this serves. If a task has no executors that
satisfy its constra
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45298475
@lirui-intel Let's move the discussion about NODE_LOCAL / PROCESS_LOCAL to
a Jira -- since it's tangential to this change
---
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13474841
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for (
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13474668
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13474171
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for (
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13473499
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
clock: C
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45293719
Hi @kayousterhout , let's consider a map stage whose tasks all have
NODE_LOCAL preference. So pendingTasksForExecutor is empty and all tasks are
added to pendingTasksF
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45247880
@lirui-intel I still don't understand the problem you mentioned: in
TaskSetManager.findTask(), it tries to schedule process local, node local, and
rack local tasks b
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13451805
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -738,4 +750,20 @@ private[spark] class TaskSetManager(
logD
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13451625
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -182,15 +189,16 @@ private[spark] class TaskSetManager(
for
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13451548
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -54,8 +54,15 @@ private[spark] class TaskSetManager(
clock:
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45207806
I've revised the patch as @mateiz suggested: tasks will be added to the
corresponding lists even when their preferred location is unavailable, in which case
they'll also be added
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45184290
That's true @kayousterhout, but my point is: what if all the tasks
only specify a NODE_LOCAL preference (a common case when the RDD is created from
an HDFS file)? T
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45183292
My understanding is that the problem mentioned only exists when a job has a
combination of tasks with constraints and tasks without constraints. AFAIK
this is not a
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45176804
Hi @mateiz , I think we should distinguish between tasks that truly have no
preference, and tasks whose preference is unavailable when picking tasks from
pendingTasksW
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45175977
I see, we might be able to fix that as a separate issue. We could count
pendingTasksWithNoPrefs as being at a "special" level we try only after we try
the real locality lev
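A sketch of the alternative floated here, with assumed names and simplified types (this is not the merged implementation): findTask would consult pendingTasksWithNoPrefs only after the genuine locality levels, rather than handing those tasks out as if they were PROCESS_LOCAL.

```scala
import scala.collection.mutable.ArrayBuffer

object FindTaskSketch {
  def findTask(
      execId: String,
      host: String,
      pendingForExecutor: Map[String, ArrayBuffer[Int]],
      pendingForHost: Map[String, ArrayBuffer[Int]],
      pendingWithNoPrefs: ArrayBuffer[Int]): Option[Int] = {
    // Genuine locality levels first...
    pendingForExecutor.get(execId).flatMap(_.headOption)       // PROCESS_LOCAL
      .orElse(pendingForHost.get(host).flatMap(_.headOption))  // NODE_LOCAL
      // ...and only then the "special" level for tasks with no usable preference.
      .orElse(pendingWithNoPrefs.headOption)
  }
}
```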
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45175085
Yes @mateiz, great idea. One quick question: tasks in
pendingTasksWithNoPrefs are considered PROCESS_LOCAL. Suppose we have no
tasks in pendingTasksForExecuto
Github user lirui-intel commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13421524
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -738,4 +739,13 @@ private[spark] class TaskSetManager(
logDeb
Github user mateiz commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45174261
Hey @lirui-intel, I talked about this a bit with Kay and we think there's a
simpler and more general approach.
* In addPendingTask, always add the task to host, rack and
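A minimal, self-contained sketch of that suggestion with hypothetical names: addPendingTask records the task under every preferred executor, host, and rack regardless of whether the location is currently alive, and only a task with no preferences at all goes into the no-prefs list.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

class PendingListsSketch {
  case class Location(executorId: Option[String], host: String, rack: Option[String])

  val pendingTasksForExecutor = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
  val pendingTasksWithNoPrefs = new ArrayBuffer[Int]
  val allPendingTasks = new ArrayBuffer[Int]

  def addPendingTask(index: Int, prefs: Seq[Location]): Unit = {
    for (loc <- prefs) {
      // Record the preference even if the executor/host has not registered yet.
      loc.executorId.foreach(e =>
        pendingTasksForExecutor.getOrElseUpdate(e, new ArrayBuffer) += index)
      pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer) += index
      loc.rack.foreach(r =>
        pendingTasksForRack.getOrElseUpdate(r, new ArrayBuffer) += index)
    }
    if (prefs.isEmpty) {
      pendingTasksWithNoPrefs += index // only tasks that truly have no preference
    }
    allPendingTasks += index
  }
}
```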
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13414402
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -195,7 +196,7 @@ private[spark] class TaskSetManager(
}
Github user mateiz commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13413931
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -111,6 +111,8 @@ private[spark] class TaskSchedulerImpl(
// This
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-45156913
It looks like when you re-add tasks after an executor is added, tasks that
newly have one of their preferred locations available are not removed from the
pendingTask
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/892#discussion_r13410584
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -738,4 +739,13 @@ private[spark] class TaskSetManager(
logD
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44394787
I've made some modifications; please take a look and see if this makes sense :)
@mridulm @rxin @kayousterhout
---
Github user lirui-intel commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44356508
If I understand correctly, the application cannot control how many executors to
launch (at least in standalone mode). New executors can be launched for
the application whe
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44345092
Yeah I agree with @mridulm. It seems like, with this change, you're
counting on the executors to finish registering before the delay threshold runs
out or before al
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44343061
Tagging @kayousterhout ...
---
Github user mridulm commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44285670
This is better handled in user code and not in the scheduler.
We need a way to surface how many executors have registered, but once that
is available, spinning until some
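As a hypothetical illustration of handling this in user code rather than in the scheduler (not something this PR adds): the application could poll the driver for registered executors before submitting a locality-sensitive job. The helper name and threshold logic below are assumptions; getExecutorMemoryStatus reports one entry per block manager, including the driver, so the count is approximate.

```scala
import org.apache.spark.SparkContext

object WaitForExecutors {
  // Spin until roughly `minExecutors` executors have registered, or the timeout expires.
  def waitForExecutors(sc: SparkContext, minExecutors: Int, timeoutMs: Long): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    // +1 accounts for the driver's own entry in the block manager map.
    while (sc.getExecutorMemoryStatus.size < minExecutors + 1) {
      if (System.currentTimeMillis() >= deadline) return false
      Thread.sleep(100)
    }
    true
  }
}
```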
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/892#issuecomment-44246965
Can one of the admins verify this patch?
---
GitHub user lirui-intel opened a pull request:
https://github.com/apache/spark/pull/892
SPARK-1937: fix issue with task locality
Don't check executor/host availability when creating a TaskSetManager, because
the executors may not have been registered yet when the TaskSetManager is
created.