[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2017-01-05 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Thanks guys for the review! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2017-01-04 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 I think the failure is due to one more [skipped test](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70890/testReport/pyspark.sql.tests/HiveContextSQLTests

[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2017-01-04 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/16376 @kayousterhout I see. Thanks for the explanations :)

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-22 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Not sure if my patch makes the tests unstable. But I can't figure out why. @kayousterhout @mridulm any ideas?

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-22 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Jenkins, retest this please

[GitHub] spark issue #16376: [SPARK-18967][SCHEDULER] compute locality levels even if...

2016-12-22 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/16376 The change looks good to me, although I still want to make sure I understand it correctly. Before the change, a locality level is invalid if it has delay=0. The patch changes that and makes
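The validity check under discussion can be modeled with a minimal sketch (hypothetical names, not Spark's actual code): before the patch, a locality level with a zero configured wait was treated as invalid and dropped; after it, pending tasks alone make a level valid, and a zero wait simply means tasks at that level are scheduled without delay.

```java
// Hypothetical model of the locality-level validity check discussed above.
// This is an illustrative sketch, not the real TaskSetManager logic.
public class LocalityValidity {
    // before SPARK-18967: a level needed both pending tasks and a non-zero wait
    static boolean isValidBefore(int pendingTasks, long localityWaitMs) {
        return pendingTasks > 0 && localityWaitMs > 0;
    }

    // after: pending tasks alone make the level valid; a zero wait just
    // means no delay-scheduling pause at that level
    static boolean isValidAfter(int pendingTasks) {
        return pendingTasks > 0;
    }
}
```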

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-20 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 I don't think the failure is related, and it can't be reproduced locally.

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-20 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 The new test passed locally and I can't find any failures in the Jenkins test report. Not sure what failed exactly.

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-19 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Sure. Updated patch to not catch Throwable.

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-12-19 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Hi @kayousterhout and @mridulm, to clarify, I think the error won't disappear if we don't catch it. Because the runnable is wrapped in Utils.logUncaughtExceptions so the error will
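The wrapping pattern described here can be sketched as follows. The method name mirrors Spark's Utils.logUncaughtExceptions, but the body is an illustrative assumption, not the real implementation; the key point is that the error is logged and re-thrown, so it does not disappear:

```java
// Sketch of a log-and-rethrow wrapper (illustrative, not Spark's code).
// An uncaught throwable from the wrapped runnable is logged and then
// propagated, rather than silently killing the pool thread.
public class LogUncaught {
    static Runnable logUncaughtExceptions(Runnable body) {
        return () -> {
            try {
                body.run();
            } catch (Throwable t) {
                System.err.println("Uncaught exception: " + t.getMessage());
                throw t;  // the error still propagates; it is not swallowed
            }
        };
    }
}
```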

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-11-08 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Hi @kayousterhout and @markhamstra , could you take another look? Thanks.

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-10-09 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Thanks for the review. Updated the patch to address the comments.

[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2016-07-26 Thread lirui-intel
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Could anybody help review this PR? Thanks.

[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-11 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-218654108 Thanks @markhamstra for the explanations. I think currently the thread just dies and we log the uncaught error. I can add a catch for NoClassDefFoundError and

[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-11 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-218488089 Hey @markhamstra, anything specific that you think we should do in case of more severe errors? I think it doesn't hurt to handle the failed task in a fi

[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-05-09 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/12775#issuecomment-217886648 Update to add test.

[GitHub] spark pull request: [SPARK-14958][Core] Failed task not handled wh...

2016-04-28 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/12775 [SPARK-14958][Core] Failed task not handled when there's error deserializing failure reason ## What changes were proposed in this pull request? TaskResultGetter tries to deseri
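The idea behind the fix can be illustrated with a hedged sketch (hypothetical names and types, not the actual patch): deserialization of the failure reason is wrapped so that an error during deserialization still produces a handled task failure, instead of the handler dying before the failed task is processed.

```java
import java.util.function.Function;

// Illustrative sketch of the fix's idea; names and types are hypothetical.
public class FailureReason {
    static String describeFailure(byte[] serialized, Function<byte[], String> deserialize) {
        try {
            return deserialize.apply(serialized);
        } catch (Exception e) {
            // fall back to a generic reason rather than losing the failure
            return "Could not deserialize failure reason: " + e.getMessage();
        }
    }
}
```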

[GitHub] spark pull request: [SPARK-3902] Stabilize AsynRDDActions and add ...

2014-10-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2760#issuecomment-58739690 Looks great! I think it's very useful to have these async APIs in Java :-)

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-31 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-54007752 Thanks @pwendell , patch updated.

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-30 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53960113 Thanks @rxin , @vanzin for the review. I've added experimental mark in the java doc. I see that mima can automatically exclude DeveloperApi and Experimental cl

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-29 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53853736 Thanks @rxin . I updated the patch.

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-29 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53847106 Hi @rxin , could you be more specific as to how to do it?

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-28 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2176#discussion_r16883922 --- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala --- @@ -149,6 +149,13 @@ class SimpleFutureAction[T] private[spark](jobWaiter

[GitHub] spark pull request: SPARK-2636: Expose job ID in JobWaiter API

2014-08-28 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53842283 Thanks @rxin . Updated the patch accordingly.

[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...

2014-08-28 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53682639 @rxin I've updated the patch. Yes I see these APIs are experimental. We can make hive use it as a workaround and change it when we have a better sol

[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...

2014-08-27 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53680289 I thought these async actions were missing in the Java API so I added all of them from AsyncRDDActions. But sure, let me just add foreachAsync.

[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...

2014-08-27 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53679584 Hi @rxin, thanks for the review! I can add interface to SimpleFutureAction to get the job id if we shouldn't expose JobWaiter to users. Hive on spark curr

[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...

2014-08-27 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/2176 SPARK-2636: no where to get job identifier while submit spark job through spark API This PR adds the async actions to the Java API. User can call these async actions to get the FutureAction
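The async-action pattern this PR describes can be approximated locally with plain CompletableFuture (an analogue only, not the Spark JavaRDD API): the action returns a future immediately, so the caller gets a handle to the running job instead of blocking on the result.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Local analogue of an async action: count runs in the background and the
// caller receives a future (Spark's FutureAction plays a similar role).
public class AsyncActions {
    static CompletableFuture<Integer> countAsync(List<Integer> data) {
        return CompletableFuture.supplyAsync(data::size);
    }
}
```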

[GitHub] spark pull request: SPARK-2740: allow user to specify ascending an...

2014-08-01 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1645#issuecomment-50858418 Thanks @JoshRosen :)

[GitHub] spark pull request: SPARK-2740: allow user to specify ascending an...

2014-07-29 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/1645 SPARK-2740: allow user to specify ascending and numPartitions for sortBy... It should be more convenient if user can specify ascending and numPartitions when calling sortByKey. You can merge
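The convenience being proposed can be sketched against plain collections (an analogue only; the Spark API is RDD.sortBy(f, ascending, numPartitions), and numPartitions has no collection equivalent here):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Collection-level analogue of sortBy with an explicit ascending flag.
public class SortBy {
    static <T, K extends Comparable<K>> List<T> sortBy(
            List<T> data, Function<T, K> key, boolean ascending) {
        List<T> out = new ArrayList<>(data);
        Comparator<T> cmp = Comparator.comparing(key);
        out.sort(ascending ? cmp : cmp.reversed());
        return out;
    }
}
```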

[GitHub] spark pull request: SPARK-2277: clear host->rack info properly

2014-07-23 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1454#issuecomment-49958248 Thanks @mateiz

[GitHub] spark pull request: SPARK-2387: remove stage barrier

2014-07-22 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1328#discussion_r15267187 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -340,6 +459,7 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-16 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49264857 If a TaskSet only contains no-pref tasks, there won't be delay because the only valid level is ANY, so everything gets scheduled right away. If a Ta
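The scheduling behavior described here can be modeled with a toy sketch (hypothetical representation, not Spark's TaskSetManager): when no task in the set has a locality preference, ANY is the only valid level, so delay scheduling never makes these tasks wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy model: each task carries an optional preferred host; a level is
// included only if some task could benefit from it. ANY is always valid.
public class NoPrefLevels {
    static List<String> validLevels(List<Optional<String>> taskPreferredHost) {
        List<String> levels = new ArrayList<>();
        if (taskPreferredHost.stream().anyMatch(Optional::isPresent)) {
            levels.add("NODE_LOCAL");
        }
        levels.add("ANY");  // the only level when every task is no-pref
        return levels;
    }
}
```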

[GitHub] spark pull request: SPARK-2277: clear host->rack info properly

2014-07-16 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/1454 SPARK-2277: clear host->rack info properly Hi @mridulm, I just think of this issue of [#1212](https://github.com/apache/spark/pull/1212): I added FakeRackUtil to hold the host -> rack m

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-16 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49248835 Thanks everybody :)

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-15 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-49125959 Thanks @mridulm and sorry for your laptop :)

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-14 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48873878 Hi @mridulm , I've added some test cases to capture the scheduling behavior of RACK_LOCAL tasks. Let me know if I got anything wrong.

[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...

2014-07-13 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-48859854 This looks good to me :) Just a reminder that when TaskSchedulerImpl calls TaskSetManager.resourceOffer, the maxLocality (changed to preferredLocality in this PR

[GitHub] spark pull request: SPARK-2387: remove stage barrier

2014-07-08 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1328#issuecomment-48421136 Thanks @sryza for the idea. I think it's OK to piggy back the communication in a heartbeat, but we should also allow the worker to explicitly ask the master fo

[GitHub] spark pull request: SPARK-2387: remove stage barrier

2014-07-08 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/1328 SPARK-2387: remove stage barrier This PR is a PoC implementation of [SPARK-2387](https://issues.apache.org/jira/browse/SPARK-2387). When a ShuffleMapTask finishes, DAGScheduler will

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-07-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1212#issuecomment-48025283 Thanks @mridulm for the review! I don't quite get your point about the testcase though, could you please be more specific on what testcase should be

[GitHub] spark pull request: SPARK-2277: make TaskScheduler track hosts on ...

2014-06-25 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/1212 SPARK-2277: make TaskScheduler track hosts on rack You can merge this pull request into a Git repository by running: $ git pull https://github.com/lirui-intel/spark trackHostOnRack

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-24 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-47051463 Thanks everybody :-)

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14061614 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ private[spark] class TaskSetManager( var

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r14059200 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,16 +181,14 @@ private[spark] class TaskSetManager( var

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-22 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-46801884 Sorry about the code style, and thanks @mateiz for pointing it out. I've updated the patch.

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-12 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45890408 Thanks @mridulm , I've updated the patch accordingly.

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45696187 I've updated the patch. Currently, maxLocality starts from PROCESS_LOCAL (maxLocality <- TaskLocality.values). What if we make it start from highest valid
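The loop shape referenced here (maxLocality <- TaskLocality.values), and the proposal to start from the highest currently valid level rather than always from PROCESS_LOCAL, can be sketched as follows (illustrative names, not the Spark code):

```java
import java.util.List;

// Toy model: offers are made at increasingly relaxed locality levels,
// beginning at whichever level is the strictest one currently valid.
public class LocalityOrder {
    static final List<String> LEVELS =
        List.of("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY");

    static List<String> offersFrom(String highestValidLevel) {
        int start = LEVELS.indexOf(highestValidLevel);
        return LEVELS.subList(start, LEVELS.size());
    }
}
```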

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45691507 Thanks for the explanation @mridulm , really appreciate it.

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13601229 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13600111 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13599131 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13594625 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -388,7 +386,7 @@ private[spark] class TaskSetManager( val

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-10 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45592572 @kayousterhout - I've fixed how we compute valid locality levels and added some unit test. Now computeValidLocalityLevels considers a level as valid only if

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-09 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45564434 Thanks @kayousterhout , I'll fix this ASAP.

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-09 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45468581 Hi @kayousterhout , thanks for pointing out this. My understanding is that, when TaskScheduler calls TaskSetManager.resourceOffer, it passes the parameter

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45398131 I've removed the delay for pendingTasksWithNoPrefs

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45397807 @mridulm maybe it's better to allow users to specify how many executors they need (which is not available with standalone mode I believe)? So they can control t

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-06 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45397685 @mateiz - I added this waiting time into every TaskSetManager because with dynamic resizing clusters (as you suggested earlier), we may add new executors when new

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45299018 Sure @kayousterhout . Can we use [SPARK-1937](https://issues.apache.org/jira/browse/SPARK-1937) for the discussion?

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13474841 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ private[spark] class TaskSetManager( for

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13474171 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -182,15 +189,16 @@ private[spark] class TaskSetManager( for

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13473499 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -54,8 +54,15 @@ private[spark] class TaskSetManager( clock

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45293719 Hi @kayousterhout , let's consider a map stage whose tasks all have NODE_LOCAL preference. So pendingTasksForExecutor is empty and all tasks are add

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-05 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45207806 I've revised the patch as @mateiz suggested: tasks will be added to corresponding lists even when preferred location is unavailable, in which case it'll als

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45184290 That's true @kayousterhout , but my point is that what if all the tasks only specify NODE_LOCAL preference (common case when the RDD is created from some HDFS

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45176804 Hi @mateiz , I think we should distinguish between tasks that truly have no preference, and tasks whose preference is unavailable when picking tasks from
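The distinction drawn here can be illustrated with a small sketch (hypothetical types, not Spark's internals): tasks that truly have no preference versus tasks whose preferred hosts are merely unavailable at the moment, e.g. because their executors have not registered yet.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy partition of pending tasks into "truly no preference" and
// "preference currently unavailable" buckets.
public class PendingSplit {
    record Task(int id, List<String> preferredHosts) {}

    static List<Task> trulyNoPref(List<Task> tasks) {
        return tasks.stream()
                    .filter(t -> t.preferredHosts().isEmpty())
                    .collect(Collectors.toList());
    }

    static List<Task> prefUnavailable(List<Task> tasks, Set<String> aliveHosts) {
        return tasks.stream()
                    .filter(t -> !t.preferredHosts().isEmpty())
                    .filter(t -> t.preferredHosts().stream().noneMatch(aliveHosts::contains))
                    .collect(Collectors.toList());
    }
}
```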

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-45175085 Yes @mateiz great idea. One quick question is that tasks in pendingTasksWithNoPrefs are considered as PROCESS_LOCAL. Suppose we have no tasks in

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-06-04 Thread lirui-intel
Github user lirui-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/892#discussion_r13421524 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -738,4 +739,13 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-05-28 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44394787 I've made some modifications, please take a look and see if this makes sense :) @mridulm @rxin @kayousterhout

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-05-27 Thread lirui-intel
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/892#issuecomment-44356508 If I understand, the application cannot control how many executors to launch (at least with the standalone mode). New executors can be launched for the application

[GitHub] spark pull request: SPARK-1937: fix issue with task locality

2014-05-27 Thread lirui-intel
GitHub user lirui-intel opened a pull request: https://github.com/apache/spark/pull/892 SPARK-1937: fix issue with task locality Don't check executor/host availability when creating a TaskSetManager. Because the executors may not have been registered when the TaskSet