[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-11-07 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/214 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabl

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-11-05 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-61938307 Hi @qqsun8819, as Matei mentioned, Spark now broadcasts RDD objects, so it's very unlikely for task serialization to become a bottleneck. I closed the associated JI

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-09-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-54694752 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-08-26 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-53514576 @qqsun8819 given the recent patch in 1.1 to broadcast RDD objects (and hence not have to serialize them when we send each task), do you think this patch is still needed? Un

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-31 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-39083121 Hey @qqsun8819, I finally found that there has been some discussion about removing dagScheduler's serializability checking: https://github.com/apache/spark/pull/143

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-31 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-39079998 @CodingCat Thanks very much for your review. I found that your main concerns concentrate on two points: 1. Merge the two SerializerRunner in the two scheduler backends into on

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102892 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -149,6 +151,21 @@ private[spark] object Utils extends Logging { buf }

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-39050291 @qqsun8819 Good job, just gave my thoughts on the current solution, I'm actually far from an expert, expecting others' feedback.

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102844 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -149,6 +151,21 @@ private[spark] object Utils extends Logging { buf }

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102803 --- Diff: core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala --- @@ -46,6 +47,7 @@ private[spark] class LocalActor( private

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102770 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -62,6 +65,30 @@ private[spark] class MesosSchedule

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102737 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -62,6 +65,30 @@ private[spark] class MesosSchedule

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102700 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -29,9 +29,12 @@ import org.apache.mesos.{Scheduler

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r11102691 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala --- @@ -62,6 +65,30 @@ private[spark] class MesosSchedule

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-30 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-39048706 patch updated

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38881858 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38651743 @CodingCat @kayousterhout @mridulm Thanks very much for your review. I think @kayousterhout stated clearly in her last two comments what the ideal implementation looks

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38651617 @CodingCat @mridulm @kayousterhout Thanks very much for your review. I looked through your discussion and basically understand what you mean. So you all agree on movi

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38615455 Yeah totally agree about the util method!!

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38615396 Btw, we might want to make it some util method somewhere, so that the various backends don't need to duplicate this code.

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38615338 Ah, I see what you mean - pull all of the logic within successful schedule of resourceOffer into the caller. Yeah, that should work fine (with the caveat of setting Spa

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38614287 Yeah exactly -- so my proposal was something like, in CBSG.makeOffers(): -still do scheduler.resourceOffers(), only now this returns unserialized tasks -upda
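
A minimal sketch of the ordering this part of the thread is concerned with: cores are reserved on the single scheduling thread first, and only then is serialization handed to a thread pool, so the same core cannot be offered twice while serialization is still running. This is an illustration, not the patch under review. Every name here (SerializationSketch, PendingTask, registerExecutor, coresFree, launch) is hypothetical, and the byte encoding is a stand-in for real task serialization.

```scala
import java.nio.charset.StandardCharsets
import java.util.concurrent.{Callable, Executors}

// Hypothetical names throughout; this is not Spark's actual code.
object SerializationSketch {
  // Stand-in for an unserialized task already assigned to an executor.
  final case class PendingTask(taskId: Long, executorId: String, payload: String)

  // Free cores per executor. Assumed to be touched only by the single
  // scheduling thread, so it is deliberately not thread-safe.
  private val freeCores = scala.collection.mutable.Map[String, Int]()

  def registerExecutor(executorId: String, cores: Int): Unit = {
    freeCores(executorId) = cores
  }

  def coresFree(executorId: String): Int = freeCores.getOrElse(executorId, 0)

  // Stand-in for real task serialization.
  private def serialize(task: PendingTask): Array[Byte] =
    s"${task.taskId}:${task.payload}".getBytes(StandardCharsets.UTF_8)

  def launch(tasks: Seq[PendingTask], poolSize: Int): Seq[(Long, Array[Byte])] = {
    // Step 1, on the calling (scheduling) thread: reserve one core per task
    // before any serialization work is started.
    tasks.foreach { t =>
      freeCores(t.executorId) = coresFree(t.executorId) - 1
    }
    // Step 2, on pool threads: serialize the tasks in parallel.
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
      val futures = tasks.toList.map { t =>
        pool.submit(new Callable[(Long, Array[Byte])] {
          override def call(): (Long, Array[Byte]) = (t.taskId, serialize(t))
        })
      }
      // Block until every task is serialized; results keep submission order.
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

Creating and shutting down the pool on every call keeps the sketch short; a long-lived shared pool would avoid that overhead, which is roughly the direction of the "util method" suggestion in the thread.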

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38613780 In coarse grained scheduler, freeCores is updated once the task desc's are returned - in launchTasks; and expected to be used within the actor thread (so MT-unsafe). A

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38612977 freeCores would need to be updated by the makeOffers() method and before the tasks get serialized (otherwise we can have race conditions where we assign the sa

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38612685 @kayousterhout the backend assumes that there is only a single thread which is executing inside the actor at a given point of time. We will be changing this assumption.

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10949757 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10949666 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -243,9 +275,16 @@ private[spark] class TaskSchedulerImpl( }

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10949633 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -219,18 +229,40 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38609414 I thought about this a bit more and I think it makes sense to do something similar to what @CodingCat suggested: in CoarseGrainedSchedulerBackend, when we call sched

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-25 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38556226 Hey @qqsun8819, on second thought about whether the task serialization function should call the function directly or send a message to the ClusterSchedulerBackend, I

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10919090 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10918799 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10918762 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38533621 Oh, sorry, it's DAG

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10918503 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38533226 Hi, @kayousterhout, you mean CoarseClusterSchedulerBackend block, instead of DAG?

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10918466 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38532790 Also, if we use a TaskResultGetter-like mechanism, we can create the thread pool inside it using FixPool from Util, just as ResultGetter does

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38532473 Thanks for your advice @kayousterhout. My understanding of what you mean is to create a TaskResultGetter-like class that maintains a thread pool inside it, and e

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10918103 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38496480 Echoing what @CodingCat said, I think this solution has the same problem that I mentioned in response to your design posted in the JIRA (https://spark-project.atlass

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10899913 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -31,6 +32,7 @@ import org.apache.spark._ import org.apache.spark

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10896811 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -198,6 +201,13 @@ private[spark] class TaskSchedulerImpl( */

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38463531 Fix the DriverSuite case failure; put the thread pool inside resourceoffer and shut it down before it returns; some other fixes according to @CodingCat's review

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10888735 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -93,6 +96,10 @@ private[spark] class TaskSchedulerImpl( val ma

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10888611 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -30,6 +30,9 @@ import scala.util.Random import org.apache.spark.

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread qqsun8819
Github user qqsun8819 commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10886619 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884928 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884645 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -30,6 +30,9 @@ import scala.util.Random import org.apache.spark.

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884378 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884300 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -219,18 +226,43 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884189 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -243,12 +275,18 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884203 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -243,12 +275,18 @@ private[spark] class TaskSchedulerImpl(

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884024 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -93,6 +96,10 @@ private[spark] class TaskSchedulerImpl( val ma

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/214#discussion_r10884042 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -93,6 +96,10 @@ private[spark] class TaskSchedulerImpl( val ma

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/214#issuecomment-38416907 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1141] [WIP] Parallelize Task Serializat...

2014-03-23 Thread qqsun8819
GitHub user qqsun8819 opened a pull request: https://github.com/apache/spark/pull/214 [SPARK-1141] [WIP] Parallelize Task Serialization https://spark-project.atlassian.net/browse/SPARK-1141 @kayousterhout copied from JIRA (design doc in JIRA is old, I'll update it later)