DB, yes, both reduce and aggregate are linear in the number of partitions.

Makoto, dense vectors are used in aggregation. If you have 32
partitions and each one sends a dense vector of size 1,354,731 to
the master, then the driver needs 300MB+ of memory just to receive
them. That may be the problem. Which deploy mode are you using,
standalone or local?
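
As a back-of-the-envelope sketch of that estimate (assuming 8 bytes per
Double and counting only the partial gradients themselves):

-------------------
// Driver-side memory needed just to receive one dense partial gradient per
// partition: 32 partitions * 1,354,731 doubles * 8 bytes each.
val numPartitions = 32
val numFeatures = 1354731L
val bytesPerDouble = 8L
val approxBytes = numPartitions * numFeatures * bytesPerDouble
println(f"~${approxBytes / 1e6}%.0f MB of partial gradients at the driver") // ~347 MB
-------------------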

Debasish, there is an old PR for butterfly allreduce. However, it
doesn't seem to be the right way to go for Spark. I just sent out the
PR: https://github.com/apache/spark/pull/1110 . This is a WIP and it
needs more testing before we are confident enough to merge it. It would
be great if you could help test it.
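
If it helps, a minimal way to exercise it could look like the sketch below.
This is only a sketch: it assumes the branch exposes treeReduce/treeAggregate
through the mllib RDDFunctions implicit, and the dimensions and depth are
just illustrative.

-------------------
import org.apache.spark.mllib.rdd.RDDFunctions._ // assumed location in the branch
import breeze.linalg.{DenseVector => BDV}

// One large dense vector per record, 32 partitions, similar to the news20 setting.
val dim = 1354731
val vecs = sc.parallelize(1 to 32, 32).map(_ => BDV.fill(dim)(1.0))

// Plain reduce: the driver merges all 32 partial results itself.
var start = System.nanoTime()
vecs.reduce(_ + _)
println(s"reduce:     ${(System.nanoTime() - start) / 1e9} s")

// treeReduce with depth 2: executors combine groups of partitions first.
start = System.nanoTime()
vecs.treeReduce(_ + _, 2)
println(s"treeReduce: ${(System.nanoTime() - start) / 1e9} s")
-------------------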

Best,
Xiangrui

On Tue, Jun 17, 2014 at 2:33 PM, Debasish Das <debasish.da...@gmail.com> wrote:
> Xiangrui,
>
> Could you point to the JIRA related to tree aggregate? ...sounds like the
> allreduce idea...
>
> I would definitely like to try it on our dataset...
>
> Makoto,
>
> I did run a pretty big sparse dataset (20M rows, 3M sparse features) and I got
> 100 iterations of SGD running in 200 seconds...10 executors each with 16 GB
> memory...
>
> Although the best result on the same dataset came out of liblinear and
> BFGS-L1 out of the box...so I did not tune the SGD further on learning rate and
> other heuristics...it was around 5% off...
>
> Thanks.
> Deb
>
>
>
> On Tue, Jun 17, 2014 at 2:09 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>
>> Hi Xiangrui,
>>
>> Does it mean that mapPartitions followed by reduce shares the same
>> behavior as the aggregate operation, which is O(n)?
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Tue, Jun 17, 2014 at 2:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> > Hi DB,
>> >
>> > treeReduce (treeAggregate) is a feature I'm testing now. It is a
>> > compromise between the current reduce and butterfly allReduce: the
>> > former runs in time linear in the number of partitions, while the
>> > latter introduces too many dependencies. treeAggregate with depth = 2
>> > should run in O(sqrt(n)) time, where n is the number of partitions.
>> > It would be great if someone could help test its scalability.
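
(To put rough numbers on the depth = 2 claim, here is a sketch of the
arithmetic only, not the exact implementation: partitions are first combined
in groups of roughly sqrt(n) on the executors, so the driver merges only
about sqrt(n) partial results instead of n.)

-------------------
val n = 32                                 // number of partitions
val scale = math.ceil(math.sqrt(n)).toInt  // group size per level, ~6 for n = 32
val mergedAtDriver = math.ceil(n.toDouble / scale).toInt
println(s"driver merges ~$mergedAtDriver partial results instead of $n")
-------------------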
>> >
>> > Best,
>> > Xiangrui
>> >
>> > On Tue, Jun 17, 2014 at 1:32 PM, Makoto Yui <yuin...@gmail.com> wrote:
>> >> Hi Xiangrui,
>> >>
>> >>
>> >> (2014/06/18 4:58), Xiangrui Meng wrote:
>> >>>
>> >>> How many partitions did you set? If there are too many partitions,
>> >>> please do a coalesce before calling ML algorithms.
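
(For example, with an illustrative target of 32 partitions and a hypothetical
input RDD named examples:)

-------------------
// Shrink an over-partitioned input before handing it to MLlib.
val fewer = examples.coalesce(32, shuffle = false)
-------------------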
>> >>
>> >>
>> >> The training data "news20.random.1000" is small, and thus only 2
>> >> partitions are used by default.
>> >>
>> >> val training = MLUtils.loadLibSVMFile(sc,
>> >>   "hdfs://host:8020/dataset/news20-binary/news20.random.1000",
>> >>   multiclass = false)
>> >>
>> >> We also tried 32 partitions as follows but the aggregate never
>> >> finishes.
>> >>
>> >> val training = MLUtils.loadLibSVMFile(sc,
>> >>   "hdfs://host:8020/dataset/news20-binary/news20.random.1000",
>> >>   multiclass = false, numFeatures = 1354731, minPartitions = 32)
>> >>
>> >>
>> >>> Btw, could you try the tree branch in my repo?
>> >>> https://github.com/mengxr/spark/tree/tree
>> >>>
>> >>> I used tree aggregate in this branch. It should help with the
>> >>> scalability.
>> >>
>> >>
>> >> Is treeAggregate itself available in Spark 1.0?
>> >>
>> >> I wonder... could I test your modification just by running the
>> >> following code in the REPL?
>> >>
>> >> -------------------
>> >> val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
>> >>   .treeAggregate((BDV.zeros[Double](weights.size), 0.0))(
>> >>     seqOp = (c, v) => (c, v) match {
>> >>       case ((grad, loss), (label, features)) =>
>> >>         val l = gradient.compute(features, label, weights,
>> >>           Vectors.fromBreeze(grad))
>> >>         (grad, loss + l)
>> >>     },
>> >>     combOp = (c1, c2) => (c1, c2) match {
>> >>       case ((grad1, loss1), (grad2, loss2)) =>
>> >>         (grad1 += grad2, loss1 + loss2)
>> >>     }, 2)
>> >> -------------------
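
(For what it's worth, the snippet closes over a few definitions from
GradientDescent; something like the placeholder bindings below would be
needed in the REPL, and treeAggregate itself would still have to come from
the tree branch. Names and values here are only placeholders.)

-------------------
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.LogisticGradient

// Placeholder bindings so the quoted snippet has something to close over.
// data is assumed to be an RDD[(Double, Vector)] of (label, features) pairs.
val gradient = new LogisticGradient()
val weights = Vectors.dense(new Array[Double](1354731))
val miniBatchFraction = 1.0
val i = 1
-------------------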
>> >>
>> >> Rebuilding Spark is quite a lot of work just to run an evaluation.
>> >>
>> >> Thanks,
>> >> Makoto
>
>
