Makoto, please use --driver-memory 8G when you launch spark-shell. -Xiangrui
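
For example, something along these lines (a sketch; adjust the path to your
Spark installation):

  ./bin/spark-shell --driver-memory 8G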

On Tue, Jun 17, 2014 at 4:49 PM, Xiangrui Meng <men...@gmail.com> wrote:
> DB, Yes, reduce and aggregate are linear.
>
> Makoto, dense vectors are used in the aggregation. If you have 32
> partitions and each one sends a dense vector of size 1,354,731 to the
> master, then the driver needs 300MB+. That may be the problem. Which
> deploy mode are you using, standalone or local?
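>
> A quick back-of-the-envelope check (a sketch, assuming 8 bytes per double
> and one dense gradient vector sent per partition):
>
> -------------------
> val bytesPerVector = 1354731L * 8        // ~10.8 MB per dense vector
> val bytesAtDriver  = bytesPerVector * 32 // ~347 MB arriving at the driver
> -------------------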
>
> Debasish, there is an old PR for butterfly allreduce. However, it
> doesn't seem to be the right way to go for Spark. I just sent out the
> PR: https://github.com/apache/spark/pull/1110 . This is a WIP and it
> needs more testing before we are confident enough to merge it. It
> would be great if you could help test it.
>
> Best,
> Xiangrui
>
> On Tue, Jun 17, 2014 at 2:33 PM, Debasish Das <debasish.da...@gmail.com> 
> wrote:
>> Xiangrui,
>>
>> Could you point me to the JIRA related to tree aggregate? ...sounds
>> like the allreduce idea...
>>
>> I would definitely like to try it on our dataset...
>>
>> Makoto,
>>
>> I did run a pretty big sparse dataset (20M rows, 3M sparse features) and I
>> got 100 iterations of SGD running in 200 seconds...10 executors, each with
>> 16 GB of memory...
>>
>> Although the best result on the same dataset came out of liblinear and
>> BFGS-L1 out of the box...so I did not tune the SGD further on learning
>> rate and other heuristics...it was around 5% off...
>>
>> Thanks.
>> Deb
>>
>>
>>
>> On Tue, Jun 17, 2014 at 2:09 PM, DB Tsai <dbt...@stanford.edu> wrote:
>>>
>>> Hi Xiangrui,
>>>
>>> Does it mean that mapPartitions followed by reduce shares the same
>>> behavior as the aggregate operation, which is O(n)?
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
>>>
>>> On Tue, Jun 17, 2014 at 2:00 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>> > Hi DB,
>>> >
>>> > treeReduce (treeAggregate) is a feature I'm testing now. It is a
>>> > compromise between the current reduce and butterfly allReduce. The
>>> > former runs in time linear in the number of partitions, while the
>>> > latter introduces too many dependencies. treeAggregate with depth = 2
>>> > should run in O(sqrt(n)) time, where n is the number of partitions.
>>> > It would be great if someone could help test its scalability.
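>>> >
>>> > Roughly, the call shape looks like the following (a sketch using a toy
>>> > sum, assuming the signature matches the snippet quoted below):
>>> >
>>> > -------------------
>>> > // With depth = 2, partitions are first combined on ~sqrt(n)
>>> > // intermediate tasks, so the driver only merges ~sqrt(n) results.
>>> > val total = rdd.treeAggregate(0.0)(
>>> >   seqOp = (acc, x) => acc + x,
>>> >   combOp = (a, b) => a + b,
>>> >   2)
>>> > -------------------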
>>> >
>>> > Best,
>>> > Xiangrui
>>> >
>>> > On Tue, Jun 17, 2014 at 1:32 PM, Makoto Yui <yuin...@gmail.com> wrote:
>>> >> Hi Xiangrui,
>>> >>
>>> >>
>>> >> (2014/06/18 4:58), Xiangrui Meng wrote:
>>> >>>
>>> >>> How many partitions did you set? If there are too many partitions,
>>> >>> please do a coalesce before calling ML algorithms.
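>>> >>> For example (assuming "examples" is your input RDD; the target
>>> >>> partition count here is just illustrative):
>>> >>>
>>> >>>   val coalesced = examples.coalesce(32)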
>>> >>
>>> >>
>>> >> The training data "news20.random.1000" is small, and thus only 2
>>> >> partitions are used by default.
>>> >>
>>> >> val training = MLUtils.loadLibSVMFile(sc,
>>> >>   "hdfs://host:8020/dataset/news20-binary/news20.random.1000",
>>> >>   multiclass = false)
>>> >>
>>> >> We also tried 32 partitions as follows, but the aggregate never
>>> >> finished.
>>> >>
>>> >> val training = MLUtils.loadLibSVMFile(sc,
>>> >>   "hdfs://host:8020/dataset/news20-binary/news20.random.1000",
>>> >>   multiclass = false, numFeatures = 1354731, minPartitions = 32)
>>> >>
>>> >>
>>> >>> Btw, could you try the tree branch in my repo?
>>> >>> https://github.com/mengxr/spark/tree/tree
>>> >>>
>>> >>> I used tree aggregate in this branch. It should help with the
>>> >>> scalability.
>>> >>
>>> >>
>>> >> Is treeAggregate itself available in Spark 1.0?
>>> >>
>>> >> I wonder... could I test your modification just by running the
>>> >> following code in the REPL?
>>> >>
>>> >> -------------------
>>> >> val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
>>> >>   .treeAggregate((BDV.zeros[Double](weights.size), 0.0))(
>>> >>     seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
>>> >>       val l = gradient.compute(features, label, weights, Vectors.fromBreeze(grad))
>>> >>       (grad, loss + l)
>>> >>     },
>>> >>     combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
>>> >>       (grad1 += grad2, loss1 + loss2)
>>> >>     }, 2)
>>> >> -------------------
>>> >>
>>> >> Rebuilding Spark is quite a bit of work just to do the evaluation.
>>> >>
>>> >> Thanks,
>>> >> Makoto
>>
>>
