subject:"news20\-binary classification with LogisticRegressionWithSGD"

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-19 Thread Makoto Yui

Xiangrui, (2014/06/19 23:43), Xiangrui Meng wrote: It is because the frame size is not set correctly in executor backend. see spark-1112 . We are going to fix it in v1.0.1 . Did you try the treeAggregate? Not yet. I will wait the v1.0.1 release. Thanks, Makoto

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-19 Thread Xiangrui Meng

It is because the frame size is not set correctly in executor backend. see spark-1112 . We are going to fix it in v1.0.1 . Did you try the treeAggregate? > On Jun 19, 2014, at 2:01 AM, Makoto Yui wrote: > > Xiangrui and Debasish, > > (2014/06/18 6:33), Debasish Das wrote: >> I did run pretty b

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-19 Thread Makoto Yui

Xiangrui and Debasish, (2014/06/18 6:33), Debasish Das wrote: I did run pretty big sparse dataset (20M rows, 3M sparse features) and I got 100 iterations of SGD running in 200 seconds...10 executors each with 16 GB memory... I could figure out what the problem is. "spark.akka.frameSize" was to

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui

Hi Xiangrui, (2014/06/18 8:49), Xiangrui Meng wrote: Makoto, dense vectors are used to in aggregation. If you have 32 partitions and each one sending a dense vector of size 1,354,731 to master. Then the driver needs 300M+. That may be the problem. It seems that it could cuase certain problems

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Xiangrui Meng

Makoto, please use --driver-memory 8G when you launch spark-shell. -Xiangrui On Tue, Jun 17, 2014 at 4:49 PM, Xiangrui Meng wrote: > DB, Yes, reduce and aggregate are linear. > > Makoto, dense vectors are used to in aggregation. If you have 32 > partitions and each one sending a dense vector of s

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Xiangrui Meng

DB, Yes, reduce and aggregate are linear. Makoto, dense vectors are used to in aggregation. If you have 32 partitions and each one sending a dense vector of size 1,354,731 to master. Then the driver needs 300M+. That may be the problem. Which deploy mode are you using, standalone or local? Debasi

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Debasish Das

Xiangrui, Could you point to the JIRA related to tree aggregate ? ...sounds like the allreduce idea... I would definitely like to try it on our dataset... Makoto, I did run pretty big sparse dataset (20M rows, 3M sparse features) and I got 100 iterations of SGD running in 200 seconds...10 execu

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui

Hi Xiangrui, (2014/06/18 6:03), Xiangrui Meng wrote: Are you using Spark 1.0 or 0.9? Could you go to the executor tab of the web UI and check the driver's memory? I am using Spark 1.0. 588.8 MB is allocated for RDDs. I am setting SPARK_DRIVER_MEMORY=2g in the conf/spark-env.sh. The value al

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread DB Tsai

Hi Xiangrui, Does it mean that mapPartition and then reduce shares the same behavior as aggregate operation which is O(n)? Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Jun 17, 201

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Xiangrui Meng

Hi Makoto, Are you using Spark 1.0 or 0.9? Could you go to the executor tab of the web UI and check the driver's memory? treeAggregate is not part of 1.0. Best, Xiangrui On Tue, Jun 17, 2014 at 2:00 PM, Xiangrui Meng wrote: > Hi DB, > > treeReduce (treeAggregate) is a feature I'm testing now.

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Xiangrui Meng

Hi DB, treeReduce (treeAggregate) is a feature I'm testing now. It is a compromise between current reduce and butterfly allReduce. The former runs in linear time on the number of partitions, the latter introduces too many dependencies. treeAggregate with depth = 2 should run in O(sqrt(n)) time, wh

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui

Hi Xiangrui, (2014/06/18 4:58), Xiangrui Meng wrote: How many partitions did you set? If there are too many partitions, please do a coalesce before calling ML algorithms. The training data "news20.random.1000" is small and thus only 2 partitions are used by the default. val training = MLUti

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread DB Tsai

Hi Xiangrui, What's different between treeAggregate and aggregate? Why treeAggregate scales better? What if we just use mapPartition, will it be as fast as treeAggregate? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn:

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Xiangrui Meng

Hi Makoto, How many partitions did you set? If there are too many partitions, please do a coalesce before calling ML algorithms. Btw, could you try the tree branch in my repo? https://github.com/mengxr/spark/tree/tree I used tree aggregate in this branch. It should help with the scalability. Be

Re: news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui

Here is follow-up to the previous evaluation. "aggregate at GradientDescent.scala:178" never finishes at https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L178 We confirmed, by -verbose:gc, that GC is not happening during th

news20-binary classification with LogisticRegressionWithSGD

2014-06-17 Thread Makoto Yui

Hello, I have been evaluating LogisticRegressionWithSGD of Spark 1.0 MLlib on Hadoop 0.20.2-cdh3u6 but it does not work for a sparse dataset though the number of training examples used in the evaluation is just 1,000. It works fine for the dataset *news20.binary.1000* that has 178,560 features. H

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

Re: news20-binary classification with LogisticRegressionWithSGD

news20-binary classification with LogisticRegressionWithSGD

16 matches

Site Navigation

Mail list logo

Footer information