Good point -- there's been another optimization for ALS in HEAD
(https://github.com/apache/spark/pull/131), but yes, the 0.9 branch is the
better place to pick up just the essential changes since 0.9.0, including
the earlier fix.
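
If it helps, here's one way to consume a locally built branch-0.9 snapshot
from sbt, after running sbt/sbt publish-local in a branch-0.9 checkout (a
minimal sketch; the 0.9.1-SNAPSHOT version string is an assumption about
what branch-0.9 currently publishes):

    // build.sbt -- the version string is an assumption
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "0.9.1-SNAPSHOT"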

--
Sean Owen | Director, Data Science | London


On Sun, Mar 16, 2014 at 2:18 PM, Patrick Wendell <pwend...@gmail.com> wrote:

> Sean - was this merged into the 0.9 branch as well? (It seems so, based
> on the message from rxin.) If so, it might make sense to try out the
> head of branch-0.9 as well, unless there are *also* other changes
> relevant to this in master.
>
> - Patrick
>
> On Sun, Mar 16, 2014 at 12:24 PM, Sean Owen <so...@cloudera.com> wrote:
> > You should simply use a snapshot built from HEAD of
> > github.com/apache/spark if you can. The key change is in MLlib, and with
> > any luck you can just replace that bit. See the PR I referenced.
> >
> > Sure, with enough memory you can get it to run even with the memory
> > issue, but it could take hundreds of GB at your scale. I'm not sure I
> > take the point about the JVM; you can give it 64GB of heap, and
> > executors can use that much, sure.
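> >
> > For what it's worth, a minimal sketch of setting executor heap via
> > SparkConf (the 32g value is just an illustration):
> >
> >     import org.apache.spark.{SparkConf, SparkContext}
> >
> >     // spark.executor.memory sets the heap (-Xmx) for each executor JVM;
> >     // 32g here is an illustrative value, not a recommendation
> >     val conf = new SparkConf()
> >       .setAppName("ALS")
> >       .set("spark.executor.memory", "32g")
> >     val sc = new SparkContext(conf)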
> >
> > You could reduce the number of features a lot to work around it too, or
> > reduce the input size. (If anyone saw my blog post about StackOverflow
> > and ALS -- that's why I snuck in a relatively paltry 40 features and
> > pruned questions with <4 tags :) )
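> >
> > Concretely, the number of features is just the rank argument to
> > ALS.train in MLlib; a sketch with a deliberately small rank (the
> > iteration count and lambda below are illustrative, not tuned):
> >
> >     import org.apache.spark.mllib.recommendation.{ALS, Rating}
> >     import org.apache.spark.rdd.RDD
> >
> >     // ratings: RDD[Rating] built elsewhere; rank 20 instead of
> >     // something large, 10 iterations, lambda 0.01 (illustrative)
> >     def smallRankModel(ratings: RDD[Rating]) =
> >       ALS.train(ratings, 20, 10, 0.01)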
> >
> > I don't think jblas has anything to do with it per se, and the allocation
> > fails in Java code, not native code. This should be exactly what that PR
> > I mentioned fixes.
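> >
> > (A note for anyone reading the trace below: a jblas DoubleMatrix is
> > backed by a plain Java double[] on the JVM heap -- native BLAS is only
> > used for the math -- so DoubleMatrix.zeros is an ordinary heap
> > allocation, subject to GC limits. The frames show one scratch matrix
> > per user being allocated up front via Array.fill; a sketch of the shape
> > of ALSQR.scala:366 reconstructed from the trace, with assumed names and
> > sizes, not the actual ALSQR source:
> >
> >     import org.jblas.DoubleMatrix
> >
> >     // illustrative sizes from the thread: 20M users, rank assumed ~40
> >     val numUsers = 20000000
> >     val rank = 40
> >     // one dense rank x rank scratch matrix per user, all at once --
> >     // this is exactly the kind of allocation that blows the heap
> >     val userXtX: Array[DoubleMatrix] =
> >       Array.fill(numUsers)(DoubleMatrix.zeros(rank, rank))
> >
> > Materialized all at once, that is roughly 20e6 * 40 * 40 * 8 bytes,
> > about 256 GB, which is where "hundreds of GB at your scale" comes from.)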
> >
> > --
> > Sean Owen | Director, Data Science | London
> >
> >
> > On Sun, Mar 16, 2014 at 11:48 AM, Debasish Das
> > <debasish.da...@gmail.com> wrote:
> >>
> >> Thanks Sean...let me get the latest code...do you know which PR it was?
> >>
> >> But will the executors run fine with, say, 32 GB or 64 GB of memory?
> >> Doesn't the JVM show issues when the max heap goes beyond a certain
> >> limit...
> >>
> >> Also, the failure is due to GC limits from jblas...and I was thinking
> >> that jblas would call native malloc, right? Maybe 64 GB is not a big
> >> deal then...I will try increasing to 32 and then 64...
> >>
> >> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
> >>     org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
> >>     org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
> >>     org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
> >>     com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> >>     com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> >>     scala.Array$.fill(Array.scala:267)
> >>     com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
> >>     com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
> >>     com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
> >>     org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> >>     org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> >>     scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> >>     org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
> >>     org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
> >>     scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >>     scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >>     org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
> >>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >>     org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >>     org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
> >>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >>     org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >>     org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:32)
> >>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >>     org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >>     org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
> >>     org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >>     org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >>     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
> >>     org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
> >>     org.apache.spark.scheduler.Task.run(Task.scala:53)
> >>     org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
> >>
> >>
> >>
> >> On Sun, Mar 16, 2014 at 11:42 AM, Sean Owen <so...@cloudera.com> wrote:
> >>>
> >>> Are you using HEAD or 0.9.0? I know a memory issue was fixed a few
> >>> weeks ago that made ALS need a lot more memory than necessary.
> >>>
> >>> https://github.com/apache/incubator-spark/pull/629
> >>>
> >>> Try the latest code.
> >>>
> >>> --
> >>> Sean Owen | Director, Data Science | London
> >>>
> >>>
> >>> On Sun, Mar 16, 2014 at 11:40 AM, Debasish Das
> >>> <debasish.da...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I gave my Spark job 16 GB of memory and it is running on 8 executors.
> >>>>
> >>>> The job needs more memory due to ALS requirements (a 20M x 1M matrix).
> >>>>
> >>>> On each node I have 96 GB of memory and I am using 16 GB of it. I
> >>>> want to increase the memory, but I am not sure what the right way to
> >>>> do that is...
> >>>>
> >>>> With 8 executors, if I give each one 96 GB it might be an issue due
> >>>> to GC...
> >>>>
> >>>> Ideally, on 8 nodes I would run with 48 executors, and each executor
> >>>> would get 16 GB of memory...48 JVMs in total...
> >>>>
> >>>> Is it possible to increase the number of executors per node?
> >>>>
> >>>> Thanks.
> >>>> Deb
> >>>
> >>>
> >>
> >
>
