Good point -- there's been another optimization for ALS in HEAD (https://github.com/apache/spark/pull/131), but yes, the better place to pick up just the essential changes since 0.9.0, including the previous one, is the 0.9 branch.
-- Sean Owen | Director, Data Science | London

On Sun, Mar 16, 2014 at 2:18 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Sean - was this merged into the 0.9 branch as well (it seems so based
> on the message from rxin)? If so it might make sense to try out the
> head of branch-0.9 as well. Unless there are *also* other changes
> relevant to this in master.
>
> - Patrick
>
> On Sun, Mar 16, 2014 at 12:24 PM, Sean Owen <so...@cloudera.com> wrote:
> > You should simply use a snapshot built from HEAD of github.com/apache/spark
> > if you can. The key change is in MLlib, and with any luck you can just
> > replace that bit. See the PR I referenced.
> >
> > Sure, with enough memory you can get it to run even with the memory issue,
> > but it could be hundreds of GB at your scale. Not sure I take the point
> > about the JVM; you can give it 64 GB of heap and executors can use that
> > much, sure.
> >
> > You could reduce the number of features a lot to work around it too, or
> > reduce the input size. (If anyone saw my blog post about StackOverflow and
> > ALS -- that's why I snuck in a relatively paltry 40 features and pruned
> > questions with <4 tags :) )
> >
> > I don't think jblas has anything to do with it per se, and the allocation
> > fails in Java code, not native code. This should be exactly what that PR I
> > mentioned fixes.
> >
> > -- Sean Owen | Director, Data Science | London
> >
> > On Sun, Mar 16, 2014 at 11:48 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> >> Thanks Sean... let me get the latest code. Do you know which PR it was?
> >>
> >> But will the executors run fine with, say, 32 GB or 64 GB of memory?
> >> Doesn't the JVM show issues when the max memory goes beyond a certain limit?
> >>
> >> Also, the failure is due to GC limits from jblas... and I was thinking that
> >> jblas is going to call native malloc, right? Maybe 64 GB is not a big deal
> >> then... I will try increasing to 32 and then 64...
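[Sean's point above -- that at this scale ALS can want "hundreds of GB", and that cutting the feature count is a workaround -- can be made concrete with a quick back-of-the-envelope estimate. A sketch only: the 20M x 1M shape comes from this thread, double-precision entries are assumed, and JVM object overhead and ALS's intermediate per-block allocations (the actual source of the blowup fixed in PR 629) are ignored, so real usage is strictly higher.]

```python
# Rough lower-bound estimate of ALS factor-matrix memory,
# for the 20M users x 1M items case discussed in this thread.
# Assumes dense double-precision (8-byte) entries; JVM overhead
# and intermediate shuffle/block allocations are not counted.

def factor_matrix_gb(rows: int, rank: int, bytes_per_entry: int = 8) -> float:
    """Size in GB of a dense rows x rank factor matrix."""
    return rows * rank * bytes_per_entry / 1e9

users, items = 20_000_000, 1_000_000

for rank in (40, 100, 500):
    total = factor_matrix_gb(users, rank) + factor_matrix_gb(items, rank)
    print(f"rank={rank:>3}: ~{total:.1f} GB just for the user+item factors")
```

Even at Sean's "paltry" 40 features the factors alone are several GB before any of ALS's working memory, which is why both the rank reduction and the PR 629 fix matter at this scale.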
> >>
> >> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
> >>
> >> org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
> >> scala.Array$.fill(Array.scala:267)
> >> com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
> >> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
> >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >> org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:32)
> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >> org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
> >> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
> >> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
> >> org.apache.spark.scheduler.Task.run(Task.scala:53)
> >> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
> >>
> >> On Sun, Mar 16, 2014 at 11:42 AM, Sean Owen <so...@cloudera.com> wrote:
> >>> Are you using HEAD or 0.9.0? I know there was a memory issue fixed a few
> >>> weeks ago that made ALS use a lot more memory than it needs.
> >>>
> >>> https://github.com/apache/incubator-spark/pull/629
> >>>
> >>> Try the latest code.
> >>>
> >>> -- Sean Owen | Director, Data Science | London
> >>>
> >>> On Sun, Mar 16, 2014 at 11:40 AM, Debasish Das <debasish.da...@gmail.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I gave my Spark job 16 GB of memory and it is running on 8 executors.
> >>>>
> >>>> The job needs more memory due to ALS requirements (20M x 1M matrix).
> >>>>
> >>>> On each node I do have 96 GB of memory and I am using 16 GB of it. I
> >>>> want to increase the memory but I am not sure what is the right way to
> >>>> do that...
> >>>>
> >>>> On 8 executors, if I give 96 GB it might be an issue due to GC...
> >>>>
> >>>> Ideally on 8 nodes, I would run with 48 executors and each executor
> >>>> would get 16 GB of memory. Total 48 JVMs...
> >>>>
> >>>> Is it possible to increase executors per node?
> >>>>
> >>>> Thanks.
> >>>> Deb
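[On Deb's last question -- in Spark 0.9's standalone mode, the usual way to get multiple executors on one machine is to run multiple worker instances per node via `SPARK_WORKER_INSTANCES` in `conf/spark-env.sh`. A sketch only; the 6x16 GB split matches the 96 GB-per-node scenario in this thread, and the core count is a hypothetical value to tune to the hardware.]

```shell
# conf/spark-env.sh on each worker node (Spark 0.9, standalone mode).
# Illustrative values for the 8-node / 96 GB-per-node case above:
# 6 workers per node, each able to grant a 16 GB executor.

export SPARK_WORKER_INSTANCES=6   # number of worker daemons per node (default 1)
export SPARK_WORKER_MEMORY=16g    # memory each worker can hand out to executors
export SPARK_WORKER_CORES=4       # hypothetical: cores per worker, tune to hardware
```

The application then requests matching executor heaps with `spark.executor.memory=16g`, giving the 48 separate 16 GB JVMs Deb describes rather than 8 oversized heaps with long GC pauses.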