It uses the standard SquaredL2Updater, and I also tried broadcasting it.
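Roughly what I tried (a simplified sketch, not the actual patch code; the app name and variable names are just illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.optimization.SquaredL2Updater

    val sc = new SparkContext(
      new SparkConf().setAppName("updater-broadcast-sketch").setMaster("local[*]"))

    // Ship the updater to the executors as a broadcast variable rather than
    // letting it get pulled into every task closure.
    val updaterBc = sc.broadcast(new SquaredL2Updater())

    // Inside the per-iteration aggregation, tasks then call
    //   updaterBc.value.compute(weights, gradientSum, stepSize, iter, regParam)
    // instead of referencing a driver-side field directly.

As far as I can tell SquaredL2Updater is stateless anyway, so it shouldn't be dragging much in either way.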
The input is an RDD created by taking the union of several inputs that have all been run against MLUtils.kFold to produce even more RDDs; a typical run uses 10 different inputs, each with 10 kFolds. I'm pretty certain that all of the input RDDs have clean closures. But I'm curious: is there a high overhead for running union? Could that create larger task sizes?

Kyle

On Sat, Jul 12, 2014 at 7:50 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> I also did a quick glance through the code and couldn't find anything
> worrying that should be included in the task closures. The only possibly
> unsanitary part is the Updater you pass in -- what is your Updater and is
> it possible it's dragging in a significant amount of extra state?
>
>
> On Sat, Jul 12, 2014 at 7:27 PM, Kyle Ellrott <kellr...@soe.ucsc.edu>
> wrote:
>
>> I'm working on a patch to MLlib that allows for multiplexing several
>> different model optimizations using the same RDD (SPARK-2372:
>> https://issues.apache.org/jira/browse/SPARK-2372).
>>
>> In testing larger datasets, I've started to see some memory errors
>> (java.lang.OutOfMemoryError and "exceeds max allowed: spark.akka.frameSize"
>> errors).
>>
>> My main clue is that Spark will start logging warnings on smaller
>> systems, like:
>>
>> 14/07/12 19:14:46 WARN scheduler.TaskSetManager: Stage 2862 contains a
>> task of very large size (10119 KB). The maximum recommended task size is
>> 100 KB.
>>
>> Looking up stage '2862' leads to a 'sample at
>> GroupedGradientDescent.scala:156' call. That code can be seen at
>> https://github.com/kellrott/spark/blob/mllib-grouped/mllib/src/main/scala/org/apache/spark/mllib/grouped/GroupedGradientDescent.scala#L156
>>
>> I've looked over the code and I'm broadcasting the larger variables, and
>> between the sampler and the combineByKey, I wouldn't think there would be
>> much data being moved over the network, much less a 10MB chunk.
>>
>> Any ideas of what this might be a symptom of?
>>
>> Kyle
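For reference, the kFold/union construction described at the top of this message looks roughly like this (a sketch only; the dataset list, fold count, and seed are placeholders, and only the training halves are kept here for illustration):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.rdd.RDD

    // Several source datasets, each split into (training, validation) folds
    // with MLUtils.kFold; the folds are then unioned into one big input RDD.
    val inputs: Seq[RDD[LabeledPoint]] = ???   // placeholder for the 10 source RDDs

    val foldTrainingSets: Seq[RDD[LabeledPoint]] =
      inputs.flatMap(data => MLUtils.kFold(data, 10, 42).map(_._1))

    val combined: RDD[LabeledPoint] = foldTrainingSets.reduce(_ union _)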