I also took a quick glance through the code and couldn't find anything worrying that would be pulled into the task closures. The only possibly suspect part is the Updater you pass in -- what is your Updater, and is it possible it's dragging in a significant amount of extra state?
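To illustrate what "dragging in extra state" means: on the JVM, an object defined as an inner class (or a closure over local variables) keeps a hidden reference to its enclosing instance, so serializing it also serializes everything the outer object holds. This is a minimal, hypothetical sketch -- `Driver`, `InnerUpdater`, and `bigState` are made-up names, not part of your patch or of MLlib -- using plain Java serialization to show the size difference:

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSizeDemo {
  // Size in bytes of an object's Java-serialized form -- roughly what
  // Spark ships with a task when the closure isn't Kryo-serialized.
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  // Stand-in for a driver-side class holding a large field.
  class Driver extends Serializable {
    val bigState: Array[Double] = Array.fill(1 << 20)(1.0) // ~8 MB

    // Inner class: carries an implicit $outer reference to Driver,
    // so serializing it drags bigState along too.
    class InnerUpdater extends Serializable {
      def step(w: Double, g: Double): Double = w - 0.1 * g
    }
  }

  // Top-level class: captures nothing from any enclosing scope.
  class StandaloneUpdater extends Serializable {
    def step(w: Double, g: Double): Double = w - 0.1 * g
  }

  def main(args: Array[String]): Unit = {
    val driver = new Driver
    println(s"inner:      ${serializedSize(new driver.InnerUpdater)} bytes") // megabytes
    println(s"standalone: ${serializedSize(new StandaloneUpdater)} bytes")   // tiny
  }
}
```

If your Updater is defined like `InnerUpdater` here (or closes over a large local), making it a top-level class -- or broadcasting the big state -- would shrink the task size.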
On Sat, Jul 12, 2014 at 7:27 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
> I'm working on a patch to MLlib that allows for multiplexing several
> different model optimizations using the same RDD (SPARK-2372:
> https://issues.apache.org/jira/browse/SPARK-2372).
>
> In testing larger datasets, I've started to see some memory errors
> (java.lang.OutOfMemoryError and "exceeds max allowed: spark.akka.frameSize"
> errors).
> My main clue is that on smaller systems Spark will start logging warnings
> like:
>
> 14/07/12 19:14:46 WARN scheduler.TaskSetManager: Stage 2862 contains a
> task of very large size (10119 KB). The maximum recommended task size is
> 100 KB.
>
> Looking up stage '2862' leads to a 'sample at
> GroupedGradientDescent.scala:156' call. That code can be seen at
>
> https://github.com/kellrott/spark/blob/mllib-grouped/mllib/src/main/scala/org/apache/spark/mllib/grouped/GroupedGradientDescent.scala#L156
>
> I've looked over the code, I'm broadcasting the larger variables, and
> between the sampler and the combineByKey, I wouldn't think there'd be much
> data being moved over the network, much less a 10MB chunk.
>
> Any ideas of what this might be a symptom of?
>
> Kyle