This JIRA issue <https://spark-project.atlassian.net/browse/SPARK-1006> probably addresses your problem. When running with a large number of iterations, the lineage DAG of ALS becomes very deep, and both the DAGScheduler and the Java serializer can overflow the stack because they are implemented recursively. You can resort to checkpointing as a workaround.
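
For the iterative join pattern you quoted below, the usual fix is to call RDD.checkpoint() every N iterations so the lineage is truncated and the DAG stays a constant depth. Here is a minimal self-contained sketch of that idea; the object name, checkpoint directory, dummy data, update rule, and checkpoint interval are all placeholders I made up, not taken from your job:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on 0.9/1.x)

object CheckpointedIteration {
  def main(args: Array[String]): Unit = {
    // setMaster("local[2]") is only for local testing; drop it under spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo").setMaster("local[2]"))
    // Checkpointing writes RDD data to reliable storage and truncates its lineage.
    sc.setCheckpointDir("/tmp/spark-checkpoints")  // placeholder; use an HDFS path on a cluster

    val outlinks = sc.parallelize(1 to 100).map(i => (i, i % 7))  // dummy link data
    var solution = sc.parallelize(1 to 100).map(i => (i, 1.0))    // dummy initial state

    for (iter <- 1 to 100) {
      solution = outlinks.join(solution).mapValues { case (link, value) =>
        0.85 * value + 0.15 * link  // placeholder update rule
      }
      // Every 10 iterations, cut the lineage so it never grows past a fixed depth.
      if (iter % 10 == 0) {
        solution.checkpoint()  // mark for checkpointing
        solution.count()       // force an action so the checkpoint is written now
      }
    }
    solution.take(5).foreach(println)
    sc.stop()
  }
}

Without the checkpoint, each iteration adds to the lineage, and at ~100 iterations the recursive traversal in the scheduler/serializer blows the stack.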
On Wed, Apr 16, 2014 at 5:29 AM, Xiaoli Li <lixiaolima...@gmail.com> wrote:
> Hi,
>
> I am testing ALS using 7 nodes. Each node has 4 cores and 8G memory. The ALS
> program cannot run even with a very small training set (about 91 lines) due
> to a StackOverflowError when I set the number of iterations to 100. I think
> the problem may be caused by the updateFeatures method, which updates the
> products RDD iteratively by joining it with the previous products RDD.
>
> I am writing a program which has a similar update process to ALS. This
> problem also appeared when I iterated too many times (more than 80).
>
> The iterative part of my code is as follows:
>
> solution = outlinks.join(solution).map {
>   .......
> }
>
> Has anyone had a similar problem? Thanks.
>
> Xiaoli