Yep, it's all in memory -- it would be too slow to read it directly out of Mongo. The purpose is just to make it easy to read and re-read data from Mongo, and to facilitate updates.
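Roughly what buildModel() is doing there, sketched with plain java.util collections standing in for FastByIDMap (Pref and buildModel here are illustrative names I made up, not Mahout's actual API):

```java
import java.util.*;

// Minimal sketch of an in-memory preference model: every row from the
// collection is pulled onto the heap and keyed by user ID. Pref and
// buildModel are stand-ins, not Mahout's Preference/FastByIDMap classes.
public class InMemoryModelSketch {
    // Stand-in for Mahout's Preference: (userID, itemID, value).
    record Pref(long userId, long itemId, float value) {}

    // Stand-in for the FastByIDMap<Collection<Preference>> build loop.
    static Map<Long, Collection<Pref>> buildModel(List<Pref> mongoRows) {
        Map<Long, Collection<Pref>> userIDPrefMap = new HashMap<>();
        for (Pref p : mongoRows) {
            userIDPrefMap.computeIfAbsent(p.userId(), k -> new ArrayList<>())
                         .add(p);
        }
        return userIDPrefMap; // the whole data set now lives in memory
    }

    public static void main(String[] args) {
        List<Pref> rows = List.of(
            new Pref(1, 10, 4.0f), new Pref(1, 11, 3.5f), new Pref(2, 10, 5.0f));
        System.out.println(buildModel(rows).size() + " users held in memory");
    }
}
```

The point is that memory use grows with the total number of preferences, not with what any one query needs -- which is why a huge collection won't fit.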
If the data is too big to fit in memory, you should first look at pruning your data -- can sampling 10% of it still give you good results? If not, you are in Hadoop territory and would want to look at a distributed algorithm.

On Sun, Mar 18, 2012 at 8:12 PM, Mridul Kapoor <mridulkap...@gmail.com> wrote:

> Hi,
> I am up for building an item-based recommender using Mahout. I have a
> humongous amount of data in a MongoDB collection, but I am not sure that
> the MongoDBDataModel provided with Mahout will be able to handle my case.
> I see that in the buildModel() function, it creates a
>
>     FastByIDMap<Collection<Preference>> userIDPrefMap =
>         new FastByIDMap<Collection<Preference>>();
>
> [line 556]
> Does the subsequent code create an in-memory model of the data from the
> MongoDB collection (which I think it does)? If yes, is there any current
> immediate alternative to that?
>
> Thanks
> Mridul
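If you go the sampling route, one simple deterministic way to take ~10% is to hash the user ID, so each user's preferences are either all kept or all dropped (sampling individual rows instead would leave every user with a fragmented history). A rough sketch -- keepUser and the mixing constant are my own illustration, not anything in Mahout:

```java
import java.util.*;

// Sketch: filter rows down to a ~10% user sample while loading, so the
// in-memory map only ever holds the sampled fraction. keepUser is an
// illustrative helper, not part of Mahout.
public class SampledLoad {
    // Deterministically decide whether a user is in the sample.
    static boolean keepUser(long userId, double fraction) {
        // Multiply by a large odd constant to scatter sequential IDs,
        // then map the low 32 bits of the hash into [0, 1).
        long h = Long.hashCode(userId * 0x9E3779B97F4A7C15L) & 0xFFFFFFFFL;
        return h / 4294967296.0 < fraction;
    }

    public static void main(String[] args) {
        // Plain-Java stand-in for FastByIDMap<Collection<Preference>>:
        // user ID -> list of (itemID, value) pairs.
        Map<Long, List<long[]>> userIDPrefMap = new HashMap<>();

        // Fake source rows: 10,000 users, one preference each.
        for (long userId = 0; userId < 10_000; userId++) {
            if (keepUser(userId, 0.10)) {
                userIDPrefMap
                    .computeIfAbsent(userId, k -> new ArrayList<>())
                    .add(new long[]{userId % 500, 1L});
            }
        }
        System.out.println("sampled users: " + userIDPrefMap.size());
    }
}
```

Because the decision is a pure function of the user ID, a re-read of the collection (e.g. on refresh) lands on exactly the same sample.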