Are you sure that the problem is writing the results?  It seems to me that
the real problem is the use of a user-based recommender.

For such a small data set, for instance, a search-based recommender will be
able to make recommendations in less than a millisecond with multiple
recommendations possible in parallel.  This should allow you to do 200,000
recommendations in a few minutes on a single machine.
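
As a rough check on that estimate: 200,000 lookups at 1 ms each is 200
seconds, a little over three minutes, even on a single thread.  Below is a
minimal Java sketch of such a batch loop.  The BatchScoring class and the
Scorer interface are hypothetical stand-ins for whatever recommender you end
up using; they are not Mahout APIs.  The sketch also assumes the scorer is
safe to call from several threads at once.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;

public class BatchScoring {

    /** Hypothetical stand-in for any per-pair scorer; not a Mahout interface. */
    @FunctionalInterface
    public interface Scorer {
        float score(long userId, long itemId);
    }

    /**
     * Scores every "userId,itemId" line in parallel and writes
     * "userId,itemId,score" lines to out, preserving the input order.
     * Assumes the scorer is thread-safe.
     */
    public static void scoreAll(List<String> lines, Scorer scorer, Writer out)
            throws IOException {
        // Parallel map; collecting an ordered stream keeps the encounter order.
        List<String> scored = lines.parallelStream()
            .map(line -> {
                String[] parts = line.split(",");
                long user = Long.parseLong(parts[0].trim());
                long item = Long.parseLong(parts[1].trim());
                return line + "," + scorer.score(user, item);
            })
            .collect(Collectors.toList());

        // One buffered writer for the whole batch instead of a write per line.
        BufferedWriter writer = new BufferedWriter(out);
        for (String row : scored) {
            writer.write(row);
            writer.newLine();
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        List<String> pairs = List.of("1,10", "2,20", "3,30");
        Scorer dummy = (user, item) -> 3.0f;  // placeholder score, for illustration only
        StringWriter out = new StringWriter();
        scoreAll(pairs, dummy, out);
        System.out.print(out);
    }
}
```

In real use the Writer would wrap a file and the Scorer would delegate to the
recommender.  The point is only that the output side is cheap: if the loop is
slow, the time is going into the per-pair estimate, not into the writing.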

With such a small dataset, indicator-based methods may not be the best
option.  To get better results from them, try a larger dataset such as the
Million Song Dataset.  See http://labrosa.ee.columbia.edu/millionsong/

Also, using and estimating ratings is not a particularly good thing to be
doing if you want to build a real recommender.


On Fri, Apr 3, 2015 at 3:26 AM, PierLorenzo Bianchini <
piell...@yahoo.com.invalid> wrote:

> Hello everyone,
> I'm new to mahout, to recommender systems and to the mailing list.
>
> I'm trying to find a (fast) way to write back preferences to a file. I
> tried a few methods but I'm sure there must be a better approach.
> Here's the deal (you can find the same post in stackoverflow[1]).
> I have a training dataset of 800.000 records from 6000 users rating 3900
> movies. These are stored in a comma separated file like:
> userId,movieId,preference. I have another dataset (200.000 records) in the
> format: userId,movieId. My goal is to use the first dataset as a
> training-set, in order to determine the missing preferences of the second
> set.
>
> So far, I managed to load the training dataset and I generated user-based
> recommendations. This is pretty smooth and doesn't take too much time. But
> I'm struggling when it comes to writing back the recommendations.
>
> The first method I tried is:
>
>  * read a line from the file and get the userId,movieId tuple.
>  * retrieve the calculated preference with estimatePreference(userId,
> movieId)
>  * append the preference to the line and save it in a new file
> This works, but it's incredibly slow (I added a counter to print every
> 10.000th iteration: after a couple of minutes it had only printed once. I
> have 8GB-RAM with an i7-core... how long can it take to process 200.000
> lines?!)
>
> My second choice was:
>
>  * create a new FileDataModel with the second dataset
>  * do something like this: newDataModel.setPreference(userId, movieId,
> recommender.estimatePreference(userId, movieId));
>
> Here I get several problems:
>  * at runtime: java.lang.UnsupportedOperationException (as I found out in
> [2], FileDataModel actually can't be updated. I don't understand why the
> function setPreference exists in the first place...)
>  * The API of FileDataModel#setPreference states "This method should also
> be considered relatively slow."
>
> I read around that a solution would be to use delta files, but I couldn't
> find out what that actually means. Any suggestion on how I could speed up
> my writing-the-preferences process?
> Thank you!
>
> Pier Lorenzo
>
>
> [1]
> http://stackoverflow.com/questions/29423824/mahout-fast-performance-how-to-write-preferences-to-file
> [2] http://comments.gmane.org/gmane.comp.apache.mahout.user/11330
>
