I just ran into the opposite case Sebastian mentions, where a very large % of
users have only one interaction. They come from social media or search, see
only one thing, and leave. Processing this data turned into a huge job but led
to virtually no change in the model, since users with very few
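A single-interaction user can never form a cooccurrence pair, so one way to shrink that job is to drop those users before the similarity computation. A minimal sketch in plain Java (this is illustrative, not Mahout code; the `minPerUser` threshold and the map-of-lists input shape are assumptions):

```java
import java.util.*;
import java.util.stream.*;

public class FilterSparseUsers {
    // Hypothetical pre-filter: drop users below a minimum interaction count
    // before feeding data to cooccurrence analysis. Single-interaction users
    // cannot contribute any cooccurrence pair, so removing them shrinks the
    // job while leaving the model essentially unchanged.
    static Map<String, List<String>> filter(Map<String, List<String>> interactions,
                                            int minPerUser) {
        return interactions.entrySet().stream()
                .filter(e -> e.getValue().size() >= minPerUser)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new HashMap<>();
        data.put("u1", List.of("itemA"));            // one interaction: dropped
        data.put("u2", List.of("itemA", "itemB"));   // two interactions: kept
        Map<String, List<String>> kept = filter(data, 2);
        System.out.println(kept.keySet());           // [u2]
    }
}
```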
This actually sounds like a very small problem.
My guess is that there are bad settings for the interaction and frequency
cuts.
On Thu, Jun 23, 2016 at 11:07 AM, Pat Ferrel wrote:
In addition to increasing downsampling there are some other things to note. The
original OOM was caused by the use of BiMaps to store your row and column ids.
These grow with the total number of ids, since they require storage for 2
hashmaps per id type. With only 16g you may have very little else
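To see where that memory goes, here is a sketch of a BiMap-style id dictionary written with plain HashMaps (illustrative only; the class and method names are made up, and Mahout uses Guava's BiMap rather than this code):

```java
import java.util.*;

public class IdDictionary {
    // Sketch of why a BiMap-style id dictionary is memory-hungry: it keeps
    // TWO hashmaps (forward and inverse) per id type, so memory grows with
    // the total number of distinct row and column ids, and every id is
    // stored in two tables.
    private final Map<String, Integer> toIndex = new HashMap<>();
    private final Map<Integer, String> toId = new HashMap<>();

    int index(String id) {
        Integer idx = toIndex.get(id);
        if (idx == null) {
            idx = toIndex.size();
            toIndex.put(id, idx);
            toId.put(idx, id);   // second map doubles the entry count
        }
        return idx;
    }

    String id(int index) {
        return toId.get(index);
    }

    public static void main(String[] args) {
        IdDictionary dict = new IdDictionary();
        int a = dict.index("user-42");
        System.out.println(a + " -> " + dict.id(a));  // 0 -> user-42
    }
}
```

With one dictionary each for row ids and column ids, that is four hashmaps total competing for the same 16g heap as everything else.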
Hi,
Pairwise similarity is a quadratic problem and it's very easy to run into
a problem size that does not scale anymore, especially with so many items.
Our code downsamples the input data to help with this.
One thing you can do is decrease the argument maxNumInteractions to a
lower number to
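The effect of that cut can be sketched as capping each user's history at a fixed length, which bounds the per-user pairwise cost by the cap squared instead of the full history squared. A plain-Java illustration (the random sampling policy here is an assumption for the sketch, not necessarily Mahout's exact downsampling rule):

```java
import java.util.*;

public class Downsample {
    // Sketch of interaction downsampling in the spirit of a
    // maxNumInteractions cut: cap each user's interaction list at maxPerUser
    // so pairwise cost per user is at most maxPerUser^2 rather than n^2.
    static List<String> cap(List<String> items, int maxPerUser, Random rng) {
        if (items.size() <= maxPerUser) {
            return items;  // small histories pass through unchanged
        }
        List<String> copy = new ArrayList<>(items);
        Collections.shuffle(copy, rng);          // keep a random sample
        return copy.subList(0, maxPerUser);
    }

    public static void main(String[] args) {
        List<String> history = List.of("a", "b", "c", "d", "e");
        List<String> capped = cap(history, 3, new Random(7));
        System.out.println(capped.size());       // 3
    }
}
```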
Hi,
I am trying to build a simple recommendation engine using spark item
similarity (e.g. with
org.apache.mahout.math.cf.SimilarityAnalysis.cooccurrencesIDSs).
Things work fine on a comparatively small dataset but I am having difficulty
scaling it up.
The input I am using is CSV data containing