You can derive many similarity metrics from co-occurrence counts alone
if your data is binary (1s and 0s). Pearson correlation, cosine
similarity, Tanimoto/Jaccard, Euclidean distance and log-likelihood all
reduce to counting in that case. So why not at least offer the choice?
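
To illustrate the point about counting, here is a minimal sketch in
Python (not Mahout code). It assumes binary data over n users, where
n_a and n_b are the number of users who have each item and n_ab is the
co-occurrence count. The function name and return layout are made up
for this example, and Mahout's own similarity classes may rescale some
of these values (for example, turning a distance into a similarity).

```python
import math

def binary_similarities(n_a, n_b, n_ab, n):
    """Metrics for two binary item vectors, from counts only.

    Assumes 0 < n_a < n and 0 < n_b < n so that no denominator is zero.
    """
    # 2x2 contingency table: (A and B), (A only), (B only), (neither)
    k11 = n_ab
    k12 = n_a - n_ab
    k21 = n_b - n_ab
    k22 = n - n_a - n_b + n_ab

    def term(k, row_total, col_total):
        # One cell of the G^2 statistic; an empty cell contributes 0.
        return 0.0 if k == 0 else k * math.log(k * n / (row_total * col_total))

    llr = 2.0 * (term(k11, n_a, n_b)
                 + term(k12, n_a, n - n_b)
                 + term(k21, n - n_a, n_b)
                 + term(k22, n - n_a, n - n_b))

    return {
        "jaccard": n_ab / (n_a + n_b - n_ab),
        "cosine": n_ab / math.sqrt(n_a * n_b),
        "euclidean_distance": math.sqrt(n_a + n_b - 2 * n_ab),
        # Pearson on binary data is the phi coefficient.
        "pearson": (n * n_ab - n_a * n_b)
                   / math.sqrt(n_a * (n - n_a) * n_b * (n - n_b)),
        "log_likelihood_ratio": llr,
    }

sims = binary_similarities(n_a=4, n_b=5, n_ab=3, n=10)
```

Every value comes from the same four counts, which is why one
co-occurrence pass can feed any of these measures.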

You can keep just half of the diff matrix, since it is symmetric (up to
sign) of course. Beyond that, you would want to prune entries with low
'confidence' (not entries with a small absolute value). The simple way
is to prune on the number of diffs included in each average, throwing
away entries whose average is based on just a few diffs. Slightly better
is to prune based on standard deviation / variance, which is what Mahout
does.
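
A rough sketch of that pruning in Python, as an illustration only and
not Mahout's implementation. The function names, the in-memory dict
layout and the thresholds min_count and max_stdev are all assumptions
made for this example.

```python
from statistics import mean, pstdev

def build_pruned_diffs(ratings, min_count=2, max_stdev=None):
    """ratings: {user: {item: rating}}.

    Returns {(a, b): (average_diff, count)} for a < b only, where the
    diff is rating(b) - rating(a). Low-confidence entries are dropped.
    """
    diffs = {}
    for prefs in ratings.values():
        items = sorted(prefs)
        for idx, a in enumerate(items):
            for b in items[idx + 1:]:
                # Store only the (a, b) half with a < b.
                diffs.setdefault((a, b), []).append(prefs[b] - prefs[a])

    pruned = {}
    for pair, values in diffs.items():
        if len(values) < min_count:
            continue  # average rests on too few diffs
        if max_stdev is not None and pstdev(values) > max_stdev:
            continue  # diffs disagree too much to trust the average
        pruned[pair] = (mean(values), len(values))
    return pruned

def lookup_diff(pruned, a, b):
    """Average of rating(b) - rating(a), or None if the entry was pruned."""
    if a == b:
        return 0.0
    if a < b:
        entry = pruned.get((a, b))
        return None if entry is None else entry[0]
    entry = pruned.get((b, a))
    return None if entry is None else -entry[0]

ratings = {
    "u1": {"A": 5, "B": 3, "C": 2},
    "u2": {"A": 3, "B": 4},
    "u3": {"B": 2, "C": 5},
}
by_count = build_pruned_diffs(ratings, min_count=2)
by_stdev = build_pruned_diffs(ratings, min_count=2, max_stdev=1.6)
```

The other half of the matrix is recovered by negating the stored value,
so nothing is lost by storing only pairs with a < b.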

What are the top K highest items? They aren't the same for all users,
and you don't know what they will be for each user until you have made
recommendations, which is the task in question.
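
A toy example of why the top items depend on the user, in plain Python.
The item factors Y and the user vectors below are invented numbers, not
output from any ALS run, and the names are hypothetical.

```python
def top_k(user_vector, item_factors, k):
    """Rank items by the dot product of the user and item feature vectors."""
    scores = {
        item: sum(u * y for u, y in zip(user_vector, features))
        for item, features in item_factors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Made-up item factor matrix Y with two features per item.
Y = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "C": (0.8, 0.8),
}

# Three made-up user rows of X.
users = {
    "u1": (1.0, 0.1),
    "u2": (0.1, 1.0),
    "u3": (1.0, 1.0),
}

best = {name: top_k(vec, Y, 1)[0] for name, vec in users.items()}
```

Here each user ends up with a different best item. Item C is not the
largest on either feature, yet it is the best item for u3, so a
candidate list built from the top item per feature would miss it.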

On Mon, Jul 9, 2012 at 1:59 AM, Razon, Oren <oren.ra...@intel.com> wrote:
> Hi
> A few questions:
> 1.      I see that one of the parameters of the distributed co-occurrence 
> item similarity is the name of the item similarity class.
>       I wonder why it is an option? The whole idea behind this algorithm is 
> that the similarity is based on co-occurrences, what am I missing here?
> 2.      If I want to use the distributed slope-one average diff job, but I do 
> not want to save all items per item (I want to avoid saving an I*I 
> matrix), what should be the way I can filter the amount of items I'm saving 
> (such as I can do in the item based recommender)?
> 3.      Related to Sean's question a few days ago... If I want to simplify the 
> overhead in ALS prediction of doing Xu * Y' in order to get user u 
> recommendation, would something like saving for each of the K features only 
> the top K highest items be a good heuristic? That way I'm reducing 
> dramatically the number of candidate items and they are the potential items 
> to get the highest score, I think...
>
