Trying to catch up.

Isn't the sum of similarities actually a globally comparable number for 
strength of preference in a boolean model? I was thinking it wasn't, but it 
really is. It may not be ideal, but as an ordinal measure it should work, right?
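To make sure I have the scoring right, here's a minimal sketch of summing 
similarities over boolean prefs (plain Java; the names and data structures are 
mine for illustration, not Mahout's API):

import java.util.*;

// Minimal sketch: score candidate items for one user by summing the
// user-user similarities of the neighbors who have each item.
// With boolean prefs there is no rating to multiply by, so the score
// is just the sum of similarities.
class SumOfSimilaritiesSketch {

  static Map<String, Double> score(String user,
                                   Map<String, Set<String>> prefs,      // user -> items seen
                                   Map<String, Double> simToUser) {     // neighbor -> sim(user, neighbor)
    Set<String> seen = prefs.getOrDefault(user, Collections.emptySet());
    Map<String, Double> scores = new HashMap<>();
    for (Map.Entry<String, Double> e : simToUser.entrySet()) {
      for (String item : prefs.getOrDefault(e.getKey(), Collections.emptySet())) {
        if (!seen.contains(item)) {
          scores.merge(item, e.getValue(), Double::sum);   // running sum of similarities
        }
      }
    }
    return scores;  // bigger sum = more evidence, at least as an ordinal ranking
  }
}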

Is the logic behind the IDF idea that very popular items are of less value in 
calculating recommendations? If an IDF weight is to be applied, isn't it applied 
to the preference values (0,1) before the similarity between users is 
calculated? The intuition would be that people aren't all that similar just 
because they both happen to like puppies.
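Something like the sketch below is what I'm picturing: weight the 0/1 prefs by 
IDF first, then compute a similarity between users. The IDF formula and the use 
of cosine here are just my assumptions for illustration:

import java.util.*;

// Sketch: weight boolean (0/1) prefs by item IDF, then take a weighted
// cosine similarity between two users. Very popular items get a low IDF
// and so contribute little -- liking puppies in common counts for less.
class IdfWeightedUserSimilarity {

  static Map<String, Double> idf(Map<String, Set<String>> prefs) {   // user -> items seen
    Map<String, Integer> df = new HashMap<>();
    for (Set<String> items : prefs.values()) {
      for (String item : items) {
        df.merge(item, 1, Integer::sum);                 // "document" (user) frequency
      }
    }
    Map<String, Double> idf = new HashMap<>();
    int numUsers = prefs.size();
    df.forEach((item, count) -> idf.put(item, Math.log((double) numUsers / count)));
    return idf;
  }

  static double similarity(Set<String> a, Set<String> b, Map<String, Double> idf) {
    double dot = 0, normA = 0, normB = 0;
    for (String item : a) {
      double w = idf.getOrDefault(item, 0.0);
      normA += w * w;
      if (b.contains(item)) {
        dot += w * w;                                    // both users have weight w here
      }
    }
    for (String item : b) {
      double w = idf.getOrDefault(item, 0.0);
      normB += w * w;
    }
    return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
  }
}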

I'm afraid I got lost when it came to applying IDF-ish weighting to the 
similarity strengths themselves.

On Nov 15, 2012, at 10:50 AM, Sean Owen <sro...@gmail.com> wrote:

That's kind of what it does now... though it weights everything as "1". Not
so smart, but for sparse-ish data it's not far off from a smarter answer.


On Thu, Nov 15, 2012 at 6:47 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> My own preference (pun intended) is to use the log-likelihood score for
> determining which similarities are non-zero and then use simple frequency
> weighting such as IDF for weighting the similarities. This doesn't make
> direct use of cooccurrence frequencies, but it works really well. One
> reason it seems to work well is that using only general occurrence
> frequencies makes it *really* hard to overfit.
> 
> 
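If I'm reading Ted right, it's roughly the sketch below: LLR decides which 
item-item similarities are non-zero at all, and the surviving ones get an 
IDF-style weight based only on how common the item is. The threshold and the 
exact weight are my own illustrative choices, not what Mahout actually does:

// Sketch of "LLR as the gate, IDF as the weight" for item-item similarity.
// For items A and B: k11 = users with both, k12 = A only, k21 = B only,
// k22 = neither.
class LlrGateIdfWeight {

  static double xLogX(long x) { return x == 0 ? 0.0 : x * Math.log(x); }

  static double entropy(long... counts) {
    long sum = 0;
    double s = 0;
    for (long c : counts) { s += xLogX(c); sum += c; }
    return xLogX(sum) - s;
  }

  // Standard log-likelihood ratio for a 2x2 contingency table.
  static double llr(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double colEntropy = entropy(k11 + k21, k12 + k22);
    double matEntropy = entropy(k11, k12, k21, k22);
    return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
  }

  // Similarity of B given A: non-zero only if the cooccurrence is
  // "surprising" by LLR; the value itself reflects only how rare B is,
  // not the cooccurrence count -- which is the "hard to overfit" part.
  static double similarity(long k11, long k12, long k21, long k22,
                           long numUsers, long usersWithB, double llrThreshold) {
    if (llr(k11, k12, k21, k22) < llrThreshold) {
      return 0.0;
    }
    return Math.log((double) numUsers / usersWithB);   // IDF-style weight
  }
}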
