In theory, this is what the system learns for you: that there is some pattern in the preferences, so someone who likes Julia Roberts's movies will tend to be recommended more of them.
So I suppose I'd advise against making a pseudo-item out of a feature unless you have specific, new information to base it on that isn't already captured in your input. If you're deriving this JR feature value purely from the existing input, it isn't theoretically adding anything. It "distorts" the model, but that doesn't mean it won't happen to help in your particular problem domain. Maybe forcing this feature to exist is a smart move if you know better than the model that it matters. Just be slightly wary of piling in lots of heuristics.

Yes, you should keep the thread alive; I'm sure it's useful to hear about your real-world results.

On Sat, Apr 7, 2012 at 12:21 AM, <anita.mehro...@accenture.com> wrote:
> Hi Sean,
>
> Thanks for the clarification and advice!
>
> In regards to how I use this matrix for content-based item similarity,
> it's exactly like you said: weighting these additional attributes in the
> computation. So, e.g., if 20% of all movies liked by a particular user U have
> Julia Roberts as a star, then there would be a column for "Julia Roberts", and
> the component for (U, Julia Roberts) would be 0.2. (I'm referencing Prof.
> Ullman's chapter 9 of Mining of Massive Datasets here.)
>
> Thanks for the help! I will continue reaching out to the Mahout forum (and
> you, if that's okay) as I work through building this out.
>
> Sincerely,
> Anita
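A minimal sketch of the user-profile construction Anita describes: each profile component is the fraction of the user's liked items that have that feature, so one Julia Roberts film out of five liked movies gives 0.2. Scoring unseen items by cosine similarity between the profile and each item's 0/1 feature vector is one common choice, assumed here. This is plain Python rather than Mahout's Java API; the function names, feature labels and toy data are hypothetical.

```python
from math import sqrt

# Hypothetical helpers for illustration only; this is not Mahout's API.

def user_profile(liked_items, item_features):
    """Profile component for feature f = fraction of liked items that have f."""
    counts = {}
    for item in liked_items:
        for feature in item_features.get(item, ()):
            counts[feature] = counts.get(feature, 0) + 1
    n = len(liked_items)
    return {f: c / n for f, c in counts.items()} if n else {}

def item_vector(item, item_features):
    """Boolean (0/1) feature vector for one item, stored sparsely as a dict."""
    return {f: 1.0 for f in item_features.get(item, ())}

def cosine(u, v):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Invented toy data: one of the user's five liked movies stars Julia Roberts.
item_features = {
    "pretty_woman": {"julia_roberts", "romance"},
    "heat": {"al_pacino", "crime"},
    "alien": {"sci_fi"},
    "fargo": {"crime"},
    "up": {"animation"},
    "notting_hill": {"julia_roberts", "romance", "hugh_grant"},
    "godfather": {"al_pacino", "crime"},
}
liked = ["pretty_woman", "heat", "alien", "fargo", "up"]

profile = user_profile(liked, item_features)
print(profile["julia_roberts"])  # 0.2

# Score movies the user has not rated yet, best match first.
candidates = ["notting_hill", "godfather"]
scores = {m: cosine(profile, item_vector(m, item_features)) for m in candidates}
print(sorted(scores, key=scores.get, reverse=True))  # ['godfather', 'notting_hill']
```

Every profile value here is computed from the user's existing likes, which illustrates Sean's point: a feature derived this way reweights information already present in the input rather than adding new information.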