In theory this is exactly what the system learns for you: there is some
pattern to the preferences, so someone who likes Julia Roberts's movies
will tend to be recommended more of them.

So I suppose I'd advise against making a pseudo-item out of a feature
unless you have specific, new information on which to base it that's not
already captured in your input. If you're deriving this JR feature value
just based on the existing input, it's not theoretically adding anything.
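
To make "deriving the feature from existing input" concrete, here is a
rough sketch of the computation as I understand it from your message. The
function name, the `liked` and `cast` dictionaries and the toy data are all
made up for illustration; this is not Mahout code.

```python
from collections import defaultdict


def actor_feature_values(liked, cast):
    """Return, per user, the fraction of liked movies featuring each actor.

    liked: dict mapping user -> list of liked movie IDs (hypothetical input)
    cast:  dict mapping movie ID -> list of actors (hypothetical input)
    """
    features = {}
    for user, movies in liked.items():
        counts = defaultdict(int)
        for movie in movies:
            for actor in cast.get(movie, ()):
                counts[actor] += 1
        # Divide each actor's count by the number of movies the user liked.
        features[user] = (
            {actor: n / len(movies) for actor, n in counts.items()}
            if movies else {}
        )
    return features


# Toy data (assumed): user U likes five movies, one starring Julia Roberts.
liked = {"U": ["m1", "m2", "m3", "m4", "m5"]}
cast = {
    "m1": ["Julia Roberts"],
    "m2": ["Hugh Grant"],
    "m3": [],
    "m4": ["Hugh Grant"],
    "m5": [],
}
features = actor_feature_values(liked, cast)
print(features["U"]["Julia Roberts"])  # 0.2
```

The value for (U, Julia Roberts) is a pure function of the two inputs, so
the pseudo-item can only carry whatever information those inputs carry.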

It's "distorting" the model, but that doesn't mean it can't help in your
particular problem domain. Forcing this feature to exist may be a smart
move if you know better than the model that it is important. Just be
slightly wary of piling in lots of heuristics.


Yes, you should keep the thread alive. I'm sure it would be useful to hear
about your real-world results.


On Sat, Apr 7, 2012 at 12:21 AM, <anita.mehro...@accenture.com> wrote:

> Hi Sean,
>
> Thanks for the clarification and advice!
>
> With regard to how I use this matrix for content-based item-similarity,
> it's exactly like you said - weighting these additional attributes in the
> computation. So e.g. if 20% of all movies liked by a particular user U have
> Julia Roberts as a star, then there would be a column for "Julia Roberts"
> and the component for (U, Julia Roberts) would be 0.2. (I'm referencing
> Prof Ullman's chapter 9 of Mining of Massive Datasets here).
>
> Thanks for the help! I will continue reaching out to the Mahout forum (and
> you, if that's okay) as I work through building this out.
>
> Sincerely,
> Anita
>
