The best default answer is to put them all in one model; the math
doesn't care what the things are. Unless you have a strong reason to
weight one data set, I wouldn't. If you do, then two models is best,
since it is hard to weight a subset of the data within most
similarity functions. I don't think it would work in Pearson, for
instance, but it could work in Tanimoto.
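
If it helps, here's a minimal sketch of the one-model route, assuming
Mahout's Taste (single-machine) API since that fits the Pearson/Tanimoto
discussion. The file name, user ID, and the choice of Tanimoto are all
illustrative. The one real requirement is that book IDs and video IDs
must not collide, so map them into disjoint ranges before merging:

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class CombinedModelExample {
  public static void main(String[] args) throws Exception {
    // One model over both data sets: userID,itemID lines, with book and
    // video IDs already mapped into disjoint ranges so they can't collide.
    DataModel model = new FileDataModel(new File("merged_prefs.csv"));

    // Tanimoto treats each user as the set of items they touched, which
    // suits unary buy/watch events that have no natural rating value.
    UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);

    GenericUserBasedRecommender recommender = new GenericUserBasedRecommender(
        model, new NearestNUserNeighborhood(10, similarity, model), similarity);

    // The 10 most similar users to user 123 (an illustrative ID).
    for (long neighborID : recommender.mostSimilarUserIDs(123L, 10)) {
      System.out.println(neighborID);
    }
  }
}

One caveat on weighting here: the stock TanimotoCoefficientSimilarity
ignores preference values entirely, so scaling one data set's values by
< 1.0 only has an effect with a value-based similarity, or a custom one.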

On Wed, Jul 4, 2012 at 1:20 AM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
> Hi all,
>
> I'm curious what approaches are recommended for generating user-user 
> similarity, when I've got two (or more) distinct types of item data, both of 
> which are fairly large.
>
> E.g. let's say I had a set of users where I knew both (a) what books they had 
> bought on Amazon, and (b) what YouTube videos they had watched.
>
> For each user, I want to find the 10 most similar other users.
>
>  - I could create two separate models, find the nearest 30 users for each 
> user, and combine (maybe with weighting) the results.
>  - I could toss all of the data into one model, and use a value of < 1.0 
> for whichever type of preference is less important.
>
> Any other suggestions? Input on the above two approaches?
>
> Thanks!
>
> -- Ken
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
>
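
For the first option quoted above (two separate models, merged with
weighting), a rough sketch along the same lines. The file names, the
user ID 123, and the 0.7 weight are placeholders, not anything from
real data:

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class TwoModelExample {
  public static void main(String[] args) throws Exception {
    Map<Long, Double> combined = new HashMap<Long, Double>();
    accumulate(combined, new File("books.csv"), 123L, 1.0);   // books at full weight
    accumulate(combined, new File("videos.csv"), 123L, 0.7);  // videos down-weighted
    // Sort 'combined' by descending score and keep the top 10 user IDs.
  }

  // Adds weight * similarity for the 30 nearest neighbors of userID
  // in one data set's model into the running per-user scores.
  private static void accumulate(Map<Long, Double> scores, File data,
                                 long userID, double weight) throws Exception {
    DataModel model = new FileDataModel(data);
    UserSimilarity sim = new TanimotoCoefficientSimilarity(model);
    GenericUserBasedRecommender rec = new GenericUserBasedRecommender(
        model, new NearestNUserNeighborhood(30, sim, model), sim);
    for (long neighbor : rec.mostSimilarUserIDs(userID, 30)) {
      double s = sim.userSimilarity(userID, neighbor); // Tanimoto is in [0,1]
      Double prev = scores.get(neighbor);
      scores.put(neighbor, (prev == null ? 0.0 : prev) + weight * s);
    }
  }
}

Users who show up in both neighbor lists get boosted, which is probably
what you want. The downside versus the single model is that you pay for
two full similarity computations, and a user who is moderately similar
in both data sets but cracks neither top 30 gets missed entirely.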
