It's a good question. I think you can achieve a partial solution in Mahout.

"Real-time" suggests that you won't be able to make use of
Hadoop-based implementations, since they are by nature big batch
processes.

All of the implementations accept the same input -- user,item,value.
That's OK; you can probably just reduce all of your user-thing
interactions to tuples like this. Any reasonable mapping should be OK.
Tags can be items too.
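
To make the mapping concrete, here is a rough sketch of reducing mixed
interactions to user,item,value rows. It is plain Python, not Mahout code.
The event names, the weights, and the IdMap helper are all my own
assumptions for illustration. It assigns integer IDs because Mahout's
in-memory data models work with numeric user and item IDs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weights per interaction type; tune these for your own data.
WEIGHTS = {"like": 1.0, "digg": 1.0, "view": 0.5, "contribute": 2.0, "tag": 1.0}


class IdMap:
    """Assigns a stable integer ID (1, 2, 3, ...) to each distinct string key."""

    def __init__(self):
        self._ids = {}

    def get(self, key):
        return self._ids.setdefault(key, len(self._ids) + 1)


def to_tuples(events, users, items):
    """Collapse (user, kind, thing) events into summed (user_id, item_id, value) rows."""
    totals = defaultdict(float)
    for user, kind, thing in events:
        # Prefix keys so a tag and a content item with the same name don't collide.
        key = f"tag:{thing}" if kind == "tag" else f"content:{thing}"
        totals[(users.get(user), items.get(key))] += WEIGHTS[kind]
    return sorted((u, i, v) for (u, i), v in totals.items())


def to_csv(rows):
    """Render rows as user,item,value lines."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


events = [
    ("fred", "like", "post-17"),
    ("fred", "view", "post-17"),
    ("fred", "tag", "hadoop"),
    ("anna", "contribute", "post-17"),
]
users, items = IdMap(), IdMap()
rows = to_tuples(events, users, items)
print(to_csv(rows), end="")
```

Summing weights per user-item pair is just one reasonable choice; taking the
maximum or counting events would also fit the same tuple format.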

I don't think any of the implementations take advantage of time.

The non-Hadoop implementations are not-quite-realtime. The model
loads data into memory from a backing store, computes and maybe
caches partial results, and serves results as quickly as possible.
New input can't be used immediately, though. It only comes into play
when the model is reloaded.
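
A toy sketch of that load / cache / serve / reload cycle, in plain Python.
The class and method names are mine, not Mahout's API, and the scoring rule
is a deliberately crude stand-in for a real similarity computation. The
point is only that new rows in the backing store are invisible until the
reload.

```python
from collections import defaultdict


class SnapshotRecommender:
    """Serves from an in-memory snapshot; new data is visible only after reload()."""

    def __init__(self, store):
        self._store = store  # backing store: a list of (user, item, value)
        self._cache = {}
        self.reload()

    def reload(self):
        """Re-read the backing store, rebuild the model, and drop cached results."""
        by_user = defaultdict(dict)
        for user, item, value in self._store:
            by_user[user][item] = value
        self._by_user = dict(by_user)
        self._cache.clear()

    def recommend(self, user, how_many=3):
        """Return cached recommendations, computing them on first request."""
        key = (user, how_many)
        if key not in self._cache:
            self._cache[key] = self._compute(user, how_many)
        return self._cache[key]

    def _compute(self, user, how_many):
        """Score unseen items by the values other users gave, weighted by item overlap."""
        mine = self._by_user.get(user, {})
        scores = defaultdict(float)
        for other, theirs in self._by_user.items():
            if other == user:
                continue
            overlap = len(mine.keys() & theirs.keys())
            if overlap == 0:
                continue
            for item, value in theirs.items():
                if item not in mine:
                    scores[item] += overlap * value
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:how_many]]


store = [(1, 10, 1.0), (2, 10, 1.0), (2, 11, 1.0)]
rec = SnapshotRecommender(store)
before = rec.recommend(1)   # [11]
store.append((2, 12, 5.0))  # new input lands in the backing store...
stale = rec.recommend(1)    # ...but the loaded model has not seen it: still [11]
rec.reload()
after = rec.recommend(1)    # [12, 11]
```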

I think you have very sparse input -- a high number of users and
"items" (tags, likes), but relatively few interactions. Matrix
factorization / latent factor models work well here. The ones in
Mahout that are not Hadoop-based may work for you, like
SVDRecommender. It's worth a try.
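
If the latent factor idea is unfamiliar, here is a minimal sketch of it in
plain Python: learn a short vector per user and per item so that their dot
product approximates the observed values. This is a generic SGD
factorization for illustration, not how SVDRecommender is implemented, and
the hyperparameters and sample data are made up.

```python
import random


def predict(p, q, user, item):
    """Predicted value is the dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(p[user], q[item]))


def factorize(ratings, n_factors=2, n_epochs=200, lr=0.05, reg=0.02, seed=0):
    """Fit factor vectors to sparse (user, item, value) tuples with regularized SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random starting vectors; only observed pairs are ever visited below.
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(p, q, u, i)
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q


def rmse(p, q, ratings):
    """Root mean squared error over the observed tuples."""
    total = sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


ratings = [
    (1, 10, 3.0), (1, 11, 1.0),
    (2, 10, 3.0), (2, 12, 2.0),
    (3, 11, 1.0), (3, 12, 2.0),
    (4, 10, 3.0),
]
untrained = rmse(*factorize(ratings, n_epochs=0), ratings)
p, q = factorize(ratings)
trained = rmse(p, q, ratings)
guess = predict(p, q, 4, 12)  # user 4 never touched item 12
```

The training loop only touches observed pairs, which is why this family of
models copes with sparse input, and every user-item pair still gets a
prediction afterwards.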

(Advertisement: the new recommender product I am commercializing,
Myrrix, does the real-time and matrix factorization thing just fine.
It's easy enough to start with that I would encourage you to
experiment with the open source system also:
http://myrrix.com/download/)



On Thu, Jan 31, 2013 at 7:02 PM, Frederik Kraus
<frederik.kr...@gmail.com> wrote:
> Hi Guys,
>
> I'm rather new to the whole Mahout ecosystem, so please excuse if the 
> questions I have are rather dumb ;)
>
> Our "problem" basically boils down to this: we want to match users with 
> either the content they are interested in and/or the content they could 
> contribute to. To do this "matching" we have several dimensions both of users 
> and content items (things like: contribution history, tags, browsing history, 
> diggs, likes, ….).
>
> As interest of users can change over time some kind of CF algorithm including 
> temporal effects would obviously be best, but for the time being those 
> effects could probably be neglected.
>
> Now my questions:
>
> - what algorithm from the mahout "toolkit" would best fit our case?
> - How can we get this near realtime, i.e. not having to recalculate the 
> entire model when user dimensions change and/or new content is being added to 
> the system (or updated)
> - how would we model the user and item vectors (especially things like 
> "tags")?
> - any hints on where to start? ;)
>
> Thanks a lot!
>
> Fred.
>
