Hi Pat

Thanks very much for your suggestions. I will try to develop a "recommender" based on them, and if anybody
is interested in it, I could contribute it as another example.

Thanks

Michael

On 29.08.13 18:02, Pat Ferrel wrote:
You can use the Mahout text pipeline, which will give you weighted vectors 
based on TFIDF for each article. There is an example of this in Mahout in 
Action for clustering. Then run the RowSimilarityJob on them instead of 
clustering. This will give you a strength of similarity for each article pair. 
RSJ produces a DRM (distributed row matrix), which is keyed by the article id 
and so has a list of how similar every article is to the row article. The 
highest similarities will indicate most similar text content in the articles.  
I've done this before and it works pretty well. There might be something in the 
new knn (k-nearest-neighbors) framework that is more optimized.
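
As a rough illustration of the idea above (TFIDF-weighted vectors, then a 
similarity score for every article pair), here is a minimal sketch in plain 
Python. It is not Mahout code and does not reproduce RowSimilarityJob; the 
function names, the simple tf * log(N/df) weighting and the choice of cosine 
similarity are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: article id -> list of tokens. Returns article id -> {term: weight}.

    Assumed weighting: term count * log(N / document frequency).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per article
    vectors = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        vectors[doc_id] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def row_similarities(vectors):
    """Returns article id -> list of (other id, similarity), highest first."""
    sims = {}
    for i, vi in vectors.items():
        row = [(j, cosine(vi, vj)) for j, vj in vectors.items() if j != i]
        sims[i] = sorted(row, key=lambda p: p[1], reverse=True)
    return sims
```

Each entry of the result plays the role of one row of the similarity matrix: 
keyed by article id, listing the other articles with the most similar first. 
This all-pairs loop is only practical for small collections.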

Once you have the article similarities, you could take the articles most 
similar to the ones the user has already read, combine them, and show some 
number of those the user hasn't seen yet.
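
One possible way to do that combination is sketched below in plain Python. 
Summing the similarity scores over the read articles is just one assumed 
choice; the function name and the input layout (article id mapped to a list of 
(other id, score) pairs) are hypothetical as well.

```python
def recommend(similarities, read_ids, top_n=5):
    """Rank unread articles by their summed similarity to the read ones.

    similarities: article id -> list of (other article id, score)
    read_ids: ids of the articles the user has already read
    """
    read = set(read_ids)
    scores = {}
    for rid in read:
        for other, score in similarities.get(rid, []):
            if other not in read:  # only suggest articles not seen yet
                scores[other] = scores.get(other, 0.0) + score
    # Highest score first; ties broken by article id for a stable order.
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [article for article, _ in ranked[:top_n]]
```

Taking the maximum instead of the sum, or weighting recent reads more heavily, 
would be equally reasonable variations.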

Content-based recommenders are good for avoiding the cold start problem because 
even if the user has no read history you can show articles similar to the one 
she is looking at.

Also content-based recs are good when your inventory changes a lot (new 
articles appear all the time and they go out of favor quickly). You may never 
generate enough read behavior to use collaborative filtering alone.

BTW you might also look at Solr where you can use an article as a query against 
all articles indexed. This will also produce a list of ranked similar articles. 
Use the user's read history as queries and combine the lists somehow.

On Aug 29, 2013, at 7:53 AM, Gokhan Capan <gkhn...@gmail.com> wrote:

Hi Michael,

Those are collaborative filtering examples, which would recommend a news
article i, to a user u, based on:
- A weighted average of other users' ratings on i (where weight is the
similarity of two users' rating histories)
- A weighted average of u's ratings on other items (where weight is the
similarity of two items' rating histories, that is, which users rated the
item and how they rated it)
- A combination of the user and item vectors from user and item latent
factor matrices, which are obtained by decomposing the original rating
matrix.
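
To make the first of these variants concrete, here is a minimal sketch in
plain Python of a user-based prediction: a weighted average of other users'
ratings on item i, weighted by the similarity of their rating histories to
u's. It does not use Mahout's API; the function names and the choice of
cosine similarity over co-rated items are assumptions for the example.

```python
import math


def user_similarity(r1, r2):
    """Cosine similarity of two users' ratings over the items both rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict(ratings, user, item):
    """Weighted average of other users' ratings on `item`.

    ratings: user id -> {item id: rating}
    Returns None when no other user with nonzero similarity rated the item.
    """
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        w = user_similarity(ratings[user], r)
        num += w * r[item]
        den += abs(w)
    return num / den if den else None
```

A brand new article has no ratings at all, so a prediction like this has
nothing to average over.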

If you expect the system to recommend to a user only news articles whose
content is similar to older articles in which the user has shown a
positive interest, that is content-based filtering.
Also, the example you mentioned (recommending brand new articles)
introduces a challenge called the cold-start problem, and content-based
filtering can generalize to cold-start articles, too.

A search of the user list for content-based filtering/recommendation may help
you (I mention this because there were some great discussions on how to
achieve this with Mahout, for example, with custom similarity measures). If
you can't find anything satisfying, we can discuss that further.

Best,
Gokhan


On Thu, Aug 29, 2013 at 4:21 PM, Michael Wechner
<michael.wech...@wyona.com> wrote:

Hi

I am looking for a recommender example for news articles that makes
suggestions based on a user profile (independent of other users/readers) or,
more specifically, on the reading history of a user.

Let's say a specific user likes to read articles about cycling and
international politics and the content management system is saving the URL
history of all the articles which have been read by this specific user.
When the editorial staff creates new articles/stories, the system
should make recommendations to this user when she/he gets back online.
Also, when a new story has been created, the recommender should
check whether it would be a good fit/match for this particular
user, and the system should send a notification.

I guess developing such a recommender is possible with Mahout, but since I
am new to Mahout, I would appreciate any pointers to examples which are
similar to the functionality described above.

I am currently looking at the examples shipped with Mahout

https://cwiki.apache.org/confluence/display/MAHOUT/RecommendationExamples

but if I understand correctly, these are based on what other people liked
and not only on what the person herself/himself liked,
or do I misunderstand?

Thanks for your help

Michael


