I think you would cluster these like any other text documents. The
centroid of each cluster tells you where the cluster sits in
feature space, and here the features are just words. If you find the
features (words) with the largest absolute values in a centroid, those
ought to be the words that appear frequently in that cluster's
documents, and so indicate what the cluster is "about".

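To make the "largest features" step concrete, here is a minimal sketch
in plain Python. It assumes you have already exported each centroid as
a word-to-weight map from your clustering run; the function name
top_terms and the cluster names and weights below are made up for
illustration and are not Mahout output.

```python
from typing import Dict, List, Tuple


def top_terms(centroid: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n (term, weight) pairs with the largest absolute weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(centroid.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ranked[:n]


# Hypothetical centroids: one word -> weight map per cluster.
centroids = {
    "cluster-0": {"battery": 0.82, "charge": 0.64, "price": 0.10, "screen": 0.05},
    "cluster-1": {"price": 0.91, "expensive": 0.55, "battery": 0.08, "screen": 0.12},
}

for name, centroid in centroids.items():
    # Print only the words; the weights are still available if needed.
    print(name, [term for term, _ in top_terms(centroid, n=2)])
```

With these made-up weights, the first cluster would be labelled by
"battery" and "charge" and the second by "price" and "expensive".
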
As for ratings, I'm not sure how you would want to involve them.

On Sun, Apr 8, 2012 at 11:44 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> I am new to Mahout and just going through some tutorials. One of the
> requirements I am working on involves extracting customer reviews from
> Amazon for a given item and then clustering those into similar topics to
> see what in general users have been talking about. So, e.g., a rating
> above 3 could say user experience is good, quality; a rating of 3 or
> below could say price, buggy, etc.
>
> Could anyone suggest what would be the best way to approach this?
