+user list.

Yes, clustering with a good distance measure such as cosine similarity
is a good way to start. You might also need some form of document indexing
to retrieve the top documents based on the top words in each cluster. Posting
to the user list; others may have more ideas on this.
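
To make the idea concrete, here is a minimal sketch in plain Python of
grouping questions by cosine similarity and reading off the top words of each
cluster. It is not Mahout's API or one of its algorithms: the stopword list,
the 0.5 threshold, the single-pass "leader" grouping, and all function names
are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "for", "i",
             "can", "do", "how", "where", "some", "to"}

def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster(questions, threshold=0.5):
    """Single-pass grouping: each question joins the most similar existing
    cluster if the similarity reaches the threshold, else starts a new one.
    The centroid is kept as the summed term counts of the members."""
    clusters = []
    for q in questions:
        vec = Counter(tokenize(q))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [q]})
        else:
            best["centroid"].update(vec)
            best["members"].append(q)
    return clusters

def top_terms(c, n=4):
    """Most frequent terms in a cluster, usable as a label or index key."""
    return [term for term, _ in c["centroid"].most_common(n)]
```

The top terms of each cluster could then serve as keys in a search index, so
that a new query is matched to a cluster first and the cluster's member
questions are returned. At the scale of a million questions you would want
TF-IDF weighting and a distributed implementation rather than this loop.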

Robin

On Fri, Jul 8, 2011 at 1:56 PM, Pankaj <hi.am...@gmail.com> wrote:

> Hi Robin,
>
> Pankaj here; I hope you are doing well. I need some help
> deciding whether *Apache Mahout* is a good option in the following case.
> While exploring Apache Mahout, I came to know that you have
> written a book on it. "Awesome stuff" dude, big cheers for that.
>
> So we have roughly 1 million questions asked by users in the education
> domain in India. For example, questions like:
> 1). "where i can do part time mba in delhi".
> 2). "How good is the part time mba course of some xyz college in delhi".
> 3). "good college for part time mba in delhi" etc,
>
> So there are a lot of questions which are similar in nature. If we
> categorize all three questions, they fall into the "*part time mba
> delhi*" cluster. I want to categorize questions into different clusters, so
> that when a user searches for any question, I'll be able to show them all the
> questions that fall into the same cluster.
>
> I am exploring different options to solve this problem. I have also gone
> through the *Apache Mahout* documentation, and I think its *Clustering*
> feature is very useful in this case.
>
> I need your input on this. Is it good or bad, or are there other options
> which I should explore to solve this problem?
>
> Thanks and Regards
> - Pankaj
> --
> My Wings Lies In My Hands
>
>
