Search engines do cool things.
On Fri, Oct 11, 2013 at 7:42 AM, Jens Bonerz <jbon...@googlemail.com> wrote:

> what a nice idea :-) really like that approach
>
>
> 2013/10/11 Ted Dunning <ted.dunn...@gmail.com>
>
> > You don't need Mahout for this.
> >
> > A very easy way to do this is to gather all the words for each category
> > into a document. Thus:
> >
> > CatA:selling buying sales payment
> > CatB:gathering collecting
> > CatC:information data info
> >
> > Then put these into a text retrieval engine so that you have one document
> > per category.
> >
> > When you get a new document to categorize, just use the document as a
> > query and you will get a list of possible categories back. Make sure you
> > set the default query mode to OR for this.
> >
> > See http://wiki.apache.org/solr/SolrQuerySyntax for more on the syntax.
> >
> >
> >
> > On Fri, Oct 11, 2013 at 5:04 AM, Kasi Subrahmanyam
> > <kasisubbu...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I have a problem that I would like to implement in Mahout clustering.
> > >
> > > I have input text documents with data like below.
> > >
> > > Document1: This is the first document of selling information.
> > > Document2: This is the second document of gathering information.
> > >
> > > I also have another lookup file with data like below
> > > selling:CatA
> > > gathering:CatB.
> > > information:CatC
> > >
> > > Now I would like to cluster the documents with the output being generated as
> > > Document1:CatA,CatC
> > > Document2:CatB,CatC
> > >
> > > Please let me know how to achieve this.
> > >
> > > Thanks,
> > > Subbu
> > >
> >
>
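For anyone who wants to try the idea before setting up Solr, here is a rough sketch in plain Python. It is only a stand-in for a real text retrieval engine: the names (CATEGORY_DOCS, tokenize, build_index, categorize) are made up for illustration, and the score is just the number of distinct matching words, not the relevance ranking Solr would compute. It does keep the two points from the thread: one document per category, and OR semantics, so a category is returned if any of its words appears in the query document.

```python
import re
from collections import Counter

# One "document" per category, using the example words from the thread.
CATEGORY_DOCS = {
    "CatA": "selling buying sales payment",
    "CatB": "gathering collecting",
    "CatC": "information data info",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(category_docs):
    """Build an inverted index mapping each word to the categories containing it."""
    index = {}
    for category, text in category_docs.items():
        for term in set(tokenize(text)):
            index.setdefault(term, set()).add(category)
    return index


def categorize(document, index):
    """Use the whole document as an OR query against the category index.

    A category matches if it shares at least one word with the document.
    Results are ordered by number of distinct matching words (an assumed,
    simplified score), with ties broken by category name.
    """
    scores = Counter()
    for term in set(tokenize(document)):
        for category in index.get(term, ()):
            scores[category] += 1
    return sorted(scores, key=lambda c: (-scores[c], c))


INDEX = build_index(CATEGORY_DOCS)

for name, text in [
    ("Document1", "This is the first document of selling information."),
    ("Document2", "This is the second document of gathering information."),
]:
    print(name + ":" + ",".join(categorize(text, INDEX)))
```

With the two sample documents from the original question this prints Document1:CatA,CatC and Document2:CatB,CatC. A real engine would also handle stemming, stop words and term weighting, which this sketch leaves out.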