If you have supervised training data (and it sounds that way), then classification is likely to be more effective.
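
Whichever learner you end up with, the column a value appears in has to be part of the features, otherwise "High" in RiskLevel1 and "High" in RiskLevel3 look the same. Below is a minimal sketch in plain Python, not Mahout code. The names `LEVELS`, `encode` and `group_by_risk` are made up for illustration, the Low/Medium/High value set is assumed from the examples quoted below, and the percentage columns are assumed to be on a 0-100 scale.

```python
from collections import defaultdict

# Assumption: every risk column takes one of exactly these three values.
LEVELS = ("Low", "Medium", "High")


def encode(risk_levels, percentages=()):
    """Build a position-aware feature vector.

    Each risk column gets its own one-hot block of len(LEVELS) slots,
    so the same word in different columns sets different positions.
    Percentages (assumed 0-100) are appended, scaled to [0, 1].
    """
    vector = []
    for level in risk_levels:
        vector.extend(1.0 if level == candidate else 0.0 for candidate in LEVELS)
    vector.extend(p / 100.0 for p in percentages)
    return vector


def group_by_risk(records):
    """Bucket titles by their exact (RiskLevel1, RiskLevel2, RiskLevel3) tuple."""
    groups = defaultdict(list)
    for title, risk_levels in records:
        groups[tuple(risk_levels)].append(title)
    return dict(groups)


# Rows abc, xyz and omn come from the quoted example; qrs is a hypothetical extra row.
records = [
    ("abc", ("High", "Medium", "Low")),
    ("xyz", ("High", "Medium", "High")),
    ("omn", ("Low", "Medium", "Low")),
    ("qrs", ("High", "Medium", "High")),
]

groups = group_by_risk(records)
```

The `encode` output can be fed to a classifier or a clusterer as numeric features. If all you need is "same three risk levels as the selected record", `group_by_risk` (or an equivalent filter on the three fields) already does that without any clustering.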

On Tue, Jan 24, 2012 at 7:44 PM, Vikas Pandya <vika...@yahoo.com> wrote:

> Thanks. creating vectors for these three columns and clustering them
> doesn't bring desired results. here is the usecase again.
>
> 1)User searches for some free text. This search goes against Solr and
> brings back results
> 2)When User selects one item from the search result, subsequent query to
> Solr is made with passing in clusterId of the record which was selected
> from the search result. FYI: I created clusters and then indexed the
> clusterId for each record in the Solr Index so everything is at one place.
>
> With this is selected record's risk levels were "High", "Medium", "High" ,
> all items in the cluster for that should have same RiskLevel values
> (desired). I understand that vector/cluster will just go by word similarity
> hence it doesn't care if "High" appears for RiskLevel1 or RiskLevel3. hence
> clustering of these three columns aren't bringing back desired results.
>
> Today I have got more requirement to cluster by 8 other columns (on top of
> 3 Risk columns). Those 8 new columns are percentage values.
>
> Does this demand classification rather than clustering? I have just
> started reading classification section from Mahout In Action. Opinions
> please?
>
>
> Thanks,
>
>
> ________________________________
> From: Frank Scholten <fr...@frankscholten.nl>
> To: user@mahout.apache.org
> Sent: Friday, January 20, 2012 12:48 PM
> Subject: Re: How to present mahout cluster in combination with Solr results
>
> On Fri, Jan 20, 2012 at 4:01 PM, Vikas Pandya <vika...@yahoo.com> wrote:
> > From the example below, solr search results should be clustered in some
> > following way
> > list all the items which have matching RiskLevels e.g.
> >
> >
> > Cluster 1:
> > Title   RiskLevel1   RiskLevel2   RiskLevel3
> > abc     High         Medium       Low
> > xyz     High         Medium       High
> > def     Low          Medium       High
> >
> > Cluster 2:
> > Title   RiskLevel1   RiskLevel2   RiskLevel3
> > omn     Low          Medium       Low
> > yui     Low          Medium       High
> > bnm     Medium       Medium       High
> >
> > Though I have a feeling I don't need to use Mahout clustering for this, I am
> > still trying to hook in mahout for this since we have more clustering
> > requirements in the pipeline to cluster based on other features (attributes
> > of objects).
> >
>
> You only have 27 unique risklevel combinations. You could just sort by
> or more risklevels to get a sense of the data.
>
> If you have more attributes then you could indeed look into clustering,
>
> Cheers,
>
> Frank
>
> > Any thoughts?
> >
> > ________________________________
> > From: Vikas Pandya <vika...@yahoo.com>
> > To: Frank Scholten <fr...@frankscholten.nl>; "user@mahout.apache.org"
> > <user@mahout.apache.org>
> > Sent: Thursday, January 19, 2012 11:05 AM
> >
> > Subject: Re: How to present mahout cluster in combination with Solr results
> >
> > Hi Frank,
> >
> > Thanks for the link. That was useful. It's still bit unclear on how he built
> > his index. are we saying, we index clusterId,clusterSize and clusterLable
> > in the same index (where other data is indexed)? So one index will have two
> > sets of Solr documents in it? one containing cluster info?
> >
> > My requirement again; I have bunch of db columns which are being indexed.
> > e.g.
> > Title, RiskLevel1, RiskLevel2,RiskLevel3 etc
> > Title1 High Medium Low
> >
> > Current requirement is to cluster documents based on their riskLevels and
> > NOT the title.
> >
> > Thanks,
> >
> >
> > ________________________________
> > From: Frank Scholten <fr...@frankscholten.nl>
> > To: user@mahout.apache.org; Vikas Pandya <vika...@yahoo.com>
> > Sent: Thursday, January 19, 2012 4:24 AM
> > Subject: Re: How to present mahout cluster in combination with Solr results
> >
> > Hi Vikas,
> >
> > I suggest indexing the cluster label, cluster size and
> > cluster-document mappings so you can use that information to build a
> > tag cloud of your data. Checkout this presentation
> > http://java.dzone.com/videos/configuring-mahout-clustering
> >
> > Cheers,
> >
> > Frank
> >
> > On Thu, Jan 19, 2012 at 4:18 AM, Vikas Pandya <vika...@yahoo.com> wrote:
> >> Hello,
> >>
> >> I have successfully created vectors from reading my existing Solr Index.
> >> Then created sequenceFile and mahout clusters from it. As I understand that
> >> currently solr and mahout clustering aren't integrated, what's the best way
> >> to represent mahout clusters to the user? Mine is a search application which
> >> renders results by querying solr index. Now I need to incorporate Mahout
> >> created clusters in the result. While Solr-Mahout integration isn't there
> >> yet, what's the best alternative way to represent this info?
> >>
> >> Thanks,
> >