Re: Clustering or classification?

Ted Dunning Tue, 24 Jan 2012 20:00:15 -0800

If you have supervised training data (and it sounds that way), then
classification is likely to be more effective.
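
The quoted message below notes that word-based vectors cannot tell "High" in RiskLevel1 from "High" in RiskLevel3. As a rough sketch (not Mahout's own vectorizer), one way to keep column position is to one-hot encode each risk column separately and append the percentage columns scaled to [0, 1]. The `encode` helper and the `Pct1` column name are hypothetical, introduced only for illustration; the resulting vectors could be fed to either a classifier or a clusterer.

```python
# Sketch only: column-aware encoding of the three risk columns.
# `encode` and the "Pct1" column are illustrative assumptions, not a Mahout API.
LEVELS = ("Low", "Medium", "High")
RISK_COLUMNS = ("RiskLevel1", "RiskLevel2", "RiskLevel3")


def encode(record, percent_columns=()):
    """One-hot encode each risk column, then append scaled percentages."""
    vector = []
    for column in RISK_COLUMNS:
        # One slot per (column, level) pair, so the column position is kept.
        vector.extend(1.0 if record[column] == level else 0.0 for level in LEVELS)
    for column in percent_columns:
        # Percentages are assumed to be given on a 0-100 scale.
        vector.append(record[column] / 100.0)
    return vector


a = encode({"RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low"})
b = encode({"RiskLevel1": "Low", "RiskLevel2": "Medium", "RiskLevel3": "High"})
c = encode(
    {"RiskLevel1": "High", "RiskLevel2": "Medium", "RiskLevel3": "Low", "Pct1": 25},
    percent_columns=("Pct1",),
)
print(a)
print(b)
print(c)
```

Records `a` and `b` contain the same words ("High", "Medium", "Low") but produce different vectors, which is the distinction a bag-of-words vector loses.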


On Tue, Jan 24, 2012 at 7:44 PM, Vikas Pandya <vika...@yahoo.com> wrote:

> Thanks. Creating vectors for these three columns and clustering them
> doesn't bring the desired results. Here is the use case again.
>
> 1) The user searches for some free text. This search goes against Solr and
> brings back results.
> 2) When the user selects one item from the search results, a subsequent query
> to Solr is made, passing in the clusterId of the selected record. FYI: I
> created clusters and then indexed the clusterId for each record in the Solr
> index, so everything is in one place.
>
> With this, if the selected record's risk levels were "High", "Medium", "High",
> all items in its cluster should have the same RiskLevel values
> (desired). I understand that the vectors/clusters just go by word similarity,
> so it doesn't matter whether "High" appears for RiskLevel1 or RiskLevel3. Hence
> clustering on these three columns isn't bringing back the desired results.
>
> Today I got a further requirement to cluster by 8 other columns (on top of
> the 3 Risk columns). Those 8 new columns are percentage values.
>
> Does this demand classification rather than clustering? I have just
> started reading the classification section of Mahout in Action. Opinions,
> please?
>
>
> Thanks,
>
>
> ________________________________
>  From: Frank Scholten <fr...@frankscholten.nl>
> To: user@mahout.apache.org
> Sent: Friday, January 20, 2012 12:48 PM
> Subject: Re: How to present mahout cluster in combination with Solr results
>
> On Fri, Jan 20, 2012 at 4:01 PM, Vikas Pandya <vika...@yahoo.com> wrote:
> > From the example below, Solr search results should be clustered in the
> > following way: list all the items which have matching RiskLevels, e.g.
> >
> >
> > Cluster 1:
> > Title          RiskLevel1          RiskLevel2          RiskLevel3
> > abc            High                Medium              Low
> > xyz            High                Medium              High
> > def            Low                 Medium              High
> >
> > Cluster 2:
> > Title          RiskLevel1          RiskLevel2          RiskLevel3
> > omn            Low                 Medium              Low
> > yui            Low                 Medium              High
> > bnm            Medium              Medium              High
> >
> > Though I have a feeling I don't need to use Mahout clustering for this, I am
> > still trying to hook in Mahout since we have more clustering
> > requirements in the pipeline to cluster based on other features (attributes
> > of objects).
> >
>
> You only have 27 unique risklevel combinations. You could just sort by
> one or more risklevels to get a sense of the data.
>
> If you have more attributes then you could indeed look into clustering.
>
> Cheers,
>
> Frank
>
> > Any thoughts?
> >
> > ________________________________
> > From: Vikas Pandya <vika...@yahoo.com>
> > To: Frank Scholten <fr...@frankscholten.nl>; "user@mahout.apache.org"
> > <user@mahout.apache.org>
> > Sent: Thursday, January 19, 2012 11:05 AM
> >
> > Subject: Re: How to present mahout cluster in combination with Solr results
> >
> > Hi Frank,
> >
> > Thanks for the link. That was useful. It's still a bit unclear how he built
> > his index. Are we saying we index clusterId, clusterSize and clusterLabel
> > in the same index (where the other data is indexed)? So one index will have
> > two sets of Solr documents in it, one of them containing cluster info?
> >
> > My requirement again: I have a bunch of DB columns which are being indexed,
> > e.g.
> > Title          RiskLevel1          RiskLevel2          RiskLevel3          etc.
> > Title1         High                Medium              Low
> >
> > Current requirement is to cluster documents based on their riskLevels and
> > NOT the title.
> >
> > Thanks,
> >
> >
> > ________________________________
> > From: Frank Scholten <fr...@frankscholten.nl>
> > To: user@mahout.apache.org; Vikas Pandya <vika...@yahoo.com>
> > Sent: Thursday, January 19, 2012 4:24 AM
> > Subject: Re: How to present mahout cluster in combination with Solr results
> >
> > Hi Vikas,
> >
> > I suggest indexing the cluster label, cluster size and
> > cluster-document mappings so you can use that information to build a
> > tag cloud of your data. Check out this presentation:
> > http://java.dzone.com/videos/configuring-mahout-clustering
> >
> > Cheers,
> >
> > Frank
> >
> > On Thu, Jan 19, 2012 at 4:18 AM, Vikas Pandya <vika...@yahoo.com> wrote:
> >> Hello,
> >>
> >> I have successfully created vectors by reading my existing Solr index,
> >> then created a sequenceFile and Mahout clusters from it. As I understand it,
> >> Solr and Mahout clustering aren't currently integrated, so what's the best
> >> way to present Mahout clusters to the user? Mine is a search application
> >> which renders results by querying the Solr index. Now I need to incorporate
> >> the Mahout-created clusters in the results. While Solr-Mahout integration
> >> isn't there yet, what's the best alternative way to present this info?
> >>
> >> Thanks,
> >
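
Frank's point above, that three risk columns with three levels each give only 27 combinations, suggests that records can be grouped by their exact (RiskLevel1, RiskLevel2, RiskLevel3) tuple without any clustering. The following is a minimal sketch under that assumption, using the titles from the sample tables in the thread; `group_by_risk` is a hypothetical helper, not part of Mahout or Solr.

```python
# Sketch only: exact-match grouping on the three risk columns.
from collections import defaultdict
from itertools import product

LEVELS = ("Low", "Medium", "High")


def group_by_risk(records):
    """Map each (RiskLevel1, RiskLevel2, RiskLevel3) tuple to its titles."""
    groups = defaultdict(list)
    for record in records:
        key = (record["RiskLevel1"], record["RiskLevel2"], record["RiskLevel3"])
        groups[key].append(record["Title"])
    return dict(groups)


rows = [
    ("abc", "High", "Medium", "Low"),
    ("xyz", "High", "Medium", "High"),
    ("def", "Low", "Medium", "High"),
    ("omn", "Low", "Medium", "Low"),
    ("yui", "Low", "Medium", "High"),
    ("bnm", "Medium", "Medium", "High"),
]
records = [
    {"Title": t, "RiskLevel1": r1, "RiskLevel2": r2, "RiskLevel3": r3}
    for t, r1, r2, r3 in rows
]

groups = group_by_risk(records)
possible_keys = list(product(LEVELS, repeat=3))  # every possible combination

for key, titles in sorted(groups.items()):
    print(key, titles)
```

A key like this could presumably be indexed alongside each record in the same way as the clusterId described earlier in the thread, so that a follow-up query returns only records with identical risk levels. That is an assumption about the indexing setup, not something the thread confirms.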
