Clustering accuracy

2013-02-05 Thread Aysu Ezen
Hello, To my understanding from the book, the ClusterDumper tool can be used to get the top features of each cluster and the centroid vector. However, I have a dataset with manual labels on it. I would like to evaluate the clusters based on the manual labels to calculate the accuracy of the clustering (set th
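One common way to score a clustering against manual labels is purity: assign each cluster its majority label and count the fraction of points that match. The sketch below is only an illustration of that idea, not something taken from the thread or from Mahout. The function name `cluster_purity` and the two parallel input lists are assumptions; in practice the cluster assignments would have to be exported from the clustering output first.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Return the purity of a clustering given manual labels.

    cluster_ids and labels are parallel sequences (hypothetical input
    format): cluster_ids[i] is the cluster assigned to point i and
    labels[i] is its manual label.
    """
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one point is required")

    # Count how often each manual label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(cluster_ids, labels):
        label_counts[cluster_id][label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(labels)


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 1]
    manual = ["sports", "sports", "politics", "politics", "politics", "sports"]
    print(cluster_purity(clusters, manual))
```

Purity is easy to read but rewards many small clusters (one point per cluster scores 1.0), so it is usually reported together with the number of clusters or alongside another measure.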

Re: Does something like an "explain" feature exist in Mahout for clustering.

2013-02-05 Thread Chris Harrington
I'm currently using KMeans with canopy and Cosine as the measure. The data I'm using has been somewhat curated into categories so I expected them to cluster alongside the other documents in their respective categories. Some of them fall nicely into clusters I'd expect but others are like the exa

Using IDF in CF recommender

2013-02-05 Thread Pat Ferrel
I think you meant: "Human relatedness decays much slower than item popularity." So to make sure I understand the implications of using IDF… For boolean/implicit preferences the sum of all prefs (after weighting) for a single item over all users will always be 1 or 0. This no matter whether the

Re: Using IDF in CF recommender

2013-02-05 Thread Ted Dunning
On Tue, Feb 5, 2013 at 11:29 AM, Pat Ferrel wrote: > I think you meant: "Human relatedness decays much slower than item > popularity." > Yes. Oops. > So to make sure I understand the implications of using IDF… For > boolean/implicit preferences the sum of all prefs (after weighting) for a >

Re: How to classify an individual file after training

2013-02-05 Thread Vinay B,
That's exactly what I was trying to do, by running TestNewsGroups.java, as I explained in my last post. Here's the code again with the stack trace. There's something wrong I'm doing while loading up the model (and I can't load up the Naive Bayes, see code) Thanks https://gist.github.com/anonymous

Re: Clustering using Solr Index vs Lucene Index : Different Results

2013-02-05 Thread Vinay B,
Anyone? Thanks On Wed, Jan 30, 2013 at 10:42 AM, Vinay B, wrote: > > Just a set of mahout commands. Here they are. > > https://gist.github.com/4674331 > > For what it's worth, the relevant solr config from the schema was > > multiValued="true" > termVectors="true"/> > > Thank You > >