Re: fkmeans or Cluster Dumper not working?

2011-07-26 Thread Jeffrey
Hi, Erm, I finally got to use a proper machine for testing (phew~), and now fkmeans with k=50 works fine (I will try a larger k value later). However, as you mentioned, clusterdumper is still failing with OME; my HADOOP_HEAPSIZE is 2000 (apparently the maximum I can assign, running a machine with

RE: meanshift reduce task problem

2011-07-26 Thread Sengupta, Sohini IN BLR SISL
Hi Jeff, I tried running this on the synthetic_control dataset. I now see the load being balanced across reducers, but the job stops after multiple failures with the following message: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115) at

Re: HBase Mahout - Using HBase as a Datastore/source for Mahout - Classification

2011-07-26 Thread Stanley Xu
I think HBase might be a little slow for large data queries. It normally takes 10-30 ms to serve a random read request. And even in a parallel/map-reduce setting, it will still take some time for a query to go from the region server to the data node. I really suspect HBase would become an I/O bottleneck for

Cluster-center and cluster-radius

2011-07-26 Thread Immo Micus
Hello, this is my first email to the mahout-user list. I am trying to do some clustering with Mahout and I have a question concerning the cluster-center and cluster-radius. For testing, I clustered 10 points using the KMeansClusterer: points: [13.000, 4455.000] [13.000, 5101.000] [13.000,

Item based recommendations

2011-07-26 Thread Antony Corfield [awc]
I've been testing the Mahout Recommender software using a dataModel derived from activity data generated by page views and downloads of items in an open-access repository. The taste data has preferences based on the number of times a user has viewed an item and I've also tested with boolean

Re: Item based recommendations

2011-07-26 Thread Sean Owen
The problem you've described is actually simpler than the 'classic' recommendation problem, which is personalized per user. All you want is a list of most-similar items. That's a lot easier. You could easily roll your own by using an ItemSimilarity implementation and iterating over all items. No
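A minimal sketch of the "roll your own" idea described above: score every other item against a target item and keep the top few. This is plain Python, not Mahout's ItemSimilarity API; the cosine measure, the function names, and the toy view counts are all assumptions made for illustration.

```python
import math

def cosine_similarity(prefs_a, prefs_b):
    """Cosine similarity between two items, each a {user: preference} dict."""
    shared_users = set(prefs_a) & set(prefs_b)
    if not shared_users:
        return 0.0
    dot = sum(prefs_a[u] * prefs_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(v * v for v in prefs_a.values()))
    norm_b = math.sqrt(sum(v * v for v in prefs_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar_items(target, item_prefs, how_many=3):
    """Iterate over all other items and return the best-scoring ones."""
    scores = [
        (other, cosine_similarity(item_prefs[target], prefs))
        for other, prefs in item_prefs.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:how_many]

# Hypothetical view counts per item, keyed by user id.
item_prefs = {
    "A": {"u1": 3, "u2": 1},
    "B": {"u1": 3, "u2": 1},
    "C": {"u3": 2},
}
result = most_similar_items("A", item_prefs)
```

Here item B, which is viewed by the same users in the same proportions as A, ranks first, and item C, which shares no users with A, scores zero. The loop is quadratic if run for every item, which may or may not be acceptable depending on catalogue size.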

Re: Cluster-center and cluster-radius

2011-07-26 Thread Christoph Brücke
Hi Immo, did you run an extra cluster assignment at the end? KMeans uses two phases: in the first, all points are assigned to a cluster; in the second, the cluster centroids are recalculated based on that assignment. So my idea is that you could use the clustering flag
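The two phases mentioned above, plus a final assignment pass, can be sketched as follows. This is a generic k-means illustration in plain Python, not Mahout's KMeansClusterer; the function names, the squared Euclidean distance, the fixed iteration count, and the sample points are assumptions.

```python
def assign(points, centroids):
    """Phase 1: for each point, return the index of its nearest centroid."""
    def squared_distance(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def recompute(points, assignment, centroids):
    """Phase 2: move each centroid to the mean of the points assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignment) if a == k]
        if not members:
            # Keep an empty cluster's centroid where it was.
            updated.append(old)
        else:
            updated.append([sum(col) / len(members) for col in zip(*members)])
    return updated

# Illustrative points and starting centroids (hypothetical values).
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 0.0], [10.0, 0.0]]

for _ in range(5):
    assignment = assign(points, centroids)
    centroids = recompute(points, assignment, centroids)

# Extra assignment at the end, so the point-to-cluster map matches
# the final centroids rather than the previous iteration's.
final_assignment = assign(points, centroids)
```

Without the last call, the assignment in hand was computed against the centroids from before the final update, so a reported center may not be the mean of the points listed under it.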

Re: Cluster-center and cluster-radius

2011-07-26 Thread Immo Micus
Hi Christoph, thanks for your reply! The cluster assignment is pretty much what I want to do: I have some points that I want to be clustered. That's what I use KMeansClusterer.clusterpoints(...) for. Unfortunately this method does not provide me with an item-cluster map. The only thing I get

Re: Cluster-center and cluster-radius

2011-07-26 Thread Ted Dunning
The first problem is that the input doesn't have comparable variability. This means that distance is going to be pretty much just y-distance. One way to improve this is to reduce each coordinate by dividing by the standard deviation of that coordinate. Depending on what your y coordinate is
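A minimal sketch of the scaling suggested above: divide each coordinate by the standard deviation of that coordinate before clustering. It uses plain Python rather than any Mahout class; the function name, the choice of population standard deviation, and the sample points (only the first two of which appear in the thread) are assumptions.

```python
import statistics

def standardize(points):
    """Divide each coordinate by the standard deviation of that coordinate."""
    columns = list(zip(*points))
    # Population standard deviation per coordinate; a constant column
    # (deviation 0) is left unscaled to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[x / s for x, s in zip(p, stdevs)] for p in points]

# The third point is made up for illustration.
points = [[13.0, 4455.0], [13.0, 5101.0], [20.0, 4800.0]]
scaled = standardize(points)
```

After scaling, both coordinates have unit standard deviation, so a Euclidean distance is no longer dominated by the coordinate with the larger raw range.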

Re: Mahout LDA

2011-07-26 Thread Jake Mannix
On Tue, Jul 26, 2011 at 4:27 AM, Benjamin Heilbrunn ben...@gmail.com wrote: 1) How can I display the topic distribution for an (existing) document from the reuters corpus? There is a sequence file called docTopics in the output directory. Keys are docIds, values are VectorWritable. Use

using Integer array with NamedVector or RandomAccessSparseVector

2011-07-26 Thread Abhik Banerjee
I am new and have doubts

Article on Mahout recommenders and Cassandra

2011-07-26 Thread Sean Owen
http://www.acunu.com/blogs/sean-owen/recommending-cassandra/ I put together this quick-and-dirty writeup on using Cassandra as a backend for recommenders. May be of interest to anyone using Cassandra and/or the non-distributed recommenders. Sean

Re: using Integer array with NamedVector or RandomAccessSparseVector

2011-07-26 Thread Sean Owen
(Abhik, this has nothing to do with Mahout, but with the Manning forum system. I will reply privately as this is not the place.) On Tue, Jul 26, 2011 at 6:41 PM, Abhik Banerjee banerjee.abhik@gmail.com wrote: I get a message saying your post is more than 80 characters, fix that

Re: Article on Mahout recommenders and Cassandra

2011-07-26 Thread Chris Burroughs
On 07/26/2011 01:22 PM, Sean Owen wrote: http://www.acunu.com/blogs/sean-owen/recommending-cassandra/ I put together this quick-and-dirty writeup on using Cassandra as a backend for recommenders. May be of interest to anyone using Cassandra and/or the non-distributed recommenders. Sean

Re: Classification on Techcrunch

2011-07-26 Thread Ted Dunning
Yep. That sounds like a fine approach. You should try several algorithms, but the basic text classification approach should work reasonably well, especially if you include phrases and are aggressive about getting rid of garbage text. On Tue, Jul 26, 2011 at 2:17 PM, Shrikar archak

Parallel FPGrowth driver - doc problem?

2011-07-26 Thread Lance Norskog
The FPGrowth driver page: https://cwiki.apache.org/confluence/display/MAHOUT/Parallel+Frequent+Pattern+Mining gives a command line that only works in mahout/core, rather than mahout/. Is this drift, or a document bug? -- Lance Norskog goks...@gmail.com