I was following the book examples, and k-means, Dirichlet, and LDA all have this casting problem. It may be a Mac issue; I'm not sure. I suspect seq2sparse may be messing up the inputs, maybe because of a wrong version. It outputs the regular part-r-* files, but the LDA driver expects a file called data.
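One way to check what seq2sparse actually wrote might be to read the key and value class names straight out of the header of a part-r-* file. The sketch below is only that, a sketch: it assumes the SequenceFile header layout as I remember it (the bytes "SEQ", one version byte, then the key and value class names as length-prefixed UTF-8 strings, for version 4 and later), and the function names and the file name in the usage comment are made up for illustration.

```python
import io


def _read_vint(stream):
    """Decode a Hadoop-style variable-length integer (non-negative values only)."""
    first = stream.read(1)
    if len(first) != 1:
        raise ValueError("unexpected end of header")
    b = first[0] - 256 if first[0] > 127 else first[0]  # reinterpret as a signed byte
    if b >= -112:
        return b  # small values are stored in the first byte itself
    if b < -120:
        raise ValueError("negative length in header")
    remaining = -112 - b  # number of big-endian bytes that follow
    data = stream.read(remaining)
    if len(data) != remaining:
        raise ValueError("unexpected end of header")
    return int.from_bytes(data, "big")


def _read_string(stream):
    """Read one length-prefixed UTF-8 string."""
    length = _read_vint(stream)
    if length < 0:
        raise ValueError("negative string length in header")
    raw = stream.read(length)
    if len(raw) != length:
        raise ValueError("unexpected end of header")
    return raw.decode("utf-8")


def sequence_file_classes(stream):
    """Return (key_class_name, value_class_name) from a SequenceFile header."""
    magic = stream.read(4)
    if len(magic) != 4 or magic[:3] != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    if magic[3] < 4:
        # Assumption: older versions use a different string encoding, not handled here.
        raise ValueError("unsupported SequenceFile version %d" % magic[3])
    key_class = _read_string(stream)
    value_class = _read_string(stream)
    return key_class, value_class


# Usage (hypothetical local file name):
# with open("part-r-00000", "rb") as f:
#     print(sequence_file_classes(f))
```

If the value class comes back as org.apache.hadoop.io.IntWritable instead of org.apache.mahout.math.VectorWritable, that would at least match the exception quoted below and suggest the driver is reading a file that does not hold vectors.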
Sent from my iPad

On Jun 9, 2011, at 7:40 AM, Mark <[email protected]> wrote:

> Forgot to mention... great book :)
>
> On 6/9/11 7:30 AM, Mark wrote:
>> KMeans is busted? What do you mean by this? The algorithm simply won't work
>> or just the reuters example?
>>
>> Thanks
>>
>> On 6/9/11 12:28 AM, Sean Owen wrote:
>>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>>> before. I don't know if it is being maintained. Anyone who's willing to
>>> step up and fix it is also welcome to overhaul it IMHO.
>>>
>>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee<[email protected]> wrote:
>>>
>>>> I got a slightly different error on the next line of KMeansDriver.java
>>>> (running on OS X Snow Leopard)
>>>>
>>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>>> Exception in thread "main" java.lang.ClassCastException:
>>>> org.apache.hadoop.io.IntWritable cannot be cast to
>>>> org.apache.mahout.math.VectorWritable
>>>> at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>> at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>>
>>>>
>>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman<[email protected]> wrote:
>>>>
>>>>> IIRC, Reuters used to run on a cluster but no longer does due to some
>>>>> obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>>>> this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>>>> clustering for many users.
>>>>>
