I'm still trying to figure out why reuters-0.5 does not work on either of my 
clusters. The scripts themselves show no diff against trunk, and the environment 
variables are set as in trunk except for MAHOUT_HOME. The synthetic control and 20 newsgroups 
examples run on both clusters without problems (well, 20 newsgroups has a 
Version Mismatch error on CDH3, but that is another story). But when I run 
reuters on 0.5 I see "MAHOUT_LOCAL is set, running locally" followed by file IO 
exceptions in MahoutDriver that differ from cluster to cluster. When I run it on trunk, 
I don't see this and it works just fine.

-----Original Message-----
From: Drew Farris [mailto:[email protected]] 
Sent: Thursday, June 09, 2011 5:36 PM
To: [email protected]
Subject: Re: Problems running examples

Jeff, No impugning perceived, and thanks for running the variety of
tests. So it appears that trunk is fine and 0.5 isn't. I'll try to
determine what did (or didn't) make it into 0.5 that causes its
brokenness.

Mark, in the meantime, no need to run all of the tests I've asked
about previously. Just give trunk a try and see if that resolves your
problem.

On Thu, Jun 9, 2011 at 7:21 PM, Jeff Eastman <[email protected]> wrote:
> Hi Drew,
>
> Running trunk locally, latest update, just now, build-reuters.sh works 
> (kmeans and lda).
>
> Running trunk on my CDH3 cluster, just now:
> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
> - build-reuters.sh works (with kmeans and lda)
>
> Running trunk on my MapR cluster, just now:
> - build-cluster-syntheticcontrol.sh works (with kmeans and others)
> - build-reuters.sh works (with kmeans and lda)
>
>
> Running the 5/31 mahout-distribution-0.5, just now:
> - build-cluster-syntheticcontrol.sh works (CDH3 & MapR with kmeans and others)
> - build-reuters.sh runs in local mode only (CDH3 & MapR runs give different 
> errors)
>
> I was primarily defending kmeans. It is possible my 5/31 0.5 distribution is 
> not the final one, since everything seems kosher in trunk now. My apologies if 
> I've impugned your patch.
>
> Jeff
>
>
> -----Original Message-----
> From: Drew Farris [mailto:[email protected]]
> Sent: Thursday, June 09, 2011 11:36 AM
> To: [email protected]
> Subject: Re: Problems running examples
>
> Jeff,
>
> Could you tell me about what's failing in KMeans and LDA when running
> on a cluster? I had this working just prior to 0.5 in
> https://issues.apache.org/jira/browse/MAHOUT-694
>
> Thanks,
>
> Drew
>
> On Thu, Jun 9, 2011 at 2:01 PM, Jeff Eastman <[email protected]> wrote:
>> Ahem, KMeans is not busted. It is being maintained by me, at least. The 
>> build-reuters.sh script runs only in local mode on 0.5 and fails in both 
>> KMeans and LDA when run on a cluster. The MIA examples are not always 
>> correct. Most of this has been reported before.
>>
>> -----Original Message-----
>> From: Sean Owen [mailto:[email protected]]
>> Sent: Thursday, June 09, 2011 12:29 AM
>> To: [email protected]
>> Subject: Re: Problems running examples
>>
>> (Assuming you are on HEAD,) I think KMeans is busted -- this has come up
>> before. I don't know if it is being maintained.  Anyone who's willing to
>> step up and fix it is also welcome to overhaul it IMHO.
>>
>> On Thu, Jun 9, 2011 at 12:03 AM, Hector Yee <[email protected]> wrote:
>>
>>> I got a slightly different error on the next line of KMeansDriver.java
>>> (running on OS X Snow Leopard)
>>>
>>> 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor
>>> Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.math.VectorWritable
>>>     at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:90)
>>>     at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:102)
>>>
>>>
>>> On Sun, Jun 5, 2011 at 9:31 PM, Jeff Eastman <[email protected]> wrote:
>>>
>>> > IIRC, Reuters used to run on a cluster but no longer does due to some
>>> > obscure Lucene changes. In 0.5 it only works in local mode. I really hope
>>> > this can be repaired by 0.6 as Reuters is a key entry point into Mahout
>>> > clustering for many users.
>>> >
>>>
>>
>
