wah.  Even trying to do seq2sparse doesn't work for me:

[jake@smf1-ady-15-sr1 mahout-distribution-0.5-SNAPSHOT]$ ./bin/mahout seq2sparse -i hdfs://<namenode>/user/jake/text_temp -o hdfs://<namenode>/user/jake/text_vectors_temp
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop-0.20
No HADOOP_CONF_DIR set, using /usr/lib/hadoop-0.20/src/conf
11/05/09 23:36:01 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
11/05/09 23:36:01 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
11/05/09 23:36:04 INFO input.FileInputFormat: Total input paths to process : 1
11/05/09 23:36:10 INFO mapred.JobClient: Running job: job_201104300433_126621
11/05/09 23:36:12 INFO mapred.JobClient:  map 0% reduce 0%
11/05/09 23:36:47 INFO mapred.JobClient: Task Id : attempt_201104300433_126621_m_000000_0, Status : FAILED
11/05/09 23:37:07 INFO mapred.JobClient: Task Id : attempt_201104300433_126621_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.Analyzer
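
(A minimal sketch of one generic workaround for a task-side ClassNotFoundException like the one above; it is not taken from this thread, and the jar names and paths are assumptions. Hadoop unpacks the submitted job jar on each task node and adds any jars found under its internal lib/ directory to the task classpath, so repacking a lucene-core jar into the examples job jar should make org.apache.lucene.analysis.Analyzer visible to the mappers.)

    cd mahout-distribution-0.5-SNAPSHOT
    mkdir -p lib
    cp /path/to/lucene-core-*.jar lib/                               # assumes a Lucene jar is available locally
    jar uf mahout-examples-0.5-SNAPSHOT-job.jar lib/lucene-core-*.jar  # add it under lib/ inside the job jar

After the update, resubmitting the same seq2sparse command should pick up the repacked job jar, provided bin/mahout ships that jar to the cluster.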
----
Note I'm not specifying any fancy analyzer.  Just trying to run with the defaults.  :\

  -jake

On Mon, May 9, 2011 at 2:21 PM, Jake Mannix <[email protected]> wrote:
>
> On Mon, May 9, 2011 at 2:15 PM, Sean Owen <[email protected]> wrote:
>
>> I think I am still +1 to just creating one re-packaged .jar -- for now
>> at least. It fixes problems for sure.
>> And then I am happy for the cognoscenti to construct a better solution
>> later, and I'd be pleased to help.
>> Though I still don't find this re-packaging a bad thing -- theoretical
>> problems with signing keys and whatnot, yes, but don't exist in
>> practice now.
>>
>> I guess I'm asking whether anyone is for/against committing MAHOUT-691?
>>
>
> I think for our examples job jar, this is a good idea, for now.
>
> I will try out your patch and see how it looks on my production cluster.
>
>   -jake
>
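
(A rough sketch of what the "one re-packaged .jar" idea discussed above generally looks like; this is only an illustration of the technique, not the contents of MAHOUT-691 or Sean's patch, and the jar names and paths are assumptions. The "signing keys" concern presumably refers to the fact that exploding signed dependency jars into one flat jar invalidates their signatures, which is why the per-jar META-INF entries are dropped here.)

    mkdir repacked && cd repacked
    for j in ../lib/*.jar; do jar xf "$j"; done              # explode every dependency jar
    jar xf ../mahout-examples-0.5-SNAPSHOT-job.jar           # explode the Mahout classes on top
    rm -rf META-INF                                          # drop per-dependency manifests and signature files
    cd .. && jar cf mahout-examples-0.5-SNAPSHOT-repacked-job.jar -C repacked .

The result is a single self-contained jar whose classes are all visible to the map and reduce tasks without any nested lib/ jars.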
