Re: MongoDBDataModel in memory ?

2012-03-21 Thread Mridul Kapoor
Correction : On 21 March 2012 15:57, Mridul Kapoor mridulkap...@gmail.com wrote: Hi Thanks a lot Sebastian, Sean and Ted for your continuing help! Following your examples, I tried to create my own implementation. DataModel dataModel = new FileDataModel(new File("preferences.csv"));

Re: MongoDBDataModel in memory ?

2012-03-21 Thread Mridul Kapoor
On 21 March 2012 16:01, Mridul Kapoor mridulkap...@gmail.com wrote: Correction : On 21 March 2012 15:57, Mridul Kapoor mridulkap...@gmail.com wrote: Hi Thanks a lot Sebastian, Sean and Ted for your continuing help! Following your examples, I tried to create my own implementation.

Error Running mahout-core-0.5-job.jar

2012-03-21 Thread jeanbabyxu
I tried to run mahout in Hadoop using the following command, [jxu13@lppma692 hadoop-0.20.2]$ bin/hadoop jar /opt/mapr/mahout/mahout-0.5/core/target/mahout-core-0.5-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt --Dmapred.output.dir=output

Re: Error Running mahout-core-0.5-job.jar

2012-03-21 Thread Sean Owen
It's -Dmapred.output.dir=output not --Dmapred.output.dir=output (one dash), but, that's not even the problem. I don't think you can specify -D options this way, as they are JVM arguments. You need to configure these in Hadoop's config files. This is not specific to Mahout. On Wed, Mar 21, 2012 at

Re: can't get point-id, cluster-id thru -p

2012-03-21 Thread Baoqiang Cao
This is extremely helpful! Thanks a lot. Indeed, after seqdumper I got such: Key: 1774184: Value: wt: 1.0distance: 0.8839410915753125 vec: [123426:1.000] Key: 1705919: Value: wt: 1.0distance: 0.0 vec: [] Key: 1705919: Value: wt: 1.0distance: 0.0 vec: [] Key: 1705919: Value: wt: 1.0distance:
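
For readers who want to post-process a dump like the one quoted above, here is a minimal sketch in Python that splits the text into records. It assumes the exact layout shown in the message (including the missing space between the weight and "distance:") and a vector printed as comma-separated index:value pairs. The function name parse_records and the field names are made up for illustration. The snippet does not say which of the printed ids is the point id and which is the cluster id, so the sketch only extracts the fields and leaves that interpretation open.

```python
import re

# One record as printed in the quoted dump. The layout is taken from the
# message above; other Mahout versions may print vectors differently.
RECORD = re.compile(
    r"Key: (?P<key>\d+): Value: wt: (?P<wt>[0-9.]+)\s*"
    r"distance: (?P<dist>[0-9.eE+-]+)\s+vec: \[(?P<vec>[^\]]*)\]"
)


def parse_records(text):
    """Return one dict per record found in the dumped text."""
    records = []
    for match in RECORD.finditer(text):
        vector = {}
        # An empty vector is printed as "[]" and yields no entries.
        for entry in match.group("vec").split(","):
            entry = entry.strip()
            if not entry:
                continue
            index, value = entry.split(":")
            vector[int(index)] = float(value)
        records.append({
            "key": int(match.group("key")),
            "weight": float(match.group("wt")),
            "distance": float(match.group("dist")),
            "vector": vector,
        })
    return records


SAMPLE = (
    "Key: 1774184: Value: wt: 1.0distance: 0.8839410915753125 "
    "vec: [123426:1.000] "
    "Key: 1705919: Value: wt: 1.0distance: 0.0 vec: []"
)

for record in parse_records(SAMPLE):
    print(record["key"], record["distance"], record["vector"])
```

Records with an empty vector come back with an empty dict, which makes them easy to count or filter out afterwards.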

What to do with empty docs

2012-03-21 Thread Pat Ferrel
You may want to tune your analyzer to let more tokens through. The empty docs may be used in the TFIDF part of the analysis to calculate IDF, not sure? As to clusters with empty docs, I don't see how you can avoid that unless you drop them before running clustering. I can't think of any
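
To make the two points in this message concrete, here is a small Python sketch, not Mahout code. It assumes documents are held as a dict from a document id to a token list, and it uses the plain log(N / df) form of IDF, which may differ from the weighting Mahout actually applies. The names drop_empty and idf are invented for the example. It shows that empty documents still count toward N in an IDF calculation, and that filtering them out before clustering is a one-line step.

```python
import math


def drop_empty(docs):
    """Split docs (id -> token list) into kept docs and dropped ids."""
    kept = {doc_id: tokens for doc_id, tokens in docs.items() if tokens}
    dropped = [doc_id for doc_id, tokens in docs.items() if not tokens]
    return kept, dropped


def idf(term, docs):
    """Plain log(N / df); assumes the term occurs in at least one doc."""
    df = sum(1 for tokens in docs.values() if term in tokens)
    return math.log(len(docs) / df)


docs = {"a": ["x", "y"], "b": [], "c": ["x"]}
kept, dropped = drop_empty(docs)

# The empty document "b" raises N, so the IDF of "y" is larger when
# it is computed over all documents than over the kept ones.
print(dropped, idf("y", docs), idf("y", kept))
```

Whether to compute IDF before or after dropping the empty documents is a design choice; the sketch only shows that the two orders give different weights.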