Correction :
On 21 March 2012 15:57, Mridul Kapoor mridulkap...@gmail.com wrote:
Hi
Thanks a lot Sebastian, Sean and Ted for your continuing help! Following
your examples, I tried to create my own implementation.
DataModel dataModel = new FileDataModel(new File("preferences.csv"));
On 21 March 2012 16:01, Mridul Kapoor mridulkap...@gmail.com wrote:
Correction :
On 21 March 2012 15:57, Mridul Kapoor mridulkap...@gmail.com wrote:
Hi
Thanks a lot Sebastian, Sean and Ted for your continuing help! Following
your examples, I tried to create my own implementation.
I tried to run Mahout on Hadoop using the following command:
[jxu13@lppma692 hadoop-0.20.2]$ bin/hadoop jar
/opt/mapr/mahout/mahout-0.5/core/target/mahout-core-0.5-job.jar
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt
--Dmapred.output.dir=output
It's -Dmapred.output.dir=output, not --Dmapred.output.dir=output (one dash),
but that's not even the problem.
I don't think you can specify -D options this way, as they are JVM
arguments. You need to configure these in Hadoop's config files.
This is not specific to Mahout.
On Wed, Mar 21, 2012 at
This is extremely helpful! Thanks a lot.
Indeed, after running seqdumper I got output like this:
Key: 1774184: Value: wt: 1.0distance: 0.8839410915753125 vec: [123426:1.000]
Key: 1705919: Value: wt: 1.0distance: 0.0 vec: []
Key: 1705919: Value: wt: 1.0distance: 0.0 vec: []
Key: 1705919: Value: wt: 1.0distance:
You may want to tune your analyzer to let more tokens through. I'm not
sure, but the empty docs may still be used in the TF-IDF part of the
analysis to calculate IDF.
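
As a quick way to see which documents came through with no tokens, the
seqdumper text itself can be scanned for lines ending in an empty "vec: []".
The following is only a rough sketch under that assumption: the class
EmptyVectorLines and its methods are hypothetical names made up for
illustration, not part of Mahout, and it works on the printed dump lines
rather than on the sequence files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Mahout and only
// inspects the text that seqdumper prints.
final class EmptyVectorLines {

    private static final String KEY_PREFIX = "Key: ";
    private static final String VALUE_MARKER = ": Value:";
    private static final String VEC_MARKER = "vec: [";

    // Returns true when the line has a "vec: [...]" section with nothing in it.
    static boolean hasEmptyVector(String line) {
        int marker = line.indexOf(VEC_MARKER);
        if (marker < 0) {
            return false; // no vector printed on this line (e.g. cut-off output)
        }
        int open = marker + VEC_MARKER.length();
        int close = line.indexOf(']', open);
        if (close < 0) {
            return false; // unterminated vector: treat as unknown, not empty
        }
        return line.substring(open, close).trim().isEmpty();
    }

    // Collects the keys of all lines whose vector is empty, in input order.
    static List<String> keysWithEmptyVectors(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            int keyEnd = line.indexOf(VALUE_MARKER);
            if (line.startsWith(KEY_PREFIX)
                    && keyEnd >= KEY_PREFIX.length()
                    && hasEmptyVector(line)) {
                keys.add(line.substring(KEY_PREFIX.length(), keyEnd));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> dump = Arrays.asList(
            "Key: 1774184: Value: wt: 1.0distance: 0.8839410915753125 vec: [123426:1.000]",
            "Key: 1705919: Value: wt: 1.0distance: 0.0 vec: []");
        // Prints the keys whose documents produced no terms.
        System.out.println(keysWithEmptyVectors(dump));
    }
}
```

The keys it reports are the documents to look at when tuning the analyzer,
or to leave out of the input before clustering.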
As to clusters with empty docs, I don't see how you can avoid that
unless you drop them before running clustering. I can't think of any