GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-25 fx MA XIAOJUN
I am using Mahout Streamingkmeans in sequential mode. With a dataset of 200 objects and 128 variables, I would like to get 1 clusters, but a "GC overhead limit exceeded" error occurred. How do I set the Java memory limit for sequential mode? Yours sincerely, Ma

CIMapper and CIReducer Mahout k-Means implementation

2014-03-25 hiroshi leon
Hello everybody, I am new to MapReduce and Mahout, and for the past few days I have been reviewing the code of Mahout's MapReduce k-means implementation. There are things I still do not understand. For example, in CIMapper we have three main functions: Setup() Map()
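For readers new to the MapReduce formulation of k-means, here is a minimal plain-Java sketch of one iteration, assuming the usual division of work: each mapper assigns its points to the nearest current centroid and accumulates per-cluster sums and counts, and the reduce step merges those partial sums per cluster and divides to get the new centroids. The "setup", "map" and "cleanup" comments only loosely mirror the phases of a Hadoop Mapper. This is not Mahout's CIMapper/CIReducer code, and every class and method name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java simulation of one k-means iteration in MapReduce style.
// It uses no Hadoop or Mahout classes; all names here are illustrative.
final class KMeansMapReduceSketch {

    /** Index of the centroid nearest to the point (squared Euclidean distance). */
    static int nearest(double[][] centroids, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = 0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centroids[i][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    /** Map side for one input split: per cluster id, {sum_0, ..., sum_(d-1), count}. */
    static double[][] mapSplit(double[][] centroids, double[][] split) {
        int dims = centroids[0].length;
        // "setup": one accumulator per cluster of the previous iteration.
        double[][] partial = new double[centroids.length][dims + 1];
        // "map": assign each point to its nearest centroid and accumulate it.
        for (double[] point : split) {
            double[] acc = partial[nearest(centroids, point)];
            for (int j = 0; j < dims; j++) {
                acc[j] += point[j];
            }
            acc[dims] += 1; // count of points assigned to this cluster
        }
        // "cleanup": emit everything accumulated for this split.
        return partial;
    }

    /** Reduce side: for each cluster id, merge all splits' partials into a new centroid. */
    static double[][] reduce(double[][] centroids, List<double[][]> mapOutputs) {
        int dims = centroids[0].length;
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            double[] total = new double[dims + 1];
            for (double[][] out : mapOutputs) {
                for (int j = 0; j <= dims; j++) {
                    total[j] += out[c][j];
                }
            }
            if (total[dims] == 0) {
                next[c] = centroids[c].clone(); // empty cluster: keep the old centroid
                continue;
            }
            next[c] = new double[dims];
            for (int j = 0; j < dims; j++) {
                next[c][j] = total[j] / total[dims];
            }
        }
        return next;
    }

    /** One iteration: one "mapper" per split, then the reduce step over all outputs. */
    static double[][] iterate(double[][] centroids, double[][][] splits) {
        List<double[][]> mapOutputs = new ArrayList<>();
        for (double[][] split : splits) {
            mapOutputs.add(mapSplit(centroids, split));
        }
        return reduce(centroids, mapOutputs);
    }

    public static void main(String[] args) {
        double[][] centroids = {{0, 0}, {10, 10}};
        double[][][] splits = {{{1, 1}, {9, 9}}, {{0, 2}, {11, 10}}};
        System.out.println(Arrays.deepToString(iterate(centroids, splits)));
    }
}
```

With the toy data in `main`, the two splits are mapped independently and the merged result printed is `[[0.5, 1.5], [10.0, 9.5]]`. Keeping sums and counts (rather than averages) on the map side is what lets partial results from different splits be merged correctly.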

trainclassifier/trainnb

2014-03-25 Mahmood Naderan
Hi, what is the correct syntax for this old command?

    mahout trainclassifier -i traininginput -o wikipediamodel -mf 4 -ms 4

It seems that trainclassifier has been replaced by trainnb, but trainnb has no -mf option. Regards, Mahmood

Re: trainclassifier/trainnb

2014-03-25 Andrew Musselman
If you need to see which options are available for a given job, you can just run $MAHOUT_HOME/bin/mahout jobname to see the usage:

    $ bin/mahout trainnb
    Running on hadoop, using /home/user/hadoop/bin/hadoop and HADOOP_CONF_DIR=
    MAHOUT-JOB:

Re: trainclassifier/trainnb

2014-03-25 Suneel Marthi
If you are looking for an example usage, see examples/bin/classify-20newsgroups.sh. On Mar 25, 2014, at 9:28 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: If you need to see which options are available for a given job you can just run $MAHOUT_HOME/bin/mahout

Debugging mahout K-means CIMapper and CIReducer - Breakpoints disable

2014-03-25 hiroshi leon
Hello everybody, when I was debugging Mahout's MapReduce k-means implementation, I was not able to set breakpoints in the CIMapper and CIReducer classes. Has anyone experienced this before? I have this problem only with those classes, and I do have the source files... So

Re: trainclassifier/trainnb

2014-03-25 Mahmood Naderan
OK. The Wikipedia example on these pages, http://mahout.apache.org/users/classification/wikipedia-bayes-example.html and https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example, is valid for older Mahout versions. I have no problem with Mahout 0.6. However, the last two commands

Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-25 Suneel Marthi
What's your value for -km? Based on what you provided, -km should be = 1 * ln(200) = 145090. Try reducing your number of clusters to 1000 and setting -km = 14509. On Tuesday, March 25, 2014 2:45 AM, fx MA XIAOJUN xiaojun...@fujixerox.co.jp wrote: I am using Mahout Streamingkmeans in
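The -km sizing rule in the reply above (number of clusters multiplied by the natural log of the number of points) can be written as a one-line calculation. The following is a minimal sketch, assuming -km is simply k * ln(n) rounded up, where k is the requested number of clusters and n the number of input points. The KmEstimate class and its estimateKm method are hypothetical helpers, not part of Mahout.

```java
// Hypothetical helper, not part of Mahout: computes a -km value with the
// k * ln(n) rule of thumb from the mailing-list reply.
final class KmEstimate {
    private KmEstimate() {}

    /**
     * @param k number of final clusters requested
     * @param n number of input points
     * @return k * ln(n), rounded up to a whole number
     */
    static long estimateKm(int k, long n) {
        if (k <= 0 || n <= 1) {
            throw new IllegalArgumentException("need k > 0 and n > 1");
        }
        // Math.log is the natural logarithm.
        return (long) Math.ceil(k * Math.log(n));
    }

    public static void main(String[] args) {
        int k = Integer.parseInt(args.length > 0 ? args[0] : "100");
        long n = Long.parseLong(args.length > 1 ? args[1] : "1000000");
        System.out.println("-km " + estimateKm(k, n));
    }
}
```

For example, `java KmEstimate 100 1000000` prints `-km 1382`. Because -km grows linearly with k, reducing the number of clusters shrinks it proportionally, which is the lever the reply suggests pulling.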