RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
Thank you for your quick reply. As to -km, I thought it was log10 instead of ln. I was wrong... This time I set -km 14 and ran mahout streamingkmeans again (CDH 5.0 MRv1, Mahout 0.8). The maps ran faster than before, but the reduce was still stuck at 76% forever. So, I uninstalled mahout
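For context, the guidance being referenced is that -km (the estimated number of map clusters) should be roughly k * ln(n), where k is the target number of clusters and n the number of input points. As a worked example with assumed sizes (neither k nor n is stated in the thread): k = 2 and n = 1000 gives 2 * ln(1000) ≈ 13.8, i.e. -km 14, whereas log10 with the same inputs would give only about 6.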

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN xiaojun...@fujixerox.co.jp wrote: Thank you for your quick reply. As to -km, I thought it was log10 instead of ln. I was wrong... This time I set -km 14 and ran mahout streamingkmeans again (CDH 5.0 MRv1, Mahout 0.8). The maps run

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Suneel Marthi
This problem is specific to Canopy clustering and is not an issue with KMeans. I had seen this behavior with Canopy, and looking at the code it is indeed an issue wherein cluster-0 is created on the local file system and the remaining clusters land on HDFS. Please file a JIRA for this

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-03-17 Thread Margusja
Hi, here is my output: [speech@h14 ~]$ mahout/bin/mahout seqdirectory -c UTF-8 -i /user/speech/demo -o demo-seqfiles MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf MAHOUT-JOB:

Re: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-03-17 Thread Suneel Marthi
Are you running on Hadoop 2.x? That seems to be the case here. Compile with the Hadoop 2 profile: mvn -DskipTests clean install -Dhadoop2.profile=<your hadoop version> On Monday, March 17, 2014 5:57 AM, Margusja mar...@roo.ee wrote: Hi, here is my output: [speech@h14 ~]$ mahout/bin/mahout
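A cleaned-up sketch of the sequence being suggested, for readers hitting the same error (the Hadoop version value and the MAHOUT_HOME path are assumptions; substitute your own):

    # from the top of the Mahout source tree, build against Hadoop 2
    mvn -DskipTests clean install -Dhadoop2.profile=2.2.0
    # then rerun the failing job against the freshly built jars
    export MAHOUT_HOME=/path/to/mahout
    $MAHOUT_HOME/bin/mahout seqdirectory -c UTF-8 -i /user/speech/demo -o demo-seqfiles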

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Bikash Gupta
Suneel, just for information, I haven't found this issue in Canopy; Canopy's cluster-0 was created in HDFS only. However, KMeans' cluster-0 was created on the local file system and cluster-1 in HDFS, and after that it threw an error because it was unable to locate cluster-0. On Mon, Mar 17, 2014 at 3:10 PM,

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Bikash Gupta
I have a 3-node cluster of CDH 4.6; however, I have built Mahout 0.9 with the Hadoop 2.x profile. I have also created a mount point for these nodes, and the path URI is the same as HDFS. I have manually configured the filesystem parameter: conf.set(fs.hdfs.impl,org.
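One way to rule out the driver picking up a default Configuration without the cluster's HDFS settings is to force the filesystem on the command line instead of in code. A minimal sketch, assuming the driver is launched through ToolRunner so that -D generic options are honoured; the NameNode URI and all paths below are placeholders:

    bin/mahout kmeans -Dfs.defaultFS=hdfs://namenode:8020 \
      -i /user/foo/vectors -c /user/foo/initial-clusters -o /user/foo/kmeans-out \
      -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure \
      -k 5 -x 10 -cl -ow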

Normalization in Mahout

2014-03-17 Thread Bikash Gupta
Hi, Do we have any utility for Column and Row normalization in Mahout? -- Thanks Regards Bikash Gupta

Re: Normalization in Mahout

2014-03-17 Thread Suneel Marthi
What are you trying to do? On Monday, March 17, 2014 7:45 AM, Bikash Gupta bikash.gupt...@gmail.com wrote: Hi, Do we have any utility for Column and Row normalization in Mahout? -- Thanks Regards Bikash Gupta

Re: Normalization in Mahout

2014-03-17 Thread Bikash Gupta
I want to achieve a few things: 1. Normalize the input data for clustering and classification algorithms. 2. Normalize the output data to plot in a graph. On Mon, Mar 17, 2014 at 5:32 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: What are you trying to do? On Monday, March 17, 2014 7:45 AM, Bikash Gupta

Re: Mahout parallel K-Means - algorithms analysis

2014-03-17 Thread Weishung Chung
You could take a look at org.apache.mahout.clustering.classify.ClusterClassificationMapper. Enjoy, Wei Shung On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi suneel_mar...@yahoo.com wrote: The clustering code is in CIMapper and CIReducer. Following the clustering, there is cluster classification

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
Thank you for your extremely quick reply. What do you mean by "kmeans hasn't changed between 0.8 and 0.9"? Did you mean Streaming KMeans here? I want to try using -rskm in streaming kmeans, but in Mahout 0.8, if -rskm is set to true, errors occur. I heard that the bug has been fixed in 0.9.

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
The -rskm option works only in sequential mode and fails in MR. That's still an issue in the present trunk that needs to be fixed, and it should explain why Streaming KMeans with -rskm works only in sequential mode for you. Mahout 0.9 has been built with the Hadoop 1.2.1 profile; not sure if that's gonna
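For reference, a sketch of the sequential invocation being discussed. The paths and the -k/-km values are placeholders, and whether -rskm takes an explicit true/false argument is best checked with bin/mahout streamingkmeans --help on your build:

    bin/mahout streamingkmeans -i /user/foo/vectors -o /user/foo/sk-out \
      -k 10 -km 140 -xm sequential -rskm -ow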

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
As mahout streamingkmeans has no problems in sequential mode, I would like to try sequential mode. However, a java.lang.OutOfMemoryError occurs. Where do I set the JVM heap size for sequential mode? Is it the same as in MapReduce mode? -Original Message- From: fx MA XIAOJUN
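Roughly, a sketch of where the heap usually gets set; the sizes are examples only, which variable applies depends on whether the launcher runs locally or hands the job to hadoop jar, and the -D form assumes the driver goes through ToolRunner:

    # pure local run: bin/mahout honours MAHOUT_HEAPSIZE (in MB)
    export MAHOUT_LOCAL=true
    export MAHOUT_HEAPSIZE=4096
    bin/mahout streamingkmeans ... -xm sequential

    # sequential run submitted through 'hadoop jar': the client JVM heap applies
    export HADOOP_CLIENT_OPTS=-Xmx4g
    bin/mahout streamingkmeans ... -xm sequential

    # MapReduce mode: task heap comes from the MRv1 child opts, not from the above
    bin/mahout streamingkmeans -Dmapred.map.child.java.opts=-Xmx2048m \
      -Dmapred.reduce.child.java.opts=-Xmx2048m ...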

Fwd: Need help in executing SSVD for dimensionality reduction on Mahout

2014-03-17 Thread Vijaya Pratap
Hi, I am trying to use SSVD for dimensionality reduction in Mahout; the input is sample data in CSV format. Below is a snippet of the input: 22,2,44,36,5,9,2824,2,4,733,285,169 25,1,150,175,3,9,4037,2,18,1822,254,171 I have executed the steps below. 1. Loaded the CSV file and vectorized the
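For readers following along, a sketch of the ssvd step this pipeline leads to (paths and the -k/-p/-q values are placeholders; the exact option set varies by version, see bin/mahout ssvd --help):

    bin/mahout ssvd -i /user/foo/vectors -o /user/foo/ssvd-out \
      -k 3 -p 15 -q 1 -pca true -U true -V false -t 2 -ow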

Re: Need help in executing SSVD for dimensionality reduction on Mahout

2014-03-17 Thread Dmitriy Lyubimov
If the rows in the SSVD input are the data points you are trying to create a reduced space for, then the rows of USigma represent the same points in the PCA (reduced) space. The mapping between input rows and output rows is by the same keys in the sequence files. However, it doesn't look like your input
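To check that key-based mapping, the reduced-space rows can be dumped and compared against the input keys. A sketch only: the directory holding the U·Sigma output (U, USigma, or U scaled by half sigma) depends on the ssvd options and the Mahout version, so the path here is an assumption:

    bin/mahout seqdumper -i /user/foo/ssvd-out/USigma | head -n 20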