Hi,

I have not used reuters-21578 for my k-means clustering.
These are the steps I followed. I prepared a sequence directory, then the seq2sparse directory, and ran:

./mahout kmeans -Dmapred.map.java.child.opts=-Xmx1g \
  -i /urlcat-data/56-categories/vector-dir/tfidf-vectors/ \
  -c /urlcat-data/56-categories/cluster-centroids \
  -o /urlcat-data/56-categories/kmeans-cluster-output \
  -ow -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
  -x 5 -ow -cd 1 -k 49 --clustering -cl

mahout clusterdump -i /opt/49-classification/cluster-centroids \
  -o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
  -p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
  -d /root/Desktop/final_feature_dictionaries.txt -dt text -e;

I have checked examples/bin/cluster-reuters.sh and downloaded reuters-21578. Can you please let me know what I should do now?

Thanks,
Venkat

On Thu, Jun 26, 2014 at 6:46 PM, Suneel Marthi <smar...@apache.org> wrote:

> No, a dictionary is not a file mapping 'crisp keywords' to clusters. A
> dictionary is a mapping of keywords to unique integer IDs.
>
> Again, it would be easier to help if you could outline the steps you
> took to generate the clusters. It seems you might have missed
> something; at the very least, look at the k-means example in
> examples/bin/cluster-reuters.sh for the correct sequence of steps.
>
>
> On Thu, Jun 26, 2014 at 5:07 AM, venkata ramana <venkat.ecosyst...@gmail.com> wrote:
>
> > As I understand it, the dictionary file contains crisp keywords that are
> > related to each cluster. Please let me know if I am wrong.
> >
> > Thanks,
> > Venkat
> >
> >
> > On Thu, Jun 26, 2014 at 1:27 PM, Suneel Marthi <smar...@apache.org> wrote:
> >
> > > It's clear from the stack trace that you have a String as a key where an
> > > integer was expected.
> > > How did you go about building your clusters from the original input?
> > >
> > >
> > > On Thu, Jun 26, 2014 at 3:28 AM, venkata ramana <venkat.ecosyst...@gmail.com> wrote:
> > >
> > > > Hi Mahout,
> > > >
> > > > I am trying to analyze my k-means clusters. I used the following command:
> > > >
> > > > mahout clusterdump -i /opt/49-classification/cluster-centroids \
> > > >   -o /opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt \
> > > >   -p /opt/49-classification/kmeans-cluster-output/clusteredPoints/ \
> > > >   -d /root/Desktop/final_feature_dictionaries.txt -dt text -e;
> > > >
> > > > I got the following error:
> > > >
> > > > hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
> > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > SLF4J: Found binding in [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/mahout-examples-0.8-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: Found binding in [jar:file:/opt/Gouri_Sankar/mahout-distribution-0.8/lib/slf4j-jcl-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> > > > SLF4J: Actual binding is of type [org.slf4j.impl.JCLLoggerFactory]
> > > > Jun 26, 2014 12:43:40 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Command line arguments: {
> > > >   --dictionary=[/root/Desktop/final_feature_dictionaries.txt],
> > > >   --dictionaryType=[text],
> > > >   --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure],
> > > >   --endPhase=[2147483647], --evaluate=null,
> > > >   --input=[/opt/49-classification/cluster-centroids],
> > > >   --output=[/opt/49-classification/kmeans-cluster-output/clusteranalyze1.txt],
> > > >   --outputFormat=[TEXT],
> > > >   --pointsDir=[/opt/49-classification/kmeans-cluster-output/clusteredPoints/],
> > > >   --startPhase=[0], --tempDir=[temp]}
> > > > Exception in thread "main" java.lang.NumberFormatException: For input string: "aajproperty.com"
> > > >     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > > >     at java.lang.Integer.parseInt(Integer.java:492)
> > > >     at java.lang.Integer.parseInt(Integer.java:527)
> > > >     at org.apache.mahout.utils.vectors.VectorHelper.loadTermDictionary(VectorHelper.java:218)
> > > >
> > > > I have not used any numbers in my dictionary file. Could you please help me with this?
> > > >
> > > > Thanks,
> > > > Venkat
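The NumberFormatException fits Suneel's point that a dictionary maps each term to an integer ID. Below is a simplified, hypothetical reader. The class name TextDictionarySketch is made up for illustration, and this is not Mahout's own code. It is modeled on my reading of how Mahout 0.8's VectorHelper loads a "-dt text" dictionary. It assumes the first line holds the number of entries and each later line is term<TAB>docFreq<TAB>index. Under that assumption, a plain keyword list fails on its very first line, which would explain the "aajproperty.com" in the trace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified reader for a text-format term dictionary.
// Assumed layout (from a reading of the Mahout 0.8 source, not verified here):
//   line 1:  number of entries
//   line 2+: term <TAB> docFreq <TAB> index
class TextDictionarySketch {

    // Returns an array where position i holds the term whose integer ID is i.
    static String[] parse(List<String> lines) {
        // The first line must be an integer. A bare keyword list such as
        // "aajproperty.com" throws NumberFormatException right here.
        int numEntries = Integer.parseInt(lines.get(0));
        String[] terms = new String[numEntries];
        for (String line : lines.subList(1, lines.size())) {
            if (line.startsWith("#")) {
                continue; // skip comment lines
            }
            String[] tokens = line.split("\t");
            if (tokens.length < 3) {
                continue; // skip lines without term, docFreq and index
            }
            int index = Integer.parseInt(tokens[2]); // tokens[1] is the document frequency
            terms[index] = tokens[0];
        }
        return terms;
    }

    public static void main(String[] args) {
        // Well-formed input: entry count first, then term/docFreq/index triples.
        List<String> good = List.of("2", "apple\t5\t0", "banana\t3\t1");
        System.out.println(Arrays.toString(parse(good))); // [apple, banana]

        // A hand-written list of keywords with no integer IDs.
        List<String> bare = List.of("aajproperty.com", "example.org");
        try {
            parse(bare);
        } catch (NumberFormatException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

If that reading is right, a hand-written keyword file cannot serve as the -d argument. The term IDs have to match the indices in the TF-IDF vectors, so the likely fix is to pass the dictionary that seq2sparse wrote into the vector directory rather than a text file of keywords. In the Reuters example script this appears to be dictionary.file-0, read with -dt sequencefile. Confirm the exact file name and flag against examples/bin/cluster-reuters.sh before relying on them.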