Is this command line what you need? (Replace /user/root/testdataout with your output directory)
$ mahout seqdumper -i /user/root/testdataout/data/part-m-00000
Key: 9: Value: {0:1.0,2:-0.956,1:-0.213,5:0.091,3:-0.003,7:-0.024,6:0.017,8:1.0,4:0.056}
Key: 9: Value: {0:1.0,2:2.129,1:3.147,5:-0.063,3:-0.006,7:0.109,6:-0.002,4:-0.056}
Key: 9: Value: {0:1.0,2:-2.718,1:-2.165,5:-0.103,3:-0.008,7:-0.024,6:-0.156,8:1.0,4:0.043}
...
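
If the key of each record identifies the cluster (that is an assumption on my part about your data), the text dump can be grouped per key with a small script. Here is a minimal Python sketch, assuming the "Key: ...: Value: {...}" layout shown above; group_points_by_key is a hypothetical helper name, not part of Mahout.

```python
import re
from collections import defaultdict

# Matches one record of the form "Key: <key>: Value: {i:v,i:v,...}".
# Assumption: records look like the seqdumper lines shown above.
RECORD = re.compile(r"Key:\s*(\S+?):\s*Value:\s*\{([^}]*)\}")

def group_points_by_key(dump_text):
    """Return {key: [{index: value, ...}, ...]} parsed from the dump text."""
    groups = defaultdict(list)
    for key, body in RECORD.findall(dump_text):
        vector = {}
        for entry in body.split(","):
            if not entry.strip():
                continue  # tolerate an empty vector "{}"
            index, value = entry.split(":")
            vector[int(index)] = float(value)
        groups[key].append(vector)
    return dict(groups)

# Shortened sample records in the same layout as the dump above.
sample = (
    "Key: 9: Value: {0:1.0,2:-0.956,1:-0.213}\n"
    "Key: 9: Value: {0:1.0,2:2.129,1:3.147}\n"
)

for key, vectors in group_points_by_key(sample).items():
    print(key, len(vectors))
```

To use it on real output, redirect the seqdumper output to a file and pass the file contents to group_points_by_key.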

Sorry if I have misunderstood.



On 16/6/14 3:44 pm, Kamesh wrote:
Thanks for the response, Andrew. I am using Mahout 0.9. I also tried the
trunk version, but I still get output in the following format:

C-55{n=1 c=[15993058.000] r=[]}
C-56{n=2 c=[15993061.167] r=[]}
C-57{n=1 c=[15993062.000] r=[]}

C-97{n=1 c=[15993103.000] r=[]}
C-98{n=2 c=[15993119.333] r=[0.395]}
C-99{n=1 c=[15993105.000] r=[]}

Hence I am not able to figure out which data points are inside each cluster.

Also, when I run with "-of JSON", I get an NPE:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.utils.clustering.JsonClusterWriter.getTopFeaturesList(JsonClusterWriter.java:118)
    at org.apache.mahout.utils.clustering.JsonClusterWriter.write(JsonClusterWriter.java:73)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:115)
    at org.apache.mahout.utils.clustering.AbstractClusterWriter.write(AbstractClusterWriter.java:102)

I am running the cluster dump as follows:

hadoop jar mahout-integration-1.0-SNAPSHOT.jar
org.apache.mahout.utils.clustering.ClusterDumper -i
/canopy/clusters-0-final -p /canopy/clusteredPoints -of JSON -n 1000

I have also observed that the *part* file created inside *clusteredPoints*
is empty.

Please help me get the data points from each cluster.


On Fri, Jun 13, 2014 at 9:24 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

That's going to be easier if you can work off of trunk, since the output of
clustering has been cleaned up to write a better format, per
https://issues.apache.org/jira/browse/MAHOUT-1505

E.g.,

{
   "top_terms": [
     {"all":3.0149030685424805},
     {"english":3.0149030685424805},
     {"best":3.0149030685424805},
     {"spaniel":3.0149030685424805},
     {"springer":3.0149030685424805},
     {"dogs":1.9162907600402832}
   ],
   "cluster_id": 7,
   "cluster": {
     "r": [],
     "c": [
       {"all":3.015},
       {"best":3.015},
       {"dogs":1.916},
       {"english":3.015},
       {"spaniel":3.015},
       {"springer":3.015}
     ],
     "n": 1,
     "identifier": "C-7"
   },
   "points": [
     {
       "point": [
         {"all":3.015},
         {"best":3.015},
         {"dogs":1.916},
         {"english":3.015},
         {"spaniel":3.015},
         {"springer":3.015}
       ],
       "vector_name": "P(14)",
       "weight": "1.0"
     }
   ]
}
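
Once the dump is in this shape, listing the points in each cluster is a matter of reading the "points" array. Here is a minimal Python sketch, assuming the dump holds one or more JSON objects with the field names shown above ("cluster_id", "points", "vector_name"); iter_clusters and points_by_cluster are hypothetical helper names, not part of Mahout.

```python
import json

def iter_clusters(text):
    """Yield each top-level JSON object found in text, one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def points_by_cluster(text):
    """Return {cluster_id: [vector_name, ...]} for every cluster in the dump."""
    result = {}
    for cluster in iter_clusters(text):
        names = [p.get("vector_name") for p in cluster.get("points", [])]
        result[cluster["cluster_id"]] = names
    return result

# Shortened sample using the field names from the example above.
sample = '''
{"cluster_id": 7,
 "cluster": {"identifier": "C-7", "n": 1},
 "points": [{"vector_name": "P(14)", "weight": "1.0",
             "point": [{"dogs": 1.916}]}]}
'''

print(points_by_cluster(sample))
```

A cluster whose "points" array is missing or empty maps to an empty list, which would also be the result if the clustered points were never written.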


On Fri, Jun 13, 2014 at 2:42 AM, Kamesh <kamesh.had...@gmail.com> wrote:

Hi All,
Please help me get the data points inside each cluster.
The output of the clustering algorithm is the center and the radius of
each cluster. How do we derive the actual data points inside each cluster
from this output?

--
Kamesh.






