> Subject: Re: How to find which point belongs which cluster after running
> KMeansClusterer
> From: gsing...@apache.org
> Date: Fri, 4 Nov 2011 06:49:49 -0400
> To: user@mahout.apache.org
>
>
> On Nov 4, 2011, at 3:28 AM, WangRamon wrote:
>
> >
> > Thanks, that's what I need. I have another question: is there a recommended
> > value for the number of iterations and convergenceDelta in K-Means? Thanks
> > a lot. Cheers Ramon
>
>
> It's usually determined by testing (what are the minimum values you need that
> give you good results), but also by how long it takes for your system to run
> and what your business requirements are. Both of those values are really
> meant to be safeguards against a runaway process, since k-means isn't
> guaranteed to converge.
>
> What do you mean by k-means isn't guaranteed to converge?
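
To make the role of the two settings concrete, here is a minimal sketch in plain Java of a k-means loop that stops either when no centroid moves more than convergenceDelta or when maxIterations passes have run. It is not Mahout's KMeansClusterer or KMeansDriver; the class name KMeansStopSketch, the Result holder and the use of Euclidean distance are all assumptions made for illustration. As far as I know, with Euclidean distance and mean-based updates the total within-cluster distance cannot increase from one pass to the next, so the loop does settle eventually, but it can take many passes, and that argument does not carry over to arbitrary distance measures. Hence the iteration cap.

```java
// Hypothetical stand-alone sketch in plain Java; not Mahout's KMeansClusterer.
final class KMeansStopSketch {

    /** Outcome of a run: final centroids, passes executed, and whether the delta test was met. */
    static final class Result {
        final double[][] centroids;
        final int iterations;
        final boolean converged;

        Result(double[][] centroids, int iterations, boolean converged) {
            this.centroids = centroids;
            this.iterations = iterations;
            this.converged = converged;
        }
    }

    static Result run(double[][] points, double[][] initial,
                      int maxIterations, double convergenceDelta) {
        int k = initial.length;
        int dim = initial[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initial[c].clone();
        }
        for (int iter = 1; iter <= maxIterations; iter++) {
            // Assignment step: add every point to the running sum of its nearest centroid.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int nearest = nearest(p, centroids);
                counts[nearest]++;
                for (int d = 0; d < dim; d++) {
                    sums[nearest][d] += p[d];
                }
            }
            // Update step: move each centroid to the mean of its points
            // and remember the largest move made in this pass.
            double maxShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift, distance(centroids[c], updated));
                centroids[c] = updated;
            }
            // First safeguard: stop once no centroid moved more than the delta.
            if (maxShift <= convergenceDelta) {
                return new Result(centroids, iter, true);
            }
        }
        // Second safeguard: the iteration cap was reached without meeting the delta test.
        return new Result(centroids, maxIterations, false);
    }

    /** Index of the centroid closest to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(p, centroids[c]) < distance(p, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

Checking the converged flag after a run tells you which of the two safeguards ended it. If runs routinely end on the iteration cap, that is a hint to raise the cap or loosen the delta, which is the kind of testing suggested above.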
>
>
> >> Date: Fri, 4 Nov 2011 08:07:01 +0530
> >> From: pran...@xebia.com
> >> To: user@mahout.apache.org
> >> Subject: Re: How to find which point belongs which cluster after running
> >> KMeansClusterer
> >>
> >> Transform your vector into a NamedVector.
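
The idea behind that suggestion is that the name travels together with the vector, so whatever is written out for a point still carries its identifier. The sketch below shows the idea in plain Java only. NamedPoint and attachNames are hypothetical stand-ins invented for this example and are not Mahout classes; in Mahout itself you would wrap the existing Vector in a NamedVector instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java stand-in for the "name + vector" pairing; not a Mahout class.
final class NamedPointSketch {

    /** A vector together with the business identifier it belongs to. */
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values.clone(); // defensive copy so the caller's array can change freely
        }

        @Override
        public String toString() {
            return name + ":" + Arrays.toString(values);
        }
    }

    /** Pairs each raw vector with its name before the points are handed to clustering. */
    static List<NamedPoint> attachNames(String[] names, double[][] vectors) {
        if (names.length != vectors.length) {
            throw new IllegalArgumentException("one name per vector is required");
        }
        List<NamedPoint> out = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            out.add(new NamedPoint(names[i], vectors[i]));
        }
        return out;
    }
}
```

With the name attached up front, the code that later prints or stores clustered points can report the name instead of only the raw coordinates.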
> >>
> >> On 04-11-2011 08:02, WangRamon wrote:
> >>> OK, me again. I checked the KMeansDriver code that outputs the point
> >>> information; the code is as follows:
> >>>
> >>> Map<Text, Text> props = new HashMap<Text, Text>();
> >>> props.put(new Text("distance"),
> >>>     new Text(String.valueOf(nearestDistance)));
> >>> context.write(new IntWritable(nearestCluster.getId()),
> >>>     new WeightedPropertyVectorWritable(1, vector, props));
> >>>
> >>> It's good that the point (the vector) and the distance are written out,
> >>> but in real business use we usually need something like a name to
> >>> identify the point (name <--> vector/point), and this information is not
> >>> written out. It would be much better if we could add it. Cheers Ramon
> >>>> Subject: Re: How to find which point belongs which cluster after running
> >>>> KMeansClusterer
> >>>> From: gsing...@apache.org
> >>>> Date: Thu, 3 Nov 2011 08:28:19 -0400
> >>>> To: user@mahout.apache.org
> >>>>
> >>>> There is code for this, it's in two places (on trunk, at least):
> >>>>
> >>>> 1. ClusterDumper:
> >>>> public static Map<Integer, List<WeightedVectorWritable>> readPoints(
> >>>>     Path pointsPathDir, Configuration conf) {
> >>>>   Map<Integer, List<WeightedVectorWritable>> result =
> >>>>       new TreeMap<Integer, List<WeightedVectorWritable>>();
> >>>>   for (Pair<IntWritable, WeightedVectorWritable> record :
> >>>>       new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
> >>>>           pointsPathDir, PathType.LIST,
> >>>>           PathFilters.logsCRCFilter(), conf)) {
> >>>>     // value is the cluster id as an int, key is the name/id of the
> >>>>     // vector, but that doesn't matter because we only care about
> >>>>     // printing it
> >>>>     //String clusterId = value.toString();
> >>>>     int keyValue = record.getFirst().get();
> >>>>     List<WeightedVectorWritable> pointList = result.get(keyValue);
> >>>>     if (pointList == null) {
> >>>>       pointList = Lists.newArrayList();
> >>>>       result.put(keyValue, pointList);
> >>>>     }
> >>>>     pointList.add(record.getSecond());
> >>>>   }
> >>>>   return result;
> >>>> }
> >>>>
> >>>> 2. ClusterDumperWriter:
> >>>> List<WeightedVectorWritable> points =
> >>>> clusterIdToPoints.get(value.getId()); //look up the points by cluster id
> >>>> if (points != null) {
> >>>> writer.write("\tWeight : [props - optional]: Point:\n\t");
> >>>> for (Iterator<WeightedVectorWritable> iterator = points.iterator();
> >>>> iterator.hasNext(); ) {
> >>>> WeightedVectorWritable point = iterator.next();
> >>>> writer.write(String.valueOf(point.getWeight()));
> >>>>
> >>>> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
> >>>>
> >>>>> Yes, Paritosh, it's a bit misleading for new users. I will start to
> >>>>> check KMeansDriver; thanks for your quick reply.
> >>>>>> Date: Thu, 3 Nov 2011 15:02:28 +0530
> >>>>>> From: pran...@xebia.com
> >>>>>> To: user@mahout.apache.org
> >>>>>> Subject: Re: How to find which point belongs which cluster after
> >>>>>> running KMeansClusterer
> >>>>>>
> >>>>>> I also thought in the beginning that using KMeansClusterer and
> >>>>>> ClusterDumper will help in getting all vectors belonging to a cluster,
> >>>>>> but it did not help me a lot.
> >>>>>>
> >>>>>> I used KMeansDriver which I think is easy enough to use.
> >>>>>>
> >>>>>> After execution the records are written in the form
> >>>>>> <cluster id><vector>
> >>>>>>
> >>>>>> "context.write(new Text(cluster.getIdentifier()), cluster);"
> >>>>>>
> >>>>>> So, what helped me was to process this into a map with the cluster id
> >>>>>> as the key and a list of vectors as the value. I read the clustered
> >>>>>> points and put all the data into the map in that form. In the end, the
> >>>>>> list against each cluster id was what I needed.
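
The map described above can be sketched in a few lines of plain Java. This is only an illustration of the grouping step, not Mahout code: the Assignment class and the point names are assumptions standing in for the (cluster id, vector) records that the clustering job writes out, and reading those records from the output files is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of "cluster id -> list of points"; not Mahout code.
final class GroupByClusterSketch {

    /** One clustered-point record: the cluster it was assigned to and the point's name. */
    static final class Assignment {
        final int clusterId;
        final String pointName;

        Assignment(int clusterId, String pointName) {
            this.clusterId = clusterId;
            this.pointName = pointName;
        }
    }

    /** Collects the points under their cluster id; TreeMap keeps the ids sorted. */
    static Map<Integer, List<String>> groupByCluster(List<Assignment> records) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Assignment record : records) {
            // Create the list on first sight of a cluster id, then append the point.
            result.computeIfAbsent(record.clusterId, id -> new ArrayList<>())
                  .add(record.pointName);
        }
        return result;
    }
}
```

Looking up a cluster id in the returned map then gives every point assigned to that cluster, in the order the records were read.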
> >>>>>>
> >>>>>> Hope this helps.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Paritosh
> >>>>>>
> >>>>>> On 03-11-2011 14:23, WangRamon wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Hi All, I'm using KMeansClusterer. I will use KMeansDriver on a Hadoop
> >>>>>>> environment later, but I think it will be easier to understand it by
> >>>>>>> using KMeansClusterer first. My question is: I cannot find a way to
> >>>>>>> find the cluster a point belongs to after running KMeansClusterer. I
> >>>>>>> expected some API on the Cluster interface to get all points/vectors
> >>>>>>> that belong to a cluster, but... did I miss something? Thanks a lot.
> >>>>>>> Cheers Ramon
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> -----
> >>>>>>> No virus found in this message.
> >>>>>>> Checked by AVG - www.avg.com
> >>>>>>> Version: 10.0.1411 / Virus Database: 2092/3992 - Release Date:
> >>>>>>> 11/02/11
> >>>>>
> >>>> --------------------------------------------
> >>>> Grant Ingersoll
> >>>> http://www.lucidimagination.com
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
>
>
>
>