RE: How to find which point belongs which cluster after running KMeansClusterer

WangRamon Fri, 04 Nov 2011 07:41:13 -0700

> Subject: Re: How to find which point belongs which cluster after running 
> KMeansClusterer
> From: gsing...@apache.org
> Date: Fri, 4 Nov 2011 06:49:49 -0400
> To: user@mahout.apache.org
> 
> 
> On Nov 4, 2011, at 3:28 AM, WangRamon wrote:
> 
> > 
> > Thanks, that's what I need. I have another question: is there a recommended 
> > value for the number of iterations and convergenceDelta in K-Means? Thanks 
> > a lot.  Cheers, Ramon
> 
> 
> It's usually determined by testing (what are the minimum values you need that 
> give you good results), but also by how long it takes for your system to run 
> and what your business requirements are.  Both of those values are really 
> meant to be safeguards against a runaway process since k-means isn't 
> guaranteed to converge.
>
What do you mean by k-means isn't guaranteed to converge?
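
To illustrate the "safeguard" role of the two parameters, here is a minimal, self-contained sketch of a one-dimensional k-means loop. It is not Mahout code: the class and method names (KMeansStoppingSketch, cluster) are hypothetical, and it assumes convergenceDelta is compared against how far each center moved in the last pass. The loop stops when no center moves by more than convergenceDelta, or when maxIterations is reached, whichever comes first.

```java
import java.util.Arrays;

/** Hypothetical sketch of k-means stopping rules; not taken from Mahout. */
class KMeansStoppingSketch {

    /** Final centers, the number of passes run, and whether the delta test passed. */
    static final class Result {
        final double[] centers;
        final int iterations;
        final boolean converged;

        Result(double[] centers, int iterations, boolean converged) {
            this.centers = centers;
            this.iterations = iterations;
            this.converged = converged;
        }
    }

    static Result cluster(double[] points, double[] initialCenters,
                          int maxIterations, double convergenceDelta) {
        double[] centers = initialCenters.clone();
        int iterations = 0;
        boolean converged = false;
        // Safeguard 1: never run more than maxIterations passes.
        while (iterations < maxIterations && !converged) {
            double[] sums = new double[centers.length];
            int[] counts = new int[centers.length];
            // Assignment step: attach each point to its nearest center.
            for (double p : points) {
                int nearest = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(p - centers[c]) < Math.abs(p - centers[nearest])) {
                        nearest = c;
                    }
                }
                sums[nearest] += p;
                counts[nearest]++;
            }
            // Update step: move each center to the mean of its points.
            double maxShift = 0.0;
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old center
                }
                double updated = sums[c] / counts[c];
                maxShift = Math.max(maxShift, Math.abs(updated - centers[c]));
                centers[c] = updated;
            }
            iterations++;
            // Safeguard 2: stop once no center moved more than the delta.
            converged = maxShift <= convergenceDelta;
        }
        return new Result(centers, iterations, converged);
    }

    public static void main(String[] args) {
        double[] points = {1.0, 1.2, 0.8, 9.0, 9.5, 10.1};
        Result r = cluster(points, new double[] {0.0, 5.0}, 10, 0.001);
        System.out.println(Arrays.toString(r.centers) + " after " + r.iterations
                + " iterations, converged=" + r.converged);
    }
}
```

With well-separated data like this the delta test ends the loop early. With maxIterations set to 1 the same call returns after a single pass with converged=false, which is the cap doing its job.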
> 
> 
> >> Date: Fri, 4 Nov 2011 08:07:01 +0530
> >> From: pran...@xebia.com
> >> To: user@mahout.apache.org
> >> Subject: Re: How to find which point belongs which cluster after running 
> >> KMeansClusterer
> >> 
> >> Transform your vector in a NamedVector.
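
The idea behind this suggestion can be sketched without any Mahout dependency: if the name travels with the vector, the clustering output can be reported as name -> cluster. The NamedPoint class below is a hypothetical stand-in written for this example, not Mahout's NamedVector, and the fixed centers are assumed to come from an earlier clustering run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a named vector; not Mahout's NamedVector class. */
class NamedPointSketch {

    /** A vector paired with a caller-chosen name. */
    static final class NamedPoint {
        final String name;
        final double[] values;

        NamedPoint(String name, double[] values) {
            this.name = name;
            this.values = values.clone();
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Maps each point's name to the index of its nearest center. */
    static Map<String, Integer> assign(NamedPoint[] points, double[][] centers) {
        Map<String, Integer> clusterByName = new LinkedHashMap<>();
        for (NamedPoint p : points) {
            int nearest = 0;
            for (int c = 1; c < centers.length; c++) {
                if (squaredDistance(p.values, centers[c])
                        < squaredDistance(p.values, centers[nearest])) {
                    nearest = c;
                }
            }
            clusterByName.put(p.name, nearest);
        }
        return clusterByName;
    }

    public static void main(String[] args) {
        NamedPoint[] points = {
            new NamedPoint("doc-a", new double[] {1.0, 1.0}),
            new NamedPoint("doc-b", new double[] {9.0, 8.0}),
            new NamedPoint("doc-c", new double[] {0.5, 1.5}),
        };
        double[][] centers = {{1.0, 1.0}, {9.0, 9.0}};
        System.out.println(assign(points, centers)); // {doc-a=0, doc-b=1, doc-c=0}
    }
}
```
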
> >> 
> >> On 04-11-2011 08:02, WangRamon wrote:
> >>> OK, me again. I checked the KMeansDriver code that outputs the point 
> >>> information; the code is as follows:
> >>>
> >>>    Map<Text, Text> props = new HashMap<Text, Text>();
> >>>    props.put(new Text("distance"),
> >>>        new Text(String.valueOf(nearestDistance)));
> >>>    context.write(new IntWritable(nearestCluster.getId()),
> >>>        new WeightedPropertyVectorWritable(1, vector, props));
> >>>
> >>> It's good to output the point (the vector) and the distance information, 
> >>> but in real business use we usually need something like a name to identify 
> >>> the point (name <--> vector/point), and this information is not written 
> >>> out. If we could add this information, that would be much better.
> >>> Cheers, Ramon
> >>>> Subject: Re: How to find which point belongs which cluster after running 
> >>>> KMeansClusterer
> >>>> From: gsing...@apache.org
> >>>> Date: Thu, 3 Nov 2011 08:28:19 -0400
> >>>> To: user@mahout.apache.org
> >>>> 
> >>>> There is code for this, it's in two places (on trunk, at least):
> >>>> 
> >>>> 1. ClusterDumper:
> >>>> public static Map<Integer, List<WeightedVectorWritable>> readPoints(
> >>>>     Path pointsPathDir, Configuration conf) {
> >>>>    Map<Integer, List<WeightedVectorWritable>> result =
> >>>>        new TreeMap<Integer, List<WeightedVectorWritable>>();
> >>>>    for (Pair<IntWritable, WeightedVectorWritable> record :
> >>>>            new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
> >>>>                    pointsPathDir, PathType.LIST,
> >>>>                    PathFilters.logsCRCFilter(), conf)) {
> >>>>      // value is the cluster id as an int, key is the name/id of the
> >>>>      // vector, but that doesn't matter because we only care about
> >>>>      // printing it
> >>>>      //String clusterId = value.toString();
> >>>>      int keyValue = record.getFirst().get();
> >>>>      List<WeightedVectorWritable> pointList = result.get(keyValue);
> >>>>      if (pointList == null) {
> >>>>        pointList = Lists.newArrayList();
> >>>>        result.put(keyValue, pointList);
> >>>>      }
> >>>>      pointList.add(record.getSecond());
> >>>>    }
> >>>>    return result;
> >>>>  }
> >>>> 
> >>>> 2. ClusterDumperWriter:
> >>>> // look up the points by cluster id
> >>>> List<WeightedVectorWritable> points =
> >>>>     clusterIdToPoints.get(value.getId());
> >>>>    if (points != null) {
> >>>>      writer.write("\tWeight : [props - optional]:  Point:\n\t");
> >>>>      for (Iterator<WeightedVectorWritable> iterator = points.iterator();
> >>>>           iterator.hasNext(); ) {
> >>>>        WeightedVectorWritable point = iterator.next();
> >>>>        writer.write(String.valueOf(point.getWeight()));
> >>>> 
> >>>> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
> >>>> 
> >>>>> Yes, Paritosh, it's a bit misleading for new users. I will start to 
> >>>>> check KMeansDriver. Thanks for your quick reply.
> >>>>>> Date: Thu, 3 Nov 2011 15:02:28 +0530
> >>>>>> From: pran...@xebia.com
> >>>>>> To: user@mahout.apache.org
> >>>>>> Subject: Re: How to find which point belongs which cluster after 
> >>>>>> running KMeansClusterer
> >>>>>> 
> >>>>>> I also thought in the beginning that using KMeansClusterer and
> >>>>>> ClusterDumper will help in getting all vectors belonging to a cluster,
> >>>>>> but it did not help me a lot.
> >>>>>> 
> >>>>>> I used KMeansDriver which I think is easy enough to use.
> >>>>>> 
> >>>>>> After execution the records are written in the form
> >>>>>> <cluster id><vector>
> >>>>>> 
> >>>>>> "context.write(new Text(cluster.getIdentifier()), cluster);"
> >>>>>> 
> >>>>>> So, what helped me was to process this into a map with the cluster id
> >>>>>> as the key and a list of vectors as the value. I read the clustered
> >>>>>> points and put all the data into the map in that form. In the end, the
> >>>>>> list against each cluster id was what I needed.
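
The map-building step described above might look like the following minimal sketch. It runs on plain in-memory records rather than on Mahout's sequence-file output: the ClusteredPoint class and groupByCluster method are hypothetical names introduced for this example, and each point is represented by a string label purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of grouping clustered points by cluster id; not Mahout code. */
class ClusterGroupingSketch {

    /** One output record: the cluster id and the point assigned to it. */
    static final class ClusteredPoint {
        final int clusterId;
        final String point;

        ClusteredPoint(int clusterId, String point) {
            this.clusterId = clusterId;
            this.point = point;
        }
    }

    /** Builds a map from cluster id to the list of points in that cluster. */
    static Map<Integer, List<String>> groupByCluster(List<ClusteredPoint> records) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (ClusteredPoint record : records) {
            // Create the list on first sight of a cluster id, then append.
            result.computeIfAbsent(record.clusterId, id -> new ArrayList<>())
                  .add(record.point);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ClusteredPoint> records = Arrays.asList(
            new ClusteredPoint(7, "p1"),
            new ClusteredPoint(3, "p2"),
            new ClusteredPoint(7, "p3"));
        System.out.println(groupByCluster(records)); // {3=[p2], 7=[p1, p3]}
    }
}
```

A TreeMap keeps the cluster ids in sorted order, which makes the printed result easier to scan. A HashMap would work equally well if ordering does not matter.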
> >>>>>> 
> >>>>>> Hope this helps.
> >>>>>> 
> >>>>>> Regards,
> >>>>>> Paritosh
> >>>>>> 
> >>>>>> On 03-11-2011 14:23, WangRamon wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Hi All, I'm using KMeansClusterer. I will use KMeansDriver in a Hadoop 
> >>>>>>> environment later, but I think it will be easier to understand by 
> >>>>>>> using KMeansClusterer first. The question is: I cannot find a way to 
> >>>>>>> find the cluster a point belongs to after running 
> >>>>>>> KMeansClusterer. I expected some API on the Cluster interface 
> >>>>>>> to get all points/vectors belonging to a cluster, but... did I miss 
> >>>>>>> something? Thanks a lot.  Cheers, Ramon
> >>>>>>> 
> >>>>>>> 
> >>>>>>> -----
> >>>>>>> No virus found in this message.
> >>>>>>> Checked by AVG - www.avg.com
> >>>>>>> Version: 10.0.1411 / Virus Database: 2092/3992 - Release Date: 
> >>>>>>> 11/02/11
> >>>>>                                           
> >>>> --------------------------------------------
> >>>> Grant Ingersoll
> >>>> http://www.lucidimagination.com
> >>>> 
> >>>> 
> >>>> 
> >>>                                     
> >>> 
> >>> 
> >> 
> >                                       
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> 
> 
> 
>