Re: ClusteredPoints

Paritosh Ranjan Thu, 24 Nov 2011 23:06:40 -0800

Run this code after the kmeans clustering is done.

I have arranged the code so that you can simply call the process method, supplying it the path of the clusteredPoints directory inside the clustering output path, the Hadoop FileSystem, and the Configuration.


  //use clusterId and vector here to write to a local file.

At this line you will get the clusterId and vector. Use them to write to the file.

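import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;

public void process(Path clusteredPoints, FileSystem fileSystem, Configuration conf)
    throws IOException, InstantiationException, IllegalAccessException {
  FileStatus[] partFiles = getAllClusteredPointPartFiles(clusteredPoints, fileSystem);
  for (FileStatus partFile : partFiles) {
    SequenceFile.Reader clusteredPointsReader =
        new SequenceFile.Reader(fileSystem, partFile.getPath(), conf);
    try {
      // Instantiate empty key/value holders of whatever types the file declares.
      WritableComparable clusterIdAsKey =
          (WritableComparable) clusteredPointsReader.getKeyClass().newInstance();
      Writable vector = (Writable) clusteredPointsReader.getValueClass().newInstance();
      while (clusteredPointsReader.next(clusterIdAsKey, vector)) {
        // use clusterId and vector here to write to a local file.
      }
    } finally {
      // Close the reader even if reading a part file fails.
      clusteredPointsReader.close();
    }
  }
}

private FileStatus[] getAllClusteredPointPartFiles(Path clusteredPoints, FileSystem fileSystem)
    throws IOException {
  Path[] partFilePaths =
      FileUtil.stat2Paths(fileSystem.globStatus(clusteredPoints, PathFilters.partFilter()));
  FileStatus[] partFileStatuses =
      fileSystem.listStatus(partFilePaths, PathFilters.partFilter());
  return partFileStatuses;
}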
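
For the step marked by the comment inside the while loop, here is a minimal sketch of one way to write each pair to a local file. ClusteredPointFileWriter is a hypothetical helper, not part of Mahout or Hadoop, and the tab-separated format is only an assumption. It uses only the JDK and relies on toString() of the key and value, so the text you get is whatever those types print. Note that Path here is java.nio.file.Path, not the Hadoop Path used above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Mahout or Hadoop): writes one
 * tab-separated "clusterId, vector" line per clustered point to a local text file.
 */
final class ClusteredPointFileWriter implements AutoCloseable {
  private final BufferedWriter out;

  ClusteredPointFileWriter(Path localFile) throws IOException {
    // Creates the file, or truncates it if it already exists.
    this.out = Files.newBufferedWriter(localFile, StandardCharsets.UTF_8);
  }

  /** Relies only on toString(), so any key/value type can be passed in. */
  void write(Object clusterId, Object vector) throws IOException {
    out.write(String.valueOf(clusterId));
    out.write('\t');
    out.write(String.valueOf(vector));
    out.newLine();
  }

  @Override
  public void close() throws IOException {
    out.close();
  }
}
```

You would open one writer before the loop over the part files, call writer.write(clusterIdAsKey, vector) where the comment is, and close the writer once all part files have been read.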

Paritosh


On 25-11-2011 12:27, Rachana wrote:

Hi Ranjan,

Thank you for your response, but as I am a newbie I am a bit confused!
Where should I include this code?
Or should I run this as a separate program?


Rachana.




