I run the KMeansDriver from Java rather than from the command line. The
problem I am having is that I want to use relative paths, resolved as
specified in the Hadoop configuration file, so that the output is written
to HDFS rather than to the local filesystem. I don't want to hard-code the
full paths, because the username differs between production and
development. With the KMeansDriver this does not work for me, although it
does work with the CanopyDriver. Does anyone know what could be the cause
of this? Any help is much appreciated.

I'm trying to get the following to work:

KMeansDriver kMeansDriver = new KMeansDriver();
kMeansDriver.setConf(conf);
kMeansDriver.run(vectorsDir, clusterIn, clusterOut,
    new TanimotoDistanceMeasure(), 0.05, 100, true, 0.05, false);

OR

KMeansDriver.run(conf, vectorsDir, clusterIn, clusterOut,
    new TanimotoDistanceMeasure(), 0.05, 100, true, 0.05, false);


The following works, but the paths are hard-coded:

KMeansDriver.run(
    new Path("hdfs:localhost:9000/user/username/out/tfidf-vectors"),
    new Path("hdfs:localhost:9000/user/username/out/canopy-centroids/clusters-0-final"),
    new Path("hdfs:localhost:9000/user/username/out/clusters"),
    new TanimotoDistanceMeasure(), 0.05, 100, true, 0.05, false);


In the calls above that take it, conf refers to the Hadoop configuration.
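
To make clear what I expect to happen, here is a rough sketch of how I
would want a relative path to be resolved. As far as I understand, HDFS
treats /user/<username> as the default working directory, so a relative
path should end up under it. The sketch uses only java.net.URI and no
Hadoop classes; the class name PathQualifier, the method qualify, and the
default filesystem value hdfs://localhost:9000 are my own assumptions for
illustration, not part of Mahout or Hadoop.

```java
import java.net.URI;

public class PathQualifier {

    // Resolve a path against <defaultFs>/user/<userName>/ (assumed layout).
    // Relative paths land under the user's home directory; absolute paths
    // keep their own path component; fully qualified URIs are unchanged.
    static URI qualify(String defaultFs, String userName, String path) {
        // Drop a trailing slash so the base URI has no empty path segment.
        String fs = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        URI home = URI.create(fs + "/user/" + userName + "/");
        return home.resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical values; in practice they would come from the
        // Hadoop configuration and the current user.
        String defaultFs = "hdfs://localhost:9000";
        String user = "username";
        System.out.println(qualify(defaultFs, user, "out/tfidf-vectors"));
        System.out.println(qualify(defaultFs, user, "out/clusters"));
    }
}
```

With this, only the username and the default filesystem change between
development and production, and the relative paths stay the same.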

The equivalent call does work with the CanopyDriver, though:

CanopyDriver.run(conf, vectorsDir, canopyCentroids,
    new EuclideanDistanceMeasure(), 500, 200, false, 0.05, false);


Thanks,
Herb
