If I understand correctly, CosineDistanceMeasure returns values in the range [0,1]. So shouldn't Canopy Clustering return a single cluster when T1 and T2 are both set to 1, as in the example below? Every point is within distance 1 of the random starting point and should therefore be removed from the list of candidate canopy centroids. Yet it returns multiple clusters. Are my assumptions wrong? Can someone help me understand this behavior?

CanopyDriver.run(new Path("tfidf-vectors"), new Path("canopy_centroids"),
      new CosineDistanceMeasure(), 1, 1, 0.0, true);
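
To make the question concrete, here is a minimal sketch of the edge case I have in mind. It is independent of Mahout: cosineDistance, removedStrict and removedInclusive are hypothetical helpers written for illustration, and I am assuming the distance is computed as 1 - cos(angle). Two documents that share no terms are orthogonal, so their distance is exactly 1. Whether such a point is removed then depends on whether the comparison against T2 is strict or inclusive, which I have not verified in the Mahout source.

```java
class CosineCanopySketch {

    // Cosine distance as 1 - cos(angle); assumes neither vector is all zeros.
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Hypothetical removal rule using a strict comparison against T2.
    static boolean removedStrict(double distance, double t2) {
        return distance < t2;
    }

    // Hypothetical removal rule using an inclusive comparison against T2.
    static boolean removedInclusive(double distance, double t2) {
        return distance <= t2;
    }

    public static void main(String[] args) {
        // Two made-up TF-IDF-like vectors with no terms in common.
        double[] docA = {0.7, 0.0, 0.3, 0.0};
        double[] docB = {0.0, 0.5, 0.0, 0.9};

        double d = cosineDistance(docA, docB);
        System.out.println("distance = " + d);
        System.out.println("removed with strict test: " + removedStrict(d, 1.0));
        System.out.println("removed with inclusive test: " + removedInclusive(d, 1.0));
    }
}
```

If the check is strict, any point that shares no terms with the current canopy center would stay in the candidate list and could start its own canopy, which might explain the multiple clusters.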
