RE: Finding thresholds for canopy

Jeff Eastman Wed, 27 Apr 2011 13:57:31 -0700

Worth a try, but it ultimately boils down to the distance measure you've
chosen, the distribution of your input vectors, and T2. As a pre-run
experiment, you could sample some points from your data set (e.g. using
RandomSeedGenerator as you would to prime k-means), then build a distance
matrix over the sample using your chosen distance measure. The spread of
pairwise distances in that matrix would give you a T2 starting point in a
more systematic manner than grabbing it out of thin air.
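
A rough sketch of that pre-run experiment in plain Python, not Mahout code. It
assumes Euclidean distance, uses random.sample in place of
RandomSeedGenerator, and picks T2 as a low quantile of the sampled pairwise
distances. The function names, the sample size and the 0.1 quantile are
illustrative assumptions, not anything Mahout prescribes.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(points, distance=euclidean):
    """Sorted upper triangle of the distance matrix over the points."""
    return sorted(distance(a, b) for a, b in combinations(points, 2))


def suggest_t2(vectors, sample_size=100, quantile=0.1,
               distance=euclidean, seed=0):
    """Suggest a T2 starting point from a random sample of the data.

    The quantile is a tuning knob (an assumption of this sketch): a lower
    value gives a tighter T2 and therefore more canopies.
    """
    rng = random.Random(seed)
    sample = rng.sample(vectors, min(sample_size, len(vectors)))
    dists = pairwise_distances(sample, distance)
    if not dists:
        raise ValueError("need at least two vectors")
    # Clamp the index so that quantile=1.0 returns the largest distance.
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[index]
```

Swap in whatever distance function matches the measure you will pass to
Canopy; the thresholds only mean something in that measure's units.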

-----Original Message-----
From: Paul Mahon [mailto:[email protected]] 
Sent: Wednesday, April 27, 2011 1:46 PM
To: [email protected]
Subject: Re: Finding thresholds for canopy

If you have a guess at how many clusters you want you could take the 
total area of the space and divide by the number of clusters to get an 
initial guess of T2 or T1. That might work to get you started, 
depending on the distribution.
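
One possible reading of that heuristic, sketched in plain Python. It assumes
the "space" is the axis-aligned bounding box of the data, and it converts the
per-cluster volume into a length by taking the d-th root so the result is
comparable to a Euclidean distance. Both choices, and the function names, are
assumptions of this sketch rather than part of the suggestion above.

```python
import math


def bounding_box_extents(vectors):
    """Per-dimension range (max - min) of the data."""
    return [max(col) - min(col) for col in zip(*vectors)]


def threshold_from_volume(vectors, k):
    """Initial threshold guess from bounding-box volume and cluster count k.

    Dimensions with zero extent are skipped so that a constant feature
    does not collapse the volume to zero (an assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    extents = [e for e in bounding_box_extents(vectors) if e > 0]
    if not extents:
        raise ValueError("all vectors are identical")
    volume = math.prod(extents)
    # Side length of a hypercube holding 1/k of the total volume.
    return (volume / k) ** (1.0 / len(extents))
```

For sparse or high-dimensional vectors the bounding box is mostly empty, so
treat the result as an upper-end guess at best.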

On 04/27/2011 12:39 PM, Camilo Lopez wrote:
> I'm using Canopy as a first step for K-means clustering. Is there an
> algorithm, or even a good heuristic, to estimate good T1 and T2 from the
> vectorized data?
