Worth a try, but it ultimately boils down to the distance measure you've chosen, the distribution of your input vectors, and T2. As a pre-run experiment, you could sample some points from your data set (e.g. using RandomSeedGenerator, as you would to prime k-means), then build a distance matrix over the sample using your chosen distance measure. The spread of distances in that matrix would give you a T2 starting point in a more systematic manner than pulling one out of thin air.
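For what it's worth, here is a minimal Python sketch of that pre-run experiment, not Mahout code. The names `suggest_t2` and `euclidean`, the sample size, and the choice of a low quantile of the pairwise distances as the T2 guess are all my own assumptions for illustration; swap in whatever distance measure you actually cluster with.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance; stands in for your chosen distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggest_t2(points, sample_size=200, quantile=0.1, distance=euclidean, seed=0):
    """Return a T2 starting point from a random sample of the data.

    Hypothetical helper: the quantile rule is an assumption, not a Mahout default.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to measure a distance")
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    # Upper triangle of the distance matrix: each unordered pair once.
    pairwise = sorted(
        distance(sample[i], sample[j])
        for i in range(len(sample))
        for j in range(i + 1, len(sample))
    )
    # Pick the distance at the requested quantile as the T2 guess.
    index = min(len(pairwise) - 1, int(quantile * len(pairwise)))
    return pairwise[index]
```

A low quantile keeps T2 tight, which tends to produce more canopies; raising it merges them. Since T1 has to be larger than T2, you could read T1 off a higher quantile of the same sorted distances and adjust from there.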
-----Original Message-----
From: Paul Mahon [mailto:[email protected]]
Sent: Wednesday, April 27, 2011 1:46 PM
To: [email protected]
Subject: Re: Finding thresholds for canopy

If you have a guess at how many clusters you want you could take the total area of the space and divide by the number of clusters to get an initial guess of T2 or T1. That might work to get you started, depending on the distribution.

On 04/27/2011 12:39 PM, Camilo Lopez wrote:
> I'm using Canopy as first step for K-means clustering, is there any
> algorithmic, or even a good heuristic to estimate good T1 and T2 from the
> vectorized data?
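
The area-per-cluster suggestion quoted above could be sketched roughly as follows. This is only one possible reading: it assumes the "space" is the axis-aligned bounding box of the data, and it takes the d-th root of the per-cluster volume to turn it into a length that can be compared against distances. The function name `threshold_from_cluster_count` is hypothetical.

```python
import math


def threshold_from_cluster_count(points, k):
    """Rough threshold guess: side length of an equal share of the bounding box.

    Assumption: the data space is approximated by its axis-aligned bounding box.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    dims = len(points[0])
    # Extent of the data along each dimension.
    extents = [
        max(p[d] for p in points) - min(p[d] for p in points)
        for d in range(dims)
    ]
    volume = math.prod(extents)
    # Each cluster gets volume / k; the d-th root converts that to a length.
    return (volume / k) ** (1.0 / dims)
```

Note that a dimension with zero extent makes the volume, and so the guess, collapse to zero, and that the result is only meaningful if the data fills the box fairly evenly, which is the "depending on the distribution" caveat in the message.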
