As mentioned previously, automatically determining how many clusters
SenseClusters should find is an important open issue. My earlier
summaries pointed to previous work in this area, and while searching the
web today I found two fairly recent papers that discuss the issue from
different angles. They are particularly useful because they include
comparative studies of several widely known cluster stopping methods,
including CH, Hartigan, and GAP.

Without further introduction, here they are:

Sugar, C., and James, G. (2003) "Finding the Number of Clusters in a Data
Set: An Information Theoretic Approach", Journal of the American
Statistical Association 98, 750-763.
http://www-rcf.usc.edu/~sugar/research/ratedist.pdf

Salvador, S., and Chan, P. (2004) "Determining the Number of
Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms",
Proc. 16th IEEE Intl. Conf. on Tools with AI, pp. 576-584.
http://www.cs.fit.edu/~pkc/papers/ictai04salvador.pdf
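
For anyone who has not met the stopping rules named above, here is a
minimal sketch of one of them. It assumes "CH" refers to the
Calinski-Harabasz index, which scores a partition of n points into k
clusters as (B / (k - 1)) / (W / (n - k)), where B is the between-cluster
and W the within-cluster sum of squares; the k with the highest score is
chosen. The toy points, the hand-made candidate partitions, and the names
ch_index, scores, and best_k are all made up for illustration. This is not
how SenseClusters or either paper implements it.

```python
def ch_index(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k))."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CH is defined only for 2 <= k < n")
    dim = len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0  # B: size-weighted squared distance of centroids to grand mean
    within = 0.0   # W: squared distance of points to their own centroid
    for members in clusters.values():
        m = len(members)
        centroid = [sum(p[d] for p in members) / m for d in range(dim)]
        between += m * sum((c - g) ** 2 for c, g in zip(centroid, grand))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    # Note: W is zero (division error) if every point equals its centroid.
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three well-separated groups of four 2-D points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

# Hand-made candidate partitions for k = 2, 3, 4 (one label per point).
candidates = {
    2: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],  # first two groups merged
    3: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],  # the intended grouping
    4: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3],  # third group split in two
}

scores = {k: ch_index(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)

for k in sorted(scores):
    print(f"k={k}  CH={scores[k]:.2f}")
print("chosen k:", best_k)
```

In practice the candidate partitions would come from the clustering
algorithm itself at each k, not from hand-written labels as here.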

Enjoy!
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

