As mentioned previously, automatically determining the number of clusters found by SenseClusters is an important issue. Summaries I posted earlier mentioned previous work in this area, and while doing a little web scrounging today I found a couple of fairly recent papers that approach the problem in different ways. They are particularly useful because they include comparative studies of a number of widely known cluster stopping methods, including CH (Calinski-Harabasz), Hartigan, and GAP.
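For readers who have not seen these criteria, here is a minimal Python sketch of two of them. The CH (Calinski-Harabasz) index is CH(k) = [B_k / (k - 1)] / [W_k / (n - k)], where W_k and B_k are the within-cluster and between-cluster sums of squared distances, and the k with the largest CH is preferred. Hartigan's rule uses H(k) = (W_k / W_{k+1} - 1) * (n - k - 1), with the commonly cited rule of thumb of adding clusters while H(k) > 10. This is an illustration only, not code from SenseClusters or from either paper. The function names and toy data are hypothetical, and the labelings are hand-picked rather than produced by a real clustering run.

```python
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(points, labels):
    """W_k: sum of squared distances from each point to its own cluster centroid."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    total = 0.0
    for members in groups.values():
        c = centroid(members)
        total += sum(sq_dist(p, c) for p in members)
    return total


def calinski_harabasz(points, labels):
    """CH(k) = [B_k / (k - 1)] / [W_k / (n - k)]; larger values are better."""
    n, k = len(points), len(set(labels))
    if not 1 < k < n:
        raise ValueError("CH is only defined for 1 < k < n")
    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)  # T = W_k + B_k
    w = within_ss(points, labels)
    b = total_ss - w  # between-cluster sum of squares B_k
    return (b / (k - 1)) / (w / (n - k))


def hartigan(w_k, w_k_plus_1, n, k):
    """H(k) = (W_k / W_{k+1} - 1) * (n - k - 1).

    Rule of thumb: keep adding clusters while H(k) > 10.
    """
    return (w_k / w_k_plus_1 - 1.0) * (n - k - 1)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three 2-D points.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    one = [0] * 6
    two = [0, 0, 0, 1, 1, 1]
    three = [0, 0, 1, 2, 2, 2]  # needlessly splits the first group in two
    print("CH(2) =", round(calinski_harabasz(pts, two), 1))
    print("CH(3) =", round(calinski_harabasz(pts, three), 1))
    w1, w2, w3 = (within_ss(pts, lab) for lab in (one, two, three))
    print("H(1) =", round(hartigan(w1, w2, len(pts), 1), 1))
    print("H(2) =", round(hartigan(w2, w3, len(pts), 2), 2))
```

On this toy data CH scores the natural two-cluster split above the three-cluster split, and H(2) falls well below 10, so Hartigan's rule would also stop at k = 2. GAP is left out because it needs reference data sets drawn from a null distribution, which makes for a longer example.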
Without further introduction, here they are:

Sugar, C., and James, G. (2003). "Finding the Number of Clusters in a Data Set: An Information Theoretic Approach." Journal of the American Statistical Association 98, 750-763.
http://www-rcf.usc.edu/~sugar/research/ratedist.pdf

Salvador, S., and Chan, P. (2004). "Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms." Proc. 16th IEEE Intl. Conf. on Tools with AI, pp. 576-584.
http://www.cs.fit.edu/~pkc/papers/ictai04salvador.pdf

Enjoy!
Ted

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
