Hi Behrang,

Interesting questions, although generally speaking we confine discussion on this list to the SenseClusters package (http://senseclusters.sourceforge.net).
That said, your question is related in that SenseClusters does support cluster stopping, and in fact you can use the SenseClusters programs independently of the entire package. There is a program called clusterstopping.pl, for example, that provides four different cluster stopping methods. You can see more about those here:

http://search.cpan.org/dist/Text-SenseClusters/Toolkit/clusterstop/clusterstopping.pl

I hope this helps!

Cordially,
Ted

On Tue, May 5, 2009 at 8:54 AM, Behrang Saeedzadeh <[email protected]> wrote:
> Hi all,
>
> I am working on a cluster analysis project and I want to implement a
> stopping rule for it. At the moment I want to implement the C/H stopping
> rule.
>
> Currently I am computing the WGSS like this (in Java):
>
>     public static double computeWGSS(DocumentGroup group) {
>         if (group.getDocuments().size() == 1) {
>             return 0.0;
>         }
>         double wgss = 0.0;
>         Document[] docs = group.getDocuments();
>         for (int i = 0; i < docs.length; i++) {
>             for (int j = i + 1; j < docs.length; j++) {
>                 Document d1 = docs[i];
>                 Document d2 = docs[j];
>                 wgss += computeSumOfSquares(d1.getProfile().getVector(),
>                         d2.getProfile().getVector());
>             }
>         }
>         return wgss / group.size();
>     }
>
> This is implemented according to C/H's paper "A dendrite method for cluster
> analysis".
>
> However I have been unable to find the algorithm for computing BGSS.
> At the moment I have implemented it like this:
>
>     public static double computeBGSS(List<DocumentGroup> groupList) {
>         if (groupList.size() == 1) {
>             return 0.0;
>         }
>         double bgss = 0.0;
>         for (int i = 0; i < groupList.size(); i++) {
>             for (int j = i + 1; j < groupList.size(); j++) {
>                 DocumentGroup group1 = groupList.get(i);
>                 DocumentGroup group2 = groupList.get(j);
>                 bgss += computeBGSS(group1, group2);
>             }
>         }
>         return bgss;
>     }
>
>     public static double computeBGSS(DocumentGroup group1, DocumentGroup group2) {
>         double bgss = 0.0;
>         for (Document d1 : group1.getDocuments()) {
>             for (Document d2 : group2.getDocuments()) {
>                 bgss += computeSumOfSquares(d1.getProfile().getVector(),
>                         d2.getProfile().getVector());
>             }
>         }
>         return bgss;
>     }
>
> Is this implementation correct? When calculating WGSS, we divide the pooled
> sum of squares by the number of documents in the cluster, do we have to
> divide the pooled sum of squares in BGSS by something, like the number of
> clusters?
>
> Thanks in advance,
> Behrang Saeedzadeh
> -------------------------------
> http://my.opera.com/behrangsa
> http://twitter.com/behrangsa
> http://www.linkedin.com/in/behrangsa
> http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
> http://www.last.fm/user/behrangsa
>
> _______________________________________________
> senseclusters-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/senseclusters-users

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
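
On the BGSS question in the quoted message, here is a minimal sketch of one way to get BGSS from pairwise distances. It assumes that computeSumOfSquares returns the squared Euclidean distance between two vectors. Under that assumption, the pairwise sum over a set of n points divided by n equals the sum of squared distances to that set's centroid, so the total sum of squares can be computed the same way as the per-cluster WGSS, and BGSS is the total minus the pooled WGSS. The class and method names below are hypothetical, and plain arrays stand in for the DocumentGroup and Document types. This is not how clusterstopping.pl is implemented.

```java
// Illustrative sketch; the class and method names are hypothetical,
// not taken from the original message or from SenseClusters.
final class CalinskiHarabasz {

    // Squared Euclidean distance between two equal-length vectors
    // (assumed to be what computeSumOfSquares returns).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Sum of squared distances over all unordered pairs of points.
    static double pairwiseSum(double[][] points) {
        double sum = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                sum += squaredDistance(points[i], points[j]);
            }
        }
        return sum;
    }

    // Total number of points across all clusters.
    static int countPoints(double[][][] clusters) {
        int n = 0;
        for (double[][] cluster : clusters) {
            n += cluster.length;
        }
        return n;
    }

    // Pooled WGSS: each cluster's pairwise sum divided by its size.
    static double wgss(double[][][] clusters) {
        double total = 0.0;
        for (double[][] cluster : clusters) {
            if (cluster.length > 0) {
                total += pairwiseSum(cluster) / cluster.length;
            }
        }
        return total;
    }

    // Total sum of squares: pairwise sum over all n points, divided by n.
    static double tss(double[][][] clusters) {
        int n = countPoints(clusters);
        if (n == 0) {
            return 0.0;
        }
        double[][] all = new double[n][];
        int idx = 0;
        for (double[][] cluster : clusters) {
            for (double[] point : cluster) {
                all[idx++] = point;
            }
        }
        return pairwiseSum(all) / n;
    }

    // BGSS as the total sum of squares minus the pooled WGSS.
    static double bgss(double[][][] clusters) {
        return tss(clusters) - wgss(clusters);
    }

    // C/H index: (BGSS / (k - 1)) / (WGSS / (n - k)).
    // The result is infinite or NaN when WGSS is zero.
    static double index(double[][][] clusters) {
        int k = clusters.length;
        int n = countPoints(clusters);
        if (k < 2 || n <= k) {
            throw new IllegalArgumentException("need 2 <= k < n");
        }
        return (bgss(clusters) / (k - 1)) / (wgss(clusters) / (n - k));
    }
}
```

For two clusters {(0,0), (0,2)} and {(4,0), (4,2)}, this gives WGSS = 4, a total sum of squares of 20, BGSS = 16, and an index of (16 / 1) / (4 / 2) = 8. Summing the squared distances over all cross-cluster pairs without any division, as in the quoted computeBGSS, gives 72 for the same data, so the two quantities are not interchangeable. In the index itself, BGSS is divided by k - 1 and WGSS by n - k.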
