Re: [Senseclusters-users] How to compute BGSS?

Ted Pedersen Wed, 06 May 2009 08:44:11 -0700

Hi Behrang,

Interesting questions, although generally speaking we confine
discussion on this list to the SenseClusters package
(http://senseclusters.sourceforge.net).


That said, your question is related in that SenseClusters does support
cluster stopping, and in fact you can use the SenseClusters programs
independently of the entire package - there is a program called

clusterstopping.pl

for example, that provides four different cluster stopping methods.
You can read more about them here:

http://search.cpan.org/dist/Text-SenseClusters/Toolkit/clusterstop/clusterstopping.pl
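
On the BGSS question itself, here is a minimal sketch in Java, offered as
an illustration and not as a description of what clusterstopping.pl does.
It relies on the standard identity that the total sum of squares splits
into a within-group and a between-group part (TSS = WGSS + BGSS), so BGSS
can be obtained by applying the same pairwise formula used for a single
cluster to all documents pooled together and then subtracting the pooled
WGSS. The class and method names (ChStoppingSketch, sumOfSquares, and so
on) are hypothetical, documents are assumed to be plain double[] vectors,
and the distance is assumed to be squared Euclidean.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of SenseClusters.
final class ChStoppingSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Sum of squared deviations of the points from their centroid,
     * computed from pairwise distances: (1/n) * sum over i<j of d(i,j)^2.
     */
    static double sumOfSquares(List<double[]> points) {
        int n = points.size();
        if (n < 2) {
            return 0.0;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += squaredDistance(points.get(i), points.get(j));
            }
        }
        return sum / n;
    }

    /** Pooled within-group sum of squares: one term per cluster. */
    static double wgss(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            total += sumOfSquares(cluster);
        }
        return total;
    }

    /** Between-group sum of squares via TSS - WGSS. */
    static double bgss(List<List<double[]>> clusters) {
        List<double[]> all = new ArrayList<>();
        for (List<double[]> cluster : clusters) {
            all.addAll(cluster);
        }
        return sumOfSquares(all) - wgss(clusters);
    }

    /** C/H ratio: (BGSS / (k - 1)) / (WGSS / (n - k)). */
    static double chIndex(List<List<double[]>> clusters) {
        int k = clusters.size();
        int n = 0;
        for (List<double[]> cluster : clusters) {
            n += cluster.size();
        }
        if (k < 2 || n <= k) {
            throw new IllegalArgumentException("need 2 <= k < n");
        }
        return (bgss(clusters) / (k - 1)) / (wgss(clusters) / (n - k));
    }

    public static void main(String[] args) {
        List<double[]> a = List.of(new double[] {0, 0}, new double[] {0, 2});
        List<double[]> b = List.of(new double[] {4, 0}, new double[] {4, 2});
        List<List<double[]>> clusters = List.of(a, b);
        // prints WGSS=4.0 BGSS=16.0 CH=8.0
        System.out.println("WGSS=" + wgss(clusters)
                + " BGSS=" + bgss(clusters)
                + " CH=" + chIndex(clusters));
    }
}
```

Under this decomposition BGSS itself is not divided by anything; the C/H
ratio divides BGSS by (k - 1) and WGSS by (n - k), where k is the number
of clusters and n the number of documents. Note that summing squared
distances over all cross-cluster pairs generally does not give the same
value: for the four points in main() that sum is 72, not 16.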

I hope this helps!

Cordially,
Ted

On Tue, May 5, 2009 at 8:54 AM, Behrang Saeedzadeh <[email protected]> wrote:
> Hi all,
> I am working on a cluster analysis project and I want to implement a
> stopping rule for it. At the moment I want to implement the C/H stopping
> rule.
> Currently I am computing the WGSS like this (in Java):
>     public static double computeWGSS(DocumentGroup group) {
>         if (group.getDocuments().length == 1) {
>             return 0.0;
>         }
>         double wgss = 0.0;
>         Document[] docs = group.getDocuments();
>         for (int i = 0; i < docs.length; i++) {
>             for (int j = i + 1; j < docs.length; j++) {
>                 Document d1 = docs[i];
>                 Document d2 = docs[j];
>                 wgss += computeSumOfSquares(d1.getProfile().getVector(),
>                         d2.getProfile().getVector());
>             }
>         }
>         return wgss / group.size();
>     }
> This is implemented according to C/H's paper "A dendrite method for cluster
> analysis".
>
> However I have been unable to find the algorithm for computing BGSS. At the
> moment I have implemented it like this:
>     public static double computeBGSS(List<DocumentGroup> groupList) {
>         if (groupList.size() == 1) {
>             return 0.0;
>         }
>         double bgss = 0.0;
>         for (int i = 0; i < groupList.size(); i++) {
>             for (int j = i + 1; j < groupList.size(); j++) {
>                 DocumentGroup group1 = groupList.get(i);
>                 DocumentGroup group2 = groupList.get(j);
>                 bgss += computeBGSS(group1, group2);
>             }
>         }
>         return bgss;
>     }
>     public static double computeBGSS(DocumentGroup group1,
>             DocumentGroup group2) {
>         double bgss = 0.0;
>         for (Document d1 : group1.getDocuments()) {
>             for (Document d2 : group2.getDocuments()) {
>                 bgss += computeSumOfSquares(d1.getProfile().getVector(),
>                         d2.getProfile().getVector());
>             }
>         }
>         return bgss;
>     }
>
> Is this implementation correct? When calculating WGSS, we divide the pooled
> sum of squares by the number of documents in the cluster. Do we have to
> divide the pooled sum of squares in BGSS by something as well, such as the
> number of clusters?
> Thanks in advance,
> Behrang Saeedzadeh
> -------------------------------
> http://my.opera.com/behrangsa
> http://twitter.com/behrangsa
> http://www.linkedin.com/in/behrangsa
> http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
> http://www.last.fm/user/behrangsa
>
> _______________________________________________
> senseclusters-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/senseclusters-users
>
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

