Re: [Senseclusters-users] How to compute BGSS?

Ted Pedersen Fri, 08 May 2009 07:02:14 -0700

Hi Behrang,

On Thu, May 7, 2009 at 11:09 AM, Behrang Saeedzadeh <[email protected]> wrote:
> Hi Ted,
> Unfortunately I don't know Perl. Looks like I have to add it to the list of
> the languages I have to learn!


My hope is that you can just use SenseClusters without having to
program any of it. :) I think it will do much of what you might want
to achieve. Also, keep in mind we have the web interface available at
http://marimba.d.umn.edu which you can use to help you formulate
commands and get familiar with how things work (in addition to the
command line mode of operation).

> However in:
>
>
> http://www.mail-archive.com/[email protected]/msg00090.html
> 1/H1 for k = 1 evaluates to a non-zero number:
>> 1-way clustering: [H1=9.64e-04] [321 of 321] 1/H1 = 1037.34
> And if I am not wrong,
>    1/H1 equals inter-cluster-similarity/intra-cluster-similarity.
> However, when k = 1 we have only one cluster, so
> inter-cluster-similarity seems to be equal to 0.
> Or is k = 1 a special case for computing the inter-cluster-similarity?

H1 is shown above as 9.64e-04 (0.000964). That's not zero, but it's
effectively pretty close to zero, and I think that's done to avoid the
1/0 problem. I'm not sure how H1 is calculated when k=1. Cluto does
that calculation, so you'd want to check with its developer for
further details.

http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

I hope this helps!

Good luck,
Ted

> Thanks in advance,
> Behrang Saeedzadeh
> -------------------------------
> http://my.opera.com/behrangsa
> http://twitter.com/behrangsa
> http://www.linkedin.com/in/behrangsa
> http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
> http://www.last.fm/user/behrangsa
>
>
> On Thu, May 7, 2009 at 1:41 AM, Ted Pedersen <[email protected]> wrote:
>>
>> Hi Behrang,
>>
>> Interesting questions, although generally speaking we confine
>> discussion on this list to the SenseClusters package
>> (http://senseclusters.sourceforge.net).
>>
>> That said, your question is related in that SenseClusters does support
>> cluster stopping, and in fact you can use the SenseClusters programs
>> independently of the entire package - there is a program called
>>
>> clusterstopping.pl
>>
>> for example, that provides four different cluster stopping methods.
>> You can see more about those here.
>>
>>
>> http://search.cpan.org/dist/Text-SenseClusters/Toolkit/clusterstop/clusterstopping.pl
>>
>> I hope this helps!
>>
>> Cordially,
>> Ted
>>
>> On Tue, May 5, 2009 at 8:54 AM, Behrang Saeedzadeh <[email protected]>
>> wrote:
>> > Hi all,
>> > I am working on a cluster analysis project and I want to implement a
>> > stopping rule for it. At the moment I want to implement the
>> > Calinski and Harabasz (C/H) stopping rule.
>> > Currently I am computing the within-group sum of squares (WGSS) like
>> > this (in Java):
>> >     public static double computeWGSS(DocumentGroup group) {
>> >         Document[] docs = group.getDocuments();
>> >         // Zero or one document: there are no pairs to sum
>> >         // (this also avoids dividing by zero for an empty group).
>> >         if (docs.length < 2) {
>> >             return 0.0;
>> >         }
>> >         double wgss = 0.0;
>> >         // Accumulate computeSumOfSquares over every unordered pair.
>> >         for (int i = 0; i < docs.length; i++) {
>> >             for (int j = i + 1; j < docs.length; j++) {
>> >                 Document d1 = docs[i];
>> >                 Document d2 = docs[j];
>> >                 wgss += computeSumOfSquares(d1.getProfile().getVector(),
>> >                         d2.getProfile().getVector());
>> >             }
>> >         }
>> >         // Pairwise total divided by the number of documents in the group.
>> >         return wgss / docs.length;
>> >     }
>> > This is implemented according to Calinski and Harabasz's paper
>> > "A dendrite method for cluster analysis".
>> >
>> > However, I have been unable to find the algorithm for computing the
>> > between-group sum of squares (BGSS). At the moment I have implemented
>> > it like this:
>> >     public static double computeBGSS(List<DocumentGroup> groupList) {
>> >         if (groupList.size() == 1) {
>> >             return 0.0;
>> >         }
>> >         double bgss = 0.0;
>> >         for (int i = 0; i < groupList.size(); i++) {
>> >             for (int j = i + 1; j < groupList.size(); j++) {
>> >                 DocumentGroup group1 = groupList.get(i);
>> >                 DocumentGroup group2 = groupList.get(j);
>> >                 bgss += computeBGSS(group1, group2);
>> >             }
>> >         }
>> >         return bgss;
>> >     }
>> >     public static double computeBGSS(DocumentGroup group1, DocumentGroup group2) {
>> >         double bgss = 0.0;
>> >         for (Document d1 : group1.getDocuments()) {
>> >             for (Document d2 : group2.getDocuments()) {
>> >                 bgss += computeSumOfSquares(d1.getProfile().getVector(),
>> >                         d2.getProfile().getVector());
>> >             }
>> >         }
>> >         return bgss;
>> >     }
>> >
>> > Is this implementation correct? When calculating WGSS, we divide the
>> > pooled sum of squares by the number of documents in the cluster. Do we
>> > have to divide the pooled sum of squares in BGSS by something, like the
>> > number of clusters?
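
One way to approach the BGSS question, offered as a hedged sketch and
not as anything taken from SenseClusters or Cluto: assuming
computeSumOfSquares returns the squared Euclidean distance between two
vectors, the pairwise sum divided by the group size equals the sum of
squared distances to the group centroid. Applying the same identity to
all documents at once gives the total sum of squares (TSS), and
BGSS = TSS - WGSS, with WGSS summed over all clusters. Summing over
cross-cluster document pairs, as in the quoted computeBGSS, gives a
different quantity. BGSS itself is not divided by the number of
clusters; the C/H index is (BGSS / (k - 1)) / (WGSS / (n - k)) for k
clusters and n documents. The class and method names below are
hypothetical, and plain double[] vectors stand in for the Document
profiles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the Calinski-Harabasz quantities on double[] vectors. */
final class ChIndexSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Sum of squared distances to the centroid, computed as (pairwise sum) / n. */
    static double sumOfSquares(List<double[]> points) {
        int n = points.size();
        if (n < 2) {
            return 0.0;
        }
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                total += squaredDistance(points.get(i), points.get(j));
            }
        }
        return total / n;
    }

    /** Pooled within-group sum of squares: one term per cluster. */
    static double wgss(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            total += sumOfSquares(cluster);
        }
        return total;
    }

    /** Between-group sum of squares as TSS - WGSS. */
    static double bgss(List<List<double[]>> clusters) {
        List<double[]> all = new ArrayList<>();
        for (List<double[]> cluster : clusters) {
            all.addAll(cluster);
        }
        return sumOfSquares(all) - wgss(clusters);
    }

    /** C/H index; needs at least 2 clusters and more documents than clusters. */
    static double chIndex(List<List<double[]>> clusters) {
        int k = clusters.size();
        int n = 0;
        for (List<double[]> cluster : clusters) {
            n += cluster.size();
        }
        if (k < 2 || n <= k) {
            throw new IllegalArgumentException("need k >= 2 and n > k");
        }
        return (bgss(clusters) / (k - 1)) / (wgss(clusters) / (n - k));
    }

    public static void main(String[] args) {
        // Toy data: two tight clusters, 4 units apart along the first axis.
        List<List<double[]>> clusters = Arrays.asList(
                Arrays.asList(new double[] {0, 0}, new double[] {0, 2}),
                Arrays.asList(new double[] {4, 0}, new double[] {4, 2}));
        System.out.println(wgss(clusters));    // prints 4.0
        System.out.println(bgss(clusters));    // prints 16.0
        System.out.println(chIndex(clusters)); // prints 8.0
    }
}
```

For this toy data the cross-cluster pairwise sum would be 72, while
BGSS is 16, so the two are not interchangeable. The index divides by
k - 1, so it is undefined for k = 1, which is why the sketch rejects
that case instead of returning a value.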
>> > Thanks in advance,
>> > Behrang Saeedzadeh
>> > -------------------------------
>> > http://my.opera.com/behrangsa
>> > http://twitter.com/behrangsa
>> > http://www.linkedin.com/in/behrangsa
>> > http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
>> > http://www.last.fm/user/behrangsa
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
>> > production scanning environment may not be a perfect world - but thanks
>> > to
>> > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK
>> > i700
>> > Series Scanner you'll get full speed at 300 dpi even with all image
>> > processing features enabled. http://p.sf.net/sfu/kodak-com
>> > _______________________________________________
>> > senseclusters-users mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
>> >
>> >
>>
>>
>>
>> --
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
>>
>>
>
>



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
