Hi Ted,
Unfortunately I don't know Perl. Looks like I'll have to add it to the list of
languages I need to learn!
However, in
http://www.mail-archive.com/[email protected]/msg00090.html
1/H1 evaluates to a non-zero number for k = 1:
> 1-way clustering: [H1=9.64e-04] [321 of 321] 1/H1 = 1037.34
And if I am not mistaken,
1/H1 equals inter-cluster-similarity / intra-cluster-similarity.
However, when k = 1 there is only one cluster, so
inter-cluster-similarity should be equal to 0.
Or is k = 1 a special case when computing the inter-cluster-similarity?
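For reference, here is how I understand the k = 1 case for the C/H rule from
my original message (quoted below). This is only a minimal sketch under my own
assumptions: the class and method names are made up for illustration,
documents are plain double[] vectors, distances are squared Euclidean, and
none of it is taken from SenseClusters or its H1 criterion. In it, BGSS is the
size-weighted squared distance of each cluster centroid to the overall
centroid, so it is exactly 0 when there is a single cluster, and the C/H index
itself is undefined there because of the division by k - 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
public class CalinskiHarabaszSketch {

    /** Component-wise mean of a non-empty list of equal-length vectors. */
    public static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int d = 0; d < c.length; d++) {
                c[d] += p[d] / points.size();
            }
        }
        return c;
    }

    /** Squared Euclidean distance between two equal-length vectors. */
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** WGSS: squared distance of every point to its own cluster centroid. */
    public static double wgss(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            double[] c = centroid(cluster);
            for (double[] p : cluster) {
                total += squaredDistance(p, c);
            }
        }
        return total;
    }

    /** BGSS: size-weighted squared distance of each centroid to the overall centroid. */
    public static double bgss(List<List<double[]>> clusters) {
        List<double[]> all = new ArrayList<>();
        for (List<double[]> cluster : clusters) {
            all.addAll(cluster);
        }
        double[] overall = centroid(all);
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            total += cluster.size() * squaredDistance(centroid(cluster), overall);
        }
        return total;
    }

    /** C/H index (BGSS / (k - 1)) / (WGSS / (n - k)); NaN when k < 2 or k >= n. */
    public static double chIndex(List<List<double[]>> clusters) {
        int k = clusters.size();
        int n = 0;
        for (List<double[]> cluster : clusters) {
            n += cluster.size();
        }
        if (k < 2 || k >= n) {
            return Double.NaN; // undefined: division by k - 1 or by n - k
        }
        return (bgss(clusters) / (k - 1)) / (wgss(clusters) / (n - k));
    }

    public static void main(String[] args) {
        List<double[]> a = Arrays.asList(new double[] {0, 0}, new double[] {0, 2});
        List<double[]> b = Arrays.asList(new double[] {4, 0}, new double[] {4, 2});
        List<List<double[]>> two = Arrays.asList(a, b);

        List<double[]> merged = new ArrayList<>(a);
        merged.addAll(b);
        List<List<double[]>> one = Arrays.asList(merged);

        System.out.println("k=2: WGSS=" + wgss(two) + " BGSS=" + bgss(two)
                + " CH=" + chIndex(two));
        System.out.println("k=1: WGSS=" + wgss(one) + " BGSS=" + bgss(one)
                + " CH=" + chIndex(one));
    }
}
```

With the two toy clusters {(0,0), (0,2)} and {(4,0), (4,2)} this gives
WGSS = 4, BGSS = 16 and a C/H index of 8. Merging them into a single cluster
gives WGSS = 20, BGSS = 0 and an undefined index, which is why a non-zero
value at k = 1 surprised me.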
Thanks in advance,
Behrang Saeedzadeh
-------------------------------
http://my.opera.com/behrangsa
http://twitter.com/behrangsa
http://www.linkedin.com/in/behrangsa
http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
http://www.last.fm/user/behrangsa
On Thu, May 7, 2009 at 1:41 AM, Ted Pedersen <[email protected]> wrote:
> Hi Behrang,
>
> Interesting questions, although generally speaking we confine
> discussion on this list to the SenseClusters package
> (http://senseclusters.sourceforge.net).
>
> That said, your question is related in that SenseClusters does support
> cluster stopping, and in fact you can use the SenseClusters programs
> independently of the entire package - there is a program called
>
> clusterstopping.pl
>
> for example, that provides four different cluster stopping methods.
> You can see more about those here.
>
>
> http://search.cpan.org/dist/Text-SenseClusters/Toolkit/clusterstop/clusterstopping.pl
>
> I hope this helps!
>
> Cordially,
> Ted
>
> On Tue, May 5, 2009 at 8:54 AM, Behrang Saeedzadeh <[email protected]> wrote:
> > Hi all,
> > I am working on a cluster analysis project and I want to implement a
> > stopping rule for it. At the moment I want to implement the C/H stopping
> > rule.
> > Currently I am computing the WGSS like this (in Java):
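> > public static double computeWGSS(DocumentGroup group) {
> >     Document[] docs = group.getDocuments();
> >     // A cluster with at most one document has no within-group dispersion.
> >     if (docs.length <= 1) {
> >         return 0.0;
> >     }
> >     double wgss = 0.0;
> >     // Sum the squared distances over every unordered pair of documents.
> >     for (int i = 0; i < docs.length; i++) {
> >         for (int j = i + 1; j < docs.length; j++) {
> >             wgss += computeSumOfSquares(docs[i].getProfile().getVector(),
> >                                         docs[j].getProfile().getVector());
> >         }
> >     }
> >     // If computeSumOfSquares is the squared Euclidean distance, dividing
> >     // the pairwise sum by the cluster size gives the sum of squared
> >     // distances to the cluster centroid.
> >     return wgss / docs.length;
> > }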
> > This is implemented according to C/H's paper "A dendrite method for
> > cluster analysis".
> >
> > However I have been unable to find the algorithm for computing BGSS. At
> > the moment I have implemented it like this:
> > public static double computeBGSS(List<DocumentGroup> groupList) {
> >     if (groupList.size() == 1) {
> >         return 0.0;
> >     }
> >     double bgss = 0.0;
> >     for (int i = 0; i < groupList.size(); i++) {
> >         for (int j = i + 1; j < groupList.size(); j++) {
> >             DocumentGroup group1 = groupList.get(i);
> >             DocumentGroup group2 = groupList.get(j);
> >             bgss += computeBGSS(group1, group2);
> >         }
> >     }
> >     return bgss;
> > }
> >
> > public static double computeBGSS(DocumentGroup group1, DocumentGroup group2) {
> >     double bgss = 0.0;
> >     for (Document d1 : group1.getDocuments()) {
> >         for (Document d2 : group2.getDocuments()) {
> >             bgss += computeSumOfSquares(d1.getProfile().getVector(),
> >                                         d2.getProfile().getVector());
> >         }
> >     }
> >     return bgss;
> > }
> >
> > Is this implementation correct? When calculating WGSS, we divide the
> > pooled sum of squares by the number of documents in the cluster. Do we
> > have to divide the pooled sum of squares in BGSS by something as well,
> > such as the number of clusters?
> > Thanks in advance,
> > Behrang Saeedzadeh
> >
> >
> ------------------------------------------------------------------------------
> > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
> > production scanning environment may not be a perfect world - but thanks
> to
> > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK
> > i700
> > Series Scanner you'll get full speed at 300 dpi even with all image
> > processing features enabled. http://p.sf.net/sfu/kodak-com
> > _______________________________________________
> > senseclusters-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
> >
> >
>
>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse