Hi Behrang,

On Thu, May 7, 2009 at 11:09 AM, Behrang Saeedzadeh <[email protected]> wrote:
> Hi Ted,
> Unfortunately I don't know Perl. Looks like I have to add it to the list of
> the languages I have to learn!

My hope is that you can just use SenseClusters without having to
program any of it. :) I think it will do much of what you might want
to achieve. Also, keep in mind we have the web interface available at
http://marimba.d.umn.edu which you can use to help you formulate
commands and get familiar with how things work (in addition to the
command line mode of operation).

> However in:
>
> http://www.mail-archive.com/[email protected]/msg00090.html
>
> 1/H1 for k = 1 is evaluated to a non-zero number:
>> 1-way clustering: [H1=9.64e-04] [321 of 321] 1/H1 = 1037.34
> And if I am not wrong:
> 1/H1 equals to inter-cluster-similarity/intra-cluster-similarity
> However when k = 1, we only have one cluster and then
> inter-cluster-similarity seems to be equal to 0.
> Or is k = 1 a special case for computing the inter-cluster-similarity?

H1 is shown above to be 0.000964 - while that's not zero it's
effectively pretty close to zero, and I think that's done to avoid the
1/0 problem. I'm not sure how the calculation of H1 is carried out
when k=1 - that's being done by Cluto, so you'd want to check with the
developer for further details.

http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview

I hope this helps!

Good luck,
Ted

> Thanks in advance,
> Behrang Saeedzadeh
> -------------------------------
> http://my.opera.com/behrangsa
> http://twitter.com/behrangsa
> http://www.linkedin.com/in/behrangsa
> http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
> http://www.last.fm/user/behrangsa
>
>
> On Thu, May 7, 2009 at 1:41 AM, Ted Pedersen <[email protected]> wrote:
>>
>> Hi Behrang,
>>
>> Interesting questions, although generally speaking we confine
>> discussion on this list to the SenseClusters package
>> (http://senseclusters.sourceforge.net).
>>
>> That said, your question is related in that SenseClusters does support
>> cluster stopping, and in fact you can use the SenseClusters programs
>> independently of the entire package - there is a program called
>>
>> clusterstopping.pl
>>
>> for example, that provides four different cluster stopping methods.
>> You can see more about those here.
>>
>> http://search.cpan.org/dist/Text-SenseClusters/Toolkit/clusterstop/clusterstopping.pl
>>
>> I hope this helps!
>>
>> Cordially,
>> Ted
>>
>> On Tue, May 5, 2009 at 8:54 AM, Behrang Saeedzadeh <[email protected]>
>> wrote:
>> > Hi all,
>> > I am working on a cluster analysis project and I want to implement a
>> > stopping rule for it. At the moment I want to implement the C/H
>> > stopping rule.
>> > Currently I am computing the WGSS like this (in Java):
>> >
>> > public static double computeWGSS(DocumentGroup group) {
>> >     if (group.getDocuments().size() == 1) {
>> >         return 0.0;
>> >     }
>> >     double wgss = 0.0;
>> >     Document[] docs = group.getDocuments();
>> >     for (int i = 0; i < docs.length; i++) {
>> >         for (int j = i + 1; j < docs.length; j++) {
>> >             Document d1 = docs[i];
>> >             Document d2 = docs[j];
>> >             wgss += computeSumOfSquares(d1.getProfile().getVector(),
>> >                     d2.getProfile().getVector());
>> >         }
>> >     }
>> >     return wgss / group.size();
>> > }
>> >
>> > This is implemented according to C/H's paper "A dendrite method for
>> > cluster analysis".
>> >
>> > However I have been unable to find the algorithm for computing BGSS.
>> > At the moment I have implemented it like this:
>> >
>> > public static double computeBGSS(List<DocumentGroup> groupList) {
>> >     if (groupList.size() == 1) {
>> >         return 0.0;
>> >     }
>> >     double bgss = 0.0;
>> >     for (int i = 0; i < groupList.size(); i++) {
>> >         for (int j = i + 1; j < groupList.size(); j++) {
>> >             DocumentGroup group1 = groupList.get(i);
>> >             DocumentGroup group2 = groupList.get(j);
>> >             bgss += computeBGSS(group1, group2);
>> >         }
>> >     }
>> >     return bgss;
>> > }
>> >
>> > public static double computeBGSS(DocumentGroup group1, DocumentGroup group2) {
>> >     double bgss = 0.0;
>> >     for (Document d1 : group1.getDocuments()) {
>> >         for (Document d2 : group2.getDocuments()) {
>> >             bgss += computeSumOfSquares(d1.getProfile().getVector(),
>> >                     d2.getProfile().getVector());
>> >         }
>> >     }
>> >     return bgss;
>> > }
>> >
>> > Is this implementation correct? When calculating WGSS, we divide the
>> > pooled sum of squares by the number of documents in the cluster, do we
>> > have to divide the pooled sum of squares in BGSS by something, like
>> > the number of clusters?
>> > Thanks in advance,
>> > Behrang Saeedzadeh
>> > -------------------------------
>> > http://my.opera.com/behrangsa
>> > http://twitter.com/behrangsa
>> > http://www.linkedin.com/in/behrangsa
>> > http://www.facebook.com/people/Behrang-Saeedzadeh/619892726
>> > http://www.last.fm/user/behrangsa
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
>> > production scanning environment may not be a perfect world - but thanks to
>> > Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
>> > Series Scanner you'll get full speed at 300 dpi even with all image
>> > processing features enabled.
>> > http://p.sf.net/sfu/kodak-com
>> > _______________________________________________
>> > senseclusters-users mailing list
>> > [email protected]
>> > https://lists.sourceforge.net/lists/listinfo/senseclusters-users
>> >
>>
>> --
>> Ted Pedersen
>> http://www.d.umn.edu/~tpederse
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
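
On the BGSS question in the quoted message, here is a minimal sketch, assuming the usual Calinski-Harabasz definitions. It relies on the identity that the sum of squared distances of n points to their centroid equals the sum of their pairwise squared distances divided by n. Under that assumption, WGSS is that quantity summed over the clusters, BGSS is the total sum of squares over all documents minus WGSS (rather than the raw sum over cross-cluster pairs), and the index is (BGSS / (k - 1)) / (WGSS / (n - k)). The class and method names are hypothetical, and plain double[] vectors with an int[] of cluster labels stand in for Document and DocumentGroup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of SenseClusters or Cluto. */
final class CalinskiHarabasz {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Sum of squared distances of the points to their centroid, computed
     * as (sum of pairwise squared distances) / (number of points).
     */
    static double sumOfSquares(List<double[]> points) {
        int n = points.size();
        if (n < 2) {
            return 0.0;
        }
        double pairwise = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairwise += squaredDistance(points.get(i), points.get(j));
            }
        }
        return pairwise / n;
    }

    /** Within-group sum of squares: per-cluster sums of squares, added up. */
    static double wgss(double[][] points, int[] labels, int k) {
        double total = 0.0;
        for (int c = 0; c < k; c++) {
            List<double[]> members = new ArrayList<>();
            for (int i = 0; i < points.length; i++) {
                if (labels[i] == c) {
                    members.add(points[i]);
                }
            }
            total += sumOfSquares(members);
        }
        return total;
    }

    /** Between-group sum of squares: total sum of squares minus WGSS. */
    static double bgss(double[][] points, int[] labels, int k) {
        return sumOfSquares(Arrays.asList(points)) - wgss(points, labels, k);
    }

    /**
     * C/H index for labels in 0..k-1. It is undefined for k = 1 (division
     * by k - 1), and evaluates to Infinity or NaN when WGSS is 0.
     */
    static double index(double[][] points, int[] labels, int k) {
        int n = points.length;
        if (k < 2 || k >= n) {
            throw new IllegalArgumentException("need 2 <= k < n");
        }
        return (bgss(points, labels, k) / (k - 1))
                / (wgss(points, labels, k) / (n - k));
    }
}
```

For the four points (0,0), (0,2), (10,0), (10,2) split into two clusters of two, this gives WGSS = 4, BGSS = 100 and an index of 50. Since the index divides by k - 1, a stopping rule built on it would have to start comparing at k = 2.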
