Dear Sense Clusterers,

In a number of applications, I find myself in need of a "thesaurus"-like list of sets of related words (hopefully, words with similar meanings, but some noise is OK).
It seems like SC should be a good tool for distribution-based thesaurus building, right? However, the current version of SC offers so many options that I am not sure where to start.

In short, my input would be lists of bigrams, each made of a target word and a context word (i.e., a word that I want to cluster, and a word that is part of the interesting contexts in which the target word occurs), together with their frequency of occurrence. (I could of course also provide a raw list of all the pairs, rather than their counts.) The output, ideally, should be a full or partial (non-hierarchical?) clustering of the target words.

Which sets of scripts should I use? Where in the documentation should I start looking?

Thanks in advance.

Regards,
Marco

--
Marco Baroni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni

_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
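To make the input described above concrete, here is a minimal Python sketch of how target/context pair counts could be turned into one sparse context vector per target word and compared. Everything in it is an assumption made for illustration: the whitespace-separated "target context count" layout, the made-up sample words and counts, and the helper names `read_pairs` and `cosine`. It does not use SenseClusters and is not meant to show its input format; cosine similarity is just one common way to compare such vectors before clustering.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical input: one "target context count" triple per line.
# The words and counts are invented purely for illustration.
SAMPLE = """\
dog bark 12
dog leash 7
cat purr 9
cat leash 2
puppy bark 5
puppy leash 3
"""


def read_pairs(text):
    """Build one sparse context-count vector (a Counter) per target word."""
    vectors = defaultdict(Counter)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        target, context, count = line.split()
        vectors[target][context] += int(count)
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = read_pairs(SAMPLE)

# Targets that share contexts in similar proportions score higher.
print(cosine(vectors["dog"], vectors["puppy"]))
print(cosine(vectors["dog"], vectors["cat"]))
```

A raw list of pairs (one pair per occurrence, no counts) would work with the same idea: each line would add 1 to the relevant cell instead of a precomputed count.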
