[Senseclusters-users] sc for thesaurus building -- where to start?

Marco Baroni Wed, 11 Oct 2006 06:44:35 -0700

Dear Sense Clusterers,

In a number of applications, I find myself in need of a "thesaurus"-like 
list of sets of related words (hopefully, words with similar meanings, but 
some noise is ok).


It seems like SC should be a good tool for distribution-based thesaurus 
building, right?

However, the current version of SC offers so many options that I am not 
sure where to start...

In short, my input would be lists of bigrams made of a target and a context 
word (i.e., a word that I want to cluster, and a word that is part of the 
interesting contexts in which the target word occurs), together with their 
frequency of occurrence (I could of course also provide a raw list of all 
the pairs, rather than their counts).
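
To make the input format concrete, here is a minimal Python sketch. It is 
not SenseClusters code and does not use any SC script; the whitespace-separated 
"target context count" layout, the function names, and the toy data are all 
assumptions made for illustration. It only shows how such pair counts map to 
per-target context vectors that a distributional similarity measure (cosine, 
as one possible choice) can compare.

```python
from collections import defaultdict
from math import sqrt


def read_pair_counts(lines):
    """Parse 'target context count' lines into {target: {context: count}}.

    The three-field layout is an assumed format, not one defined by SC.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        target, context, count = fields
        vectors[target][context] += int(count)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data, invented for illustration only.
sample = [
    "dog bark 12",
    "dog leash 7",
    "cat purr 9",
    "cat leash 2",
    "puppy bark 5",
    "puppy leash 3",
]

vectors = read_pair_counts(sample)
sim_dog_puppy = cosine(vectors["dog"], vectors["puppy"])
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
print(sim_dog_puppy > sim_dog_cat)
```

Targets whose context vectors point in similar directions would be the ones 
I hope to see grouped in the same cluster.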

The output, ideally, should be a full or partial (non-hierarchical?) 
clustering of the target words.

Which set of scripts should I use? Where in the documentation should I 
start looking?

Thanks in advance.

Regards,

Marco

-- 
Marco Baroni
SSLMIT, University of Bologna
http://sslmit.unibo.it/~baroni



_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
