We are happy to announce the release of a small but useful utility program
known as nameconflate! You can download it here:

http://www.d.umn.edu/~tpederse/tools.html

This program takes as input text from the English GigaWord Corpus and
allows you to conflate any number of words or phrases into a single word
(aka a pseudo-word). For example, you could conflate line, China, and
"Tom Hanks" into a single word in the GigaWord corpus (or some portion
of it).

The output is in the lexical sample format from Senseval-2, and will
replace each occurrence of the individual words with their conflated
(ambiguous) form. The correct (unconflated/unambiguous) form is retained
as well, so you can perform word sense disambiguation on the conflated
text, and then easily score your results.
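To make the idea concrete, here is a minimal Python sketch of pseudo-word conflation and scoring. It is not nameconflate's code. The function names (`conflate`, `score`), the `Instance` record, the pseudo-word spelling `lineChinaTomHanks` and the simplified `<head>` markup are all hypothetical. The real Senseval-2 lexical sample format also carries more structure than is shown here.

```python
import re
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str  # hypothetical id scheme: "<pseudo-word>.<n>"
    answer: str       # the original (unconflated) word or phrase, kept for scoring
    context: str      # text with every target replaced by the pseudo-word


def conflate(text, targets, pseudo_word):
    """Replace every occurrence of any target with pseudo_word.

    Each occurrence becomes one instance: that occurrence is wrapped in
    <head>...</head>, and its original form is retained as the answer.
    Matching is case-sensitive and respects word boundaries.
    """
    if not targets:
        return []
    # Try longer targets first so a phrase like "Tom Hanks" wins over
    # any shorter target that it might contain.
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b")
    matches = list(pattern.finditer(text))

    instances = []
    for n, target in enumerate(matches, start=1):
        pieces, last = [], 0
        for m in matches:
            pieces.append(text[last:m.start()])
            # Only the occurrence this instance is about gets the <head> tag;
            # every other occurrence is still replaced by the pseudo-word.
            pieces.append(f"<head>{pseudo_word}</head>" if m is target else pseudo_word)
            last = m.end()
        pieces.append(text[last:])
        instances.append(Instance(f"{pseudo_word}.{n}", target.group(0), "".join(pieces)))
    return instances


def score(instances, predictions):
    """Accuracy: fraction of instances whose predicted form equals the retained answer."""
    if not instances:
        return 0.0
    correct = sum(predictions.get(inst.instance_id) == inst.answer for inst in instances)
    return correct / len(instances)


if __name__ == "__main__":
    text = "Tom Hanks visited China. He stood in line in China."
    for inst in conflate(text, ["line", "China", "Tom Hanks"], "lineChinaTomHanks"):
        print(inst.instance_id, inst.answer, "|", inst.context)
```

Because the original form travels with each instance, scoring a disambiguation run reduces to comparing predictions against the retained answers. Working with real GigaWord text would also require reading its document markup and deciding how much context to keep per instance. This sketch leaves both of those steps out.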

We have used this program extensively to create data for
SenseClusters (http://senseclusters.sourceforge.net) and for our
Duluth WSD systems (http://www.d.umn.edu/~tpederse/senseval3.html).

We have placed some sample data on the tools page above so you can see what
the output looks like. If you don't have the GigaWord corpus, we would be happy to
generate some samples for you based on particular words you might like to
see conflated.

Cordially,
Ted and Anagha

--
Ted Pedersen
http://www.d.umn.edu/~tpederse


_______________________________________________
senseclusters-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users