Users of SenseClusters may wonder where they can find data with which
to experiment.

Over the course of the last few years we have created quite a bit of
data, and I've updated our name discrimination data page to include
links to most of that data. In addition, I've provided very brief
summaries of the content of each collection.

http://www.d.umn.edu/~tpederse/namedata.html

Most of this data was created by conflating names to create a new
ambiguity, such as turning all occurrences of "Tony Blair" and
"Bill Clinton" into the now ambiguous name "TonyBlairBillClinton".
The objective with this data is to take the occurrences of this newly
ambiguous name and see whether SenseClusters can discover the
underlying entities/identities.
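
For anyone who wants to build similar data from their own text, the
conflation step can be sketched roughly as below. This is only an
illustration in Python, not the scripts we used to build the
collections above; the function name conflate_names and the sample
sentence are made up for the example, and real data would also need to
handle name variants such as "Blair" or "President Clinton".

```python
import re


def conflate_names(text, names, pseudo_name=None):
    """Replace every occurrence of each name with a single pseudo-name.

    Returns the conflated text and the list of original names in the
    order they were replaced, which serves as the answer key.
    (Hypothetical helper, not part of SenseClusters.)
    """
    if pseudo_name is None:
        # "Tony Blair" + "Bill Clinton" -> "TonyBlairBillClinton"
        pseudo_name = "".join(name.replace(" ", "") for name in names)

    # Try longer names first so a short name never pre-empts a longer one.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))

    gold = []

    def _swap(match):
        gold.append(match.group(0))  # remember who was really mentioned
        return pseudo_name

    return pattern.sub(_swap, text), gold


# Made-up sample text, for illustration only.
text = "Tony Blair met Bill Clinton. Later Tony Blair spoke."
conflated, gold = conflate_names(text, ["Tony Blair", "Bill Clinton"])
print(conflated)
print(gold)
```

The list of original names is kept because it is what makes the
conflated data useful: it records which entity each occurrence of the
new name really referred to.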

This page also includes the "Kulkarni name corpus", a collection in
which ambiguous names found on the web have been manually
disambiguated.

In addition, please remember that SenseClusters can also be applied to
text where word senses have been manually disambiguated. In this case
the task of SenseClusters is to cluster the occurrences of a word
based on the sense in which it was used. Any of the Senseval-2
formatted data found at the link below can easily be used with
SenseClusters.

http://www.d.umn.edu/~tpederse/data.html

Please note that you can also use SenseClusters on email data, where
there is no single target word or name you are interested in, and you
instead seek to categorize short messages by topic. Some email data is
included on the name data page, and we also have a subset of the Enron
email corpus, categorized by topic, which you could use as input to
SenseClusters. That data is available here:

http://www.d.umn.edu/~tpederse/enron.html

Of course you can use SenseClusters with a wide range of data over a
much broader range of tasks than described here, but the data we
provide here has the advantage of having correct answers associated
with it. This means you can evaluate your results and even compare
them to our published results should you wish to do that.
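
As a rough illustration of what having correct answers makes possible,
here is a small Python sketch that scores a clustering against an
answer key using cluster purity. Please note that purity is just a
simple stand-in chosen for this example, and is not the evaluation
that SenseClusters itself reports; the function name and the toy
cluster assignments are invented for the illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Fraction of instances that carry the majority label of their cluster.

    (Hypothetical helper for illustration, not part of SenseClusters.)
    """
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("need one gold label per clustered instance")
    if not cluster_ids:
        return 0.0

    # Count how often each gold label appears inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest gold-label group.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(gold_labels)


# Toy example: five occurrences of a conflated name placed in two clusters.
clusters = [0, 0, 1, 1, 1]
answers = ["Tony Blair", "Tony Blair",
           "Bill Clinton", "Bill Clinton", "Tony Blair"]
score = purity(clusters, answers)
print(f"purity = {score:.2f}")
```

In this toy case one "Tony Blair" instance lands in the cluster
dominated by "Bill Clinton", so four of the five instances agree with
their cluster's majority label.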

Finally, if you have any data that you have used with SenseClusters
and you'd like to make that available, please do let us know. We'd be
happy to include a link to your data or even host it on our server.

Please let us know if you have any questions or comments about this data!

Cordially,
Ted

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
