Re: [Senseclusters-users] [Fwd: Re: WSD]

Ted Pedersen Fri, 05 Dec 2008 18:44:16 -0800

Hi Savas,

The data set you used isn't really intended to be used directly as
input to SenseClusters. The English lexical sample data is a large
collection that covers many different target words, each with
instances tagged with their correct senses. In general, SenseClusters
expects to process the instances for each of those words separately,
which is most likely why your scores came out so low. If you'd like to
break that data down into a form that SenseClusters can better deal
with, you can use the preprocess.pl program to take care of that for
you. More details on that here:


http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/Toolkit/preprocess/sval2/preprocess.pl
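
To make "breaking the data down per word" concrete, here is a minimal Python sketch. It is not part of SenseClusters and is not how preprocess.pl is implemented. It assumes the Senseval-2 layout in which each target word sits in its own <lexelt item="..."> block inside a single <corpus> element. The function names and the one-file-per-word output naming are hypothetical, chosen only for illustration.

```python
import re
from pathlib import Path

# Assumption: every target word is wrapped in <lexelt item="WORD"> ... </lexelt>.
# A regex is used instead of an XML parser because lexical sample files are
# not always strictly well-formed XML.
LEXELT_RE = re.compile(r'<lexelt\s+item="([^"]+)"[^>]*>.*?</lexelt>', re.DOTALL)
CORPUS_OPEN_RE = re.compile(r'<corpus[^>]*>')


def split_by_lexelt(corpus_text):
    """Map each target word to a small <corpus> document holding only that word."""
    opening = CORPUS_OPEN_RE.search(corpus_text)
    # Fall back to a bare <corpus> tag if the input has no opening tag.
    corpus_open = opening.group(0) if opening else "<corpus>"
    per_word = {}
    for match in LEXELT_RE.finditer(corpus_text):
        item = match.group(1)
        per_word[item] = f"{corpus_open}\n{match.group(0)}\n</corpus>\n"
    return per_word


def write_per_word_files(in_path, out_dir):
    """Write one file per target word (hypothetical naming: <item>.xml)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    text = Path(in_path).read_text(encoding="utf-8", errors="replace")
    written = []
    for item, xml in split_by_lexelt(text).items():
        target = out_dir / f"{item}.xml"
        target.write_text(xml, encoding="utf-8")
        written.append(target)
    return written
```

Each resulting file then holds the instances of a single word, which is the kind of input discriminate.pl is meant to cluster. For real experiments, preprocess.pl or the /samples scripts remain the supported route.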

An even simpler alternative is to use the scripts in the /samples
directory, which break that data apart into individual per-word
samples and then run discriminate.pl on each of those words. You can
find those described in more detail here:

http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/samples/README.samples.pod

If you are just getting started with SenseClusters and would like to
experiment with data for a single word (that is ready to run), you
might want to try the begin.v-test.xml data, found here:

http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/samples/Data/begin.v-test.xml

Or, you might want to try out some of the name discrimination data found here:

http://www.d.umn.edu/~tpederse/namedata.html

These have all been separated such that each file pertains to a different name.

I hope this helps! If you have further questions it might be best to
send them to the senseclusters-users list - that way all developers
see them, and you are likely to get the fastest possible response!

Cordially,
Ted

>> Savas Yildirim wrote:
>>>
>>> Hi,
>>> I am using the SenseClusters web interface with the data from Ted
>>> Pedersen's web page. In the end I got a user.report file showing the
>>> following result:
>>>
>>> Precision = 3.51(302/8611)
>>> Recall = 3.51(302/8611+0)
>>> F-Measure = 3.51
>>>
>>> It also includes some tables, matches, etc.
>>>
>>> These precision, recall, and F-measure values seem very bad. Am I
>>> using the program the wrong way?
>>>
>>>
>>> This is the command I used:
>>>  discriminate.pl "eng-lex-sample.training.xml" --format f16.06 --token
>>> "token.regex" --feature bi --remove 5 --context o2 --clusters 10
>>> --space vector --clmethod rb --crfun i2 --sim cos --label_remove 5
>>> --label_stat ll --label_rank 10  --eval --prefix "user"
>>>
>>> How do I evaluate the result files (e.g. user.report)?
>>>
>>
>
>
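
For reference, the three figures in the quoted report can be reproduced from the counts it prints. This is a small Python sketch, not SenseClusters code. It assumes that precision is correct/clustered, that the "+0" in the recall line is the number of instances left unclustered, and that F-measure is the harmonic mean of the two. The function name is hypothetical.

```python
def cluster_scores(correct, clustered, unclustered=0):
    """Return (precision, recall, f_measure) as percentages.

    Assumption: 'unclustered' is the number after the '+' in the report's
    recall line, i.e. instances that were not assigned to any cluster.
    """
    precision = 100.0 * correct / clustered
    recall = 100.0 * correct / (clustered + unclustered)
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts taken from the quoted user.report: 302 correct out of 8611 instances.
p, r, f = cluster_scores(302, 8611, 0)
print(f"Precision = {p:.2f}  Recall = {r:.2f}  F-Measure = {f:.2f}")
# prints: Precision = 3.51  Recall = 3.51  F-Measure = 3.51
```

With nothing left unclustered, precision and recall are the same number, so the F-measure equals them as well.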

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
