Hi Savas,

The data set you used isn't really intended to be used directly as input to SenseClusters. The English lexical sample data is a large collection that consists of many different words and their correct senses. In general, SenseClusters expects to process the data for each of those words individually.

If you'd like to break that data down into a form that SenseClusters can deal with better, you can use the preprocess.pl program to take care of that for you. More details on that here:
http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/Toolkit/preprocess/sval2/preprocess.pl

An even simpler alternative is to use the scripts in the /samples directory, which will break that data apart into the individual samples per word and then run discriminate.pl on each of those words. You can find those described in more detail here:

http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/samples/README.samples.pod

If you are just getting started with SenseClusters and would like to experiment with data for a single word (that is ready to run), you might want to try the begin.v-test.xml data, found here:

http://search.cpan.org/src/TPEDERSE/Text-SenseClusters-1.01/samples/Data/begin.v-test.xml

Or, you might want to try out some of the name discrimination data found here:

http://www.d.umn.edu/~tpederse/namedata.html

These have all been separated such that each file pertains to a different name.

I hope this helps! If you have further questions, it might be best to send them to the senseclusters-users list. That way all the developers see them, and you are likely to get the fastest possible response!

Cordially,
Ted

>> Savas Yildirim wrote:
>>>
>>> Hi,
>>> I am using the SenseClusters Web Interface, and I used Ted Pedersen's
>>> data from his web page. In the end, I got a user.report file showing
>>> the following result:
>>>
>>> Precision = 3.51(302/8611)
>>> Recall = 3.51(302/8611+0)
>>> F-Measure = 3.51
>>>
>>> It also includes some tables, matches, etc.
>>>
>>> These precision, recall, and F-measure values seem to be very bad. Am
>>> I using the program in the wrong way?
>>>
>>> This is the command I used:
>>> discriminate.pl "eng-lex-sample.training.xml" --format f16.06 --token
>>> "token.regex" --feature bi --remove 5 --context o2 --clusters 10
>>> --space vector --clmethod rb --crfun i2 --sim cos --label_remove 5
>>> --label_stat ll --label_rank 10 --eval --prefix "user"
>>>
>>> How do I evaluate the result files (e.g.
>>> user.report)
>>>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
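
For readers who want to see what "breaking the data apart per word" amounts to, here is a minimal Python sketch. It is not preprocess.pl and not one of the /samples scripts; it is only an illustration of the idea. It assumes the input follows the Senseval-2 lexical sample layout, where each target word sits inside its own <lexelt item="..."> ... </lexelt> block, and it assumes that wrapping each block in a <corpus lang="english"> element gives a usable single-word file. The function name split_by_lexelt and the SAMPLE text are hypothetical and made up for this example.

```python
import re
from pathlib import Path

# Assumption: every target word is wrapped in <lexelt item="word.pos"> ... </lexelt>,
# as in Senseval-2 lexical sample files. A regex is used instead of an XML parser
# because this is only a sketch and real data files may not be strictly well-formed.
LEXELT_RE = re.compile(r'<lexelt\s+item="([^"]+)"[^>]*>.*?</lexelt>', re.DOTALL)


def split_by_lexelt(xml_text):
    """Return a dict mapping each lexelt item to a single-word corpus string."""
    corpora = {}
    for match in LEXELT_RE.finditer(xml_text):
        item = match.group(1)
        # Assumption: a <corpus lang="english"> wrapper is what the per-word file needs.
        corpora[item] = '<corpus lang="english">\n' + match.group(0) + "\n</corpus>\n"
    return corpora


def write_per_word_files(xml_text, out_dir):
    """Write one XML file per target word into out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for item, corpus in split_by_lexelt(xml_text).items():
        path = out / (item + ".xml")
        path.write_text(corpus, encoding="utf-8")
        paths.append(path)
    return paths


# Tiny made-up input with two target words, for illustration only.
SAMPLE = """<corpus lang="english">
<lexelt item="art.n">
<instance id="art.1"><answer instance="art.1" senseid="s1"/><context>modern <head>art</head></context></instance>
</lexelt>
<lexelt item="begin.v">
<instance id="begin.1"><answer instance="begin.1" senseid="s2"/><context>we <head>begin</head> now</context></instance>
</lexelt>
</corpus>"""
```

Calling split_by_lexelt(SAMPLE) yields one entry for art.n and one for begin.v, each holding only that word's instances. Each resulting file could then be given to discriminate.pl on its own, which is the per-word workflow described in the message above.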
