We are pleased to announce the release of version 0.53 of SenseClusters. There are a number of significant changes to the package present in this new version, so upgrading is strongly encouraged. You can download the new version from:
http://www.d.umn.edu/~tpederse/senseclusters.html
https://sourceforge.net/projects/senseclusters/

In particular, this version introduces "global" training, where any large corpus of text can be used as a source of training data. In previous versions we only supported a "local" mode of training, where each word being discriminated requires its own training sample that consists of multiple instances of that word.

In the new global mode, the training data can be any large corpus (e.g., the New York Times, Wall Street Journal, etc.), and the same corpus can be used as the source of training data for all the words being discriminated.

Please note that the local mode of training is still supported, so global training is provided in addition to local training. Generally speaking, the global mode allows you to acquire features from a very large general corpus (e.g., the English GigaWord corpus), whereas the local mode allows you to acquire them from a much smaller, more targeted set of data (e.g., Senseval-2 training data for a particular word).

Please let us know if you have any questions or comments about local versus global training. We think there are many interesting issues to explore!

In addition, a number of changes have been made to make SenseClusters more robust and efficient. Below you will find the "official" Changelog for version 0.53. Again, please let us know if you have any questions or concerns!

Cordially,
Ted and Amruta

===================================================================

Changes made in Sense-Clusters version 0.51 during version 0.53

Amruta Purandare [EMAIL PROTECTED]
Ted Pedersen [EMAIL PROTECTED]

University of Minnesota, Duluth

1. Added programs -

   reduce-count.pl - to reduce the training bigram file by removing
   bigrams in which neither word is present in the test data

   maketarget.pl - to automatically create the target word by searching
   for possible forms of the target in the given sval2 file

   sval2plain.pl - to convert a sval2 formatted file into plain text

2. Updated programs -

   wordvec.pl, order2vec.pl, svdpackout.pl, mat2harbo.pl - added error
   checks to detect floating point overflow and underflow errors

   wordvec.pl - added an option to specify a features file via the
   --feats option

   simat.pl - no normalization for null vectors

   mat2harbo.pl - added help in perldoc on setting parameters in las2.h
   and on the problem of las2 running infinitely. Changed default iter
   to min(3*maxprs, cols). With the previous default, iter = #cols,
   las2 was running infinitely for quite a few experiments.

3. Updated wrapper discriminate.pl by -

   a. supporting global test and training data that might not contain
      <head> tags
   b. adding unigram feature types for order1 type vectors

4. Updated /Demos dir. by -

   a. providing fewer scripts that show all possible variations
   b. demonstrating use of both global and local training data

5. Updated /Docs dir -

   a. modified flow diagrams
   b. removed pseudo scripts as the new demo scripts show all possible
      variations
   c. updated the html documentation files

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
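
As an illustration of the reduce-count.pl entry in the changelog above, here is a minimal Python sketch of the filtering rule it describes: a training bigram is dropped only when neither of its two words occurs in the test data. This is not the actual Perl implementation. The function name reduce_bigrams, the in-memory dictionary of counts, and the sample words are all hypothetical stand-ins; the real program works on a bigram file whose format is not shown here.

```python
from typing import Dict, Iterable, Set, Tuple

# A bigram is modeled as an ordered pair of words (an assumption for this sketch).
Bigram = Tuple[str, str]


def reduce_bigrams(
    bigram_counts: Dict[Bigram, int], test_vocabulary: Iterable[str]
) -> Dict[Bigram, int]:
    """Keep a bigram if at least one of its two words occurs in the test data.

    Equivalently, remove the bigrams in which neither word is present
    in the test data.
    """
    vocab: Set[str] = set(test_vocabulary)
    return {
        (first, second): count
        for (first, second), count in bigram_counts.items()
        if first in vocab or second in vocab
    }


if __name__ == "__main__":
    # Hypothetical training bigram counts and test-data words.
    training = {
        ("interest", "rate"): 12,
        ("stock", "market"): 30,
        ("river", "bank"): 7,
    }
    test_words = ["the", "bank", "raised", "its", "interest", "rate"]
    for bigram, count in reduce_bigrams(training, test_words).items():
        print(bigram, count)
```

In this sketch, a bigram survives when only one of its words appears in the test data, which matches the wording of the changelog entry; only bigrams with both words absent are removed. This is most relevant in the global mode, where the training corpus is large and many of its bigrams never touch the test vocabulary.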
