[Senseclusters-users] SenseClusters version 0.53 released

ted pedersen Sun, 20 Jun 2004 16:40:00 -0700

We are pleased to announce the release of version 0.53 of SenseClusters.
This version contains a number of significant changes to the package, so
we strongly encourage upgrading. You can download the new version from:


http://www.d.umn.edu/~tpederse/senseclusters.html
https://sourceforge.net/projects/senseclusters/

In particular, this version introduces "global" training, where any large
corpus of text can be used as a source of training data. In previous
versions we only supported a "local" mode of training, where each word
being discriminated requires its own training sample that consists of
multiple instances of that word.

In the new global mode, the training data can be any large corpus (e.g.,
the New York Times, Wall Street Journal, etc.), and the same corpus can
serve as the source of training data for all the words being
discriminated, rather than each word requiring its "own" training data
made up of instances of that word. Please note that the local mode of
training is still supported, so global training is provided in addition
to local training.

Generally speaking, the global mode allows you to acquire features from a
very large general corpus (e.g., the English Gigaword corpus), whereas the
local mode allows you to acquire them from a much smaller, more targeted
set of data (e.g., Senseval-2 training data for a particular word). Please
let us know if you have any questions or comments about local versus
global training. We think there are many interesting issues to explore!

In addition, a number of changes have been made to make SenseClusters more
robust and efficient. Below you will find the "official" Changelog for
version 0.53.

Again, please let us know if you have any questions or concerns!

Cordially,
Ted and Amruta

===================================================================

Changes made to SenseClusters version 0.51 for version 0.53

Amruta Purandare [EMAIL PROTECTED]
Ted Pedersen     [EMAIL PROTECTED]

University of Minnesota, Duluth

1.      Added programs -

        reduce-count.pl - to reduce the training bigram file by removing
        bigrams in which neither word is present in the test data

        maketarget.pl - to automatically create the target word by searching
        for possible forms of the target in the given sval2 file

        sval2plain.pl - to convert an sval2-formatted file into plain text

2.      Updated programs -

        wordvec.pl, order2vec.pl, svdpackout.pl, mat2harbo.pl - added error
        checks to detect floating-point overflow and underflow errors

        wordvec.pl - added a --feats option to specify the features file

        simat.pl - no normalization for null vectors

        mat2harbo.pl - added help in the perldoc on setting parameters in
        las2.h and on the problem of las2 running indefinitely. Changed the
        default iter to min(3*maxprs, cols). With the previous default,
        iter = #cols, las2 ran indefinitely in quite a few experiments.

3.      Updated wrapper discriminate.pl by -

        a. supporting global test and training data that might not contain
        <head> tags

        b. adding unigram feature types for order1 type vectors

4.      Updated /Demos dir. by -

        a. providing fewer scripts that show all possible variations
        b. demonstrating use of both global and local training data

5.      Updated /Docs dir. by -

        a. modifying flow diagrams
        b. removing pseudo scripts, as the new demo scripts show all possible
           variations
        c. updating the html documentation files
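
As a rough illustration of two of the changes listed above, here is a
minimal Python sketch (SenseClusters itself is written in Perl, and this is
not its code). It assumes each bigram entry looks like "word1<>word2<>counts",
and it reads the reduce-count.pl entry as "drop a bigram only when neither of
its words occurs in the test data". The function names reduce_bigrams and
default_iter are hypothetical; maxprs and cols are simply the names used in
the mat2harbo.pl entry.

```python
def reduce_bigrams(bigram_lines, test_words):
    """Keep a bigram unless neither of its two words occurs in the test data.

    Assumes every line has the form "word1<>word2<>counts" (an assumption
    about the bigram file layout, not taken from the announcement).
    """
    vocab = set(test_words)
    kept = []
    for line in bigram_lines:
        first, second = line.split("<>")[:2]
        # Retain the bigram if at least one of its words is in the test data.
        if first in vocab or second in vocab:
            kept.append(line)
    return kept


def default_iter(maxprs, cols):
    """New default described in the changelog: min(3*maxprs, cols)."""
    return min(3 * maxprs, cols)


if __name__ == "__main__":
    bigrams = ["fine<>wine<>3 5 4", "red<>car<>2 7 6", "art<>show<>1 2 2"]
    for entry in reduce_bigrams(bigrams, ["wine", "art"]):
        print(entry)
    print(default_iter(100, 5000))
```

With maxprs = 100 and cols = 5000, the new default gives iter = 300, whereas
the previous default (iter = #cols) would have given 5000.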

--
Ted Pedersen
http://www.d.umn.edu/~tpederse


