[Senseclusters-users] removing target word from context representations

ted pedersen Mon, 25 Oct 2004 22:43:51 -0700

There is an interesting issue surrounding the target word that you are
discriminating: whether or not it should be included in the
representation of the context that is being clustered. When you use
discriminate.pl, the target word is always included in the context
representation, but individual programs have options that let you
remove it. (It's important to point out that discriminate.pl does not
utilize all of the options available in SenseClusters.)


When you use discriminate.pl in order 2, the target word is represented
by a vector that contributes to the average vector that represents the
context. In order 1, the target word is represented by a feature.
The reason for including it is that morphological variations (like
"line" versus "lines") might be indicative of different senses of the word.
Also, we might want to use SenseClusters for synonym identification. For
example, suppose we had target words "line", "cord", and "queue". We would
want each of those to be represented as either a first or second order
feature.
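
To make the order 2 case concrete, here is a minimal Python sketch of how
a target word's vector contributes to the averaged context vector. It is
only an illustration, not SenseClusters code: the function name
average_vector and the toy word vectors are made up for this example.

```python
def average_vector(context_words, word_vectors):
    """Average the vectors of all context words that have a word vector."""
    found = [word_vectors[w] for w in context_words if w in word_vectors]
    if not found:
        return []
    dims = len(found[0])
    return [sum(v[i] for v in found) / len(found) for i in range(dims)]


# Hypothetical toy word vectors (values invented for illustration).
word_vectors = {
    "line": [1.0, 0.0, 2.0],   # the target word has its own vector
    "phone": [0.0, 4.0, 0.0],
    "busy": [2.0, 2.0, 1.0],
}

context = ["the", "phone", "line", "was", "busy"]

# The target word "line" is one of the three vectors being averaged.
second_order = average_vector(context, word_vectors)
print(second_order)
```

If "line" had no entry in word_vectors, the average would be taken over
"phone" and "busy" only, which is the effect that excluding the target
word has on the order 2 representation.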

However, let's suppose that you want to remove the target word as a
feature. You might want to do this if you don't want to consider
morphological differences as significant, or if you feel that the target
word is simply adding noise.

The good news is you can do that. There is an --extarget option that can
be used with the following programs that will remove the target word as a
feature:

order1vec.pl --extarget

        order1vec.pl produces a first order context vector for each
        instance in the given test data. --extarget will make sure
        that the target word (in all of its forms) will not be included
        as a first order feature.

wordvec.pl --extarget

        wordvec.pl converts NSP output into a word by word
        co-occurrence matrix. This matrix will be the source of the
        word vectors used to create 2nd order vectors.
        Using --extarget removes the target word (in all of its forms)
        from this matrix.
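
As a rough illustration of what excluding the target word means in each
case, here is a small Python sketch. It does not reproduce how
order1vec.pl or wordvec.pl are implemented; the TARGET pattern, the
function names, and the toy data are all assumptions made up for this
example.

```python
import re

# Hypothetical pattern matching every form of the target word "line".
TARGET = re.compile(r"^lines?$")


def first_order_vector(context_words, features, exclude_target=False):
    """Binary first order vector: 1 if the feature occurs in the context."""
    if exclude_target:
        # Drop every feature that is a form of the target word.
        features = [f for f in features if not TARGET.match(f)]
    present = set(context_words)
    return {f: int(f in present) for f in features}


def drop_target(cooc):
    """Remove target word rows and columns from a co-occurrence matrix."""
    return {
        row: {col: n for col, n in cols.items() if not TARGET.match(col)}
        for row, cols in cooc.items()
        if not TARGET.match(row)
    }


features = ["line", "lines", "phone", "fishing"]
context = ["the", "phone", "lines", "were", "down"]
with_target = first_order_vector(context, features)
without_target = first_order_vector(context, features, exclude_target=True)

# Toy word by word co-occurrence counts (invented values).
cooc = {
    "line": {"phone": 3},
    "phone": {"line": 3, "busy": 2},
    "busy": {"phone": 2},
}
trimmed = drop_target(cooc)
```

In the first order case the "line" and "lines" features simply disappear
from the vector. In the second order case the target word no longer has a
row of its own and no longer appears as a dimension of the other word
vectors.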

By default the target word appears in both the 1st and 2nd order
representations, but you can exclude the target word if you like. You will
need to run SenseClusters program by program (or modify discriminate.pl)
so that the --extarget option is used.




--
Ted Pedersen
http://www.d.umn.edu/~tpederse

