Re: Label-dependent Features

Alexandre Patry Wed, 02 Oct 2013 19:07:54 -0700

On 13-10-01 11:10 PM, George Ramonov wrote:

Hi everyone,


I am new to the OpenNLP maxent classifier, and I have a question regarding
using features that are label-dependent.

I have two sets of words (S1 and S2, where ||S1|| >> ||S2||), and I am
trying to find words from S2 that are most similar to S1 using
features I designed. I turned this into a classification problem, treating
words from S2 as labels, and built a nice training set. However, my
features are dependent on the labels themselves. I can't find a simple way in
OpenNLP to utilize labels in the prediction process. My guess is I would
have to subclass MaxentModel and implement eval() method? Is there an
easier way to solve this problem? Or perhaps, maximum entropy is not the
best algorithm of choice?

You cannot use the label in your features because it is unknown at prediction time. You can, however, use the set of all possible labels to compute features. For example, if one of your features is the edit distance, you can compute the edit distance of a word to each possible label. Another option is to add a feature that specifies the label with the minimal edit distance. If your possible labels are "w1" and "w2", a feature vector could look like:

"edit distance to w1", "edit distance to w2", "1 if w1 has smallest edit distance, 0 otherwise", "1 if w2 has smallest edit distance, 0 otherwise"


From there, you can easily generalize to many features and many labels.
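As a rough illustration of the feature vector described above, here is a minimal sketch in plain Java. It does not call any OpenNLP API; the class name LabelFeatures, the method names, and the "dist_<label>=<n>" / "closest_<label>=<0|1>" string format are all hypothetical choices made for this example, and Levenshtein distance is assumed as the edit distance. How such strings would be fed to a maxent trainer is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class LabelFeatures {

    // Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Builds features from the set of all possible labels, never from
    // the (unknown) correct label of the word being classified.
    static List<String> features(String word, List<String> labels) {
        int[] dist = new int[labels.size()];
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < labels.size(); i++) {
            dist[i] = editDistance(word, labels.get(i));
            min = Math.min(min, dist[i]);
        }
        List<String> out = new ArrayList<>();
        // One distance feature per possible label.
        for (int i = 0; i < labels.size(); i++) {
            out.add("dist_" + labels.get(i) + "=" + dist[i]);
        }
        // One indicator per label; on ties, every label at the minimum gets 1.
        for (int i = 0; i < labels.size(); i++) {
            out.add("closest_" + labels.get(i) + "=" + (dist[i] == min ? 1 : 0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<>();
        labels.add("color");
        labels.add("cooler");
        System.out.println(features("colour", labels));
    }
}
```

The same feature list is computed for every word at both training and prediction time, which is why it can depend on the label set without depending on the answer.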

Hope this helps,

Alexandre
