On 13-10-01 11:10 PM, George Ramonov wrote:
Hi everyone,

I am new to the OpenNLP maxent classifier, and I have a question regarding
using features that are label-dependent.

I have two sets of words (S1 and S2, where ||S1|| >> ||S2||), and I am
trying to find words from S2 that are most similar to S1 using
features I designed. I turned this into a classification problem, treating
words from S2 as labels, and built a nice training set. However, my
features are dependent on the labels themselves. I can't find a simple way in
OpenNLP to utilize labels in the prediction process. My guess is I would
have to subclass MaxentModel and implement the eval() method? Is there an
easier way to solve this problem? Or perhaps, maximum entropy is not the
best algorithm of choice?

You cannot use the label in your features because it is unknown at prediction time. You can, however, use the set of all possible labels to compute features. For example, if one of your features is the edit distance, you can compute the edit distance from a word to each possible label. Another option is to add a feature that indicates which label has the minimal edit distance. If your possible labels are "w1" and "w2", a feature vector could look like:

"edit-distance to w1", "edit-distance to w2", "1 if w1 has smallest edit distance, 0 otherwise", "1 if w2 has smallest edit distance, 0 otherwise"

From there, you can easily generalize to many features and many labels.
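
As a rough illustration of the feature vector described above, here is a minimal Java sketch. It assumes plain Levenshtein distance as the edit distance, and the class name LabelFeatures and the feature-name scheme ("editdist_<label>=<d>", "closest_<label>=<0|1>") are made up for this example; they are not part of OpenNLP. How the resulting strings are handed to the trainer depends on your setup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabelFeatures {

    // Levenshtein edit distance, computed with two rolling rows.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // One distance feature per candidate label, followed by one
    // indicator per label (1 if that label is at minimal distance).
    // Labels tied for the minimum all get 1.
    public static List<String> features(String word, List<String> labels) {
        int[] dist = new int[labels.size()];
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < labels.size(); i++) {
            dist[i] = editDistance(word, labels.get(i));
            min = Math.min(min, dist[i]);
        }
        List<String> feats = new ArrayList<>();
        for (int i = 0; i < labels.size(); i++) {
            feats.add("editdist_" + labels.get(i) + "=" + dist[i]);
        }
        for (int i = 0; i < labels.size(); i++) {
            feats.add("closest_" + labels.get(i) + "=" + (dist[i] == min ? 1 : 0));
        }
        return feats;
    }

    public static void main(String[] args) {
        // Hypothetical word and label set, for illustration only.
        System.out.println(features("cat", Arrays.asList("cart", "dog")));
        // prints [editdist_cart=1, editdist_dog=3, closest_cart=1, closest_dog=0]
    }
}
```

The same features are computed for every word regardless of its true label, so nothing label-dependent leaks in at prediction time.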

Hope this helps,

Alexandr
