Thanks Alexandre,

That's what I ended up doing, and it worked out quite nicely. Thanks
for your help!

On Wed, Oct 2, 2013 at 7:07 PM, Alexandre Patry wrote:

> On 13-10-01 11:10 PM, George Ramonov wrote:
>
>> Hi everyone,
>>
>> I am new to the OpenNLP maxent classifier, and I have a question regarding
>> using features that are label-dependent.
>>
>> I have two sets of words (S1 and S2, where ||S1|| >> ||S2||), and I am
>> trying to find words from S2 that are most similar to S1 using
>> features I designed. I turned this into a classification problem, treating
>> words from S2 as labels, and built a nice training set. However, my
>> features are dependent on the labels themselves. I can't find a simple way in
>> OpenNLP to utilize labels in the prediction process. My guess is I would
>> have to subclass MaxentModel and implement the eval() method? Is there an
>> easier way to solve this problem? Or perhaps, maximum entropy is not the
>> best algorithm of choice?
>>
> You cannot use the label in your features because it is unknown at
> prediction time. You can however use the set of all possible labels to
> compute features. For example, if one of your features is the edit-distance,
> you can compute the edit-distance of a word to each possible label. Another
> option is to add a feature to specify the label with the minimal
> edit-distance. If your possible labels are "w1" and "w2", a feature vector
> could look like:
>
> "edit-distance to w1", "edit-distance to w2", "1 if w1 has smallest edit
> distance, 0 otherwise", "1 if w2 has smallest edit distance, 0 otherwise"
>
> From there, you can easily generalize to many features and many labels.
>
> Hope this helps,
>
> Alexandre
>
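
For anyone finding this thread later, here is a minimal sketch of the
feature vector Alexandre describes above. It is written in plain Python
for brevity and does not use any OpenNLP API. The function names
(edit_distance, label_set_features) are hypothetical, and the sketch
assumes a plain Levenshtein distance and a non-empty label list. When
several labels tie for the smallest distance, each of them gets a 1.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def label_set_features(word, labels):
    """Features computed against every possible label, not the true one.

    Returns the edit distance to each label, followed by one 0/1
    indicator per label marking the label(s) with the smallest distance.
    Assumes `labels` is non-empty (hypothetical helper, for illustration).
    """
    distances = [edit_distance(word, label) for label in labels]
    smallest = min(distances)
    indicators = [1 if d == smallest else 0 for d in distances]
    return distances + indicators


print(label_set_features("cat", ["cart", "dog"]))  # [1, 3, 1, 0]
```

Since the vector only depends on the word and the fixed label set, it can
be computed the same way at training and at prediction time. How the
values are then encoded for the classifier depends on the toolkit's input
format, which this sketch does not cover.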
