Thanks Alexandre, that's what I ended up doing. It worked out quite nicely. Thanks for your help!
On Wed, Oct 2, 2013 at 7:07 PM, Alexandre Patry <[email protected]> wrote:

> On 13-10-01 11:10 PM, George Ramonov wrote:
>
>> Hi everyone,
>>
>> I am new to the OpenNLP maxent classifier, and I have a question regarding
>> using features that are label-dependent.
>>
>> I have two sets of words (S1 and S2, where ||S1|| >> ||S2||), and I am
>> trying to find words from S2 that are most similar to S1 using
>> features I designed. I turned this into a classification problem, treating
>> words from S2 as labels, and built a nice training set. However, my
>> features are dependent on the labels themselves. I can't find a simple way in
>> OpenNLP to utilize labels in the prediction process. My guess is I would
>> have to subclass MaxentModel and implement the eval() method? Is there an
>> easier way to solve this problem? Or perhaps maximum entropy is not the
>> best algorithm of choice?
>>
> You cannot use the label in your features because it is unknown at
> prediction time. You can, however, use the set of all possible labels to
> compute features. For example, if one of your features is the edit distance,
> you can compute the edit distance of a word to each possible label. Another
> option is to add a feature that specifies the label with the minimal
> edit distance. If your possible labels are "w1" and "w2", a feature vector
> could look like:
>
> "edit-distance to w1", "edit-distance to w2", "1 if w1 has smallest edit
> distance, 0 otherwise", "1 if w2 has smallest edit distance, 0 otherwise"
>
> From there, you can easily generalize to many features and many labels.
>
> Hope this helps,
>
> Alexandre
>
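
For anyone finding this thread later, here is a minimal sketch of the feature vector Alexandre describes, written in plain Python rather than against the OpenNLP API. The names edit_distance and build_features are hypothetical helpers made up for this illustration, and the edit distance is assumed to be plain Levenshtein distance. Turning the resulting values into whatever event format your OpenNLP version expects is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def build_features(word, labels):
    """Hypothetical helper: label-independent features for one word.

    Returns the edit distance to every possible label, followed by one
    0/1 indicator per label marking the label(s) with the smallest distance.
    """
    distances = [edit_distance(word, label) for label in labels]
    smallest = min(distances)
    indicators = [1 if d == smallest else 0 for d in distances]
    return distances + indicators


# Two possible labels, as in the "w1" / "w2" example above.
features = build_features("cat", ["cart", "dog"])
```

With the two labels above, the vector has four entries: two distances, then two indicators. Note that in this sketch a tie sets the indicator for every label sharing the smallest distance; picking a single winner instead is an equally reasonable choice.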
