Document Categorizer Custom Feature Generators

Zach Zeman Tue, 05 Feb 2013 07:05:34 -0800

It turns out that the BagOfWords feature generator is insufficient for the
problem I've been trying to solve using the DocumentCategorizerME. What I need
is something that works like the TokenClassFeatureGenerator, but it does not
appear that AdaptiveFeatureGenerators are usable with the categorizer. I'm not
entirely sure about that last point, but if it is true, I'm perfectly willing
to implement my own version of a token class feature generator.
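
For reference, here is a minimal sketch of what such a generator might look like. It assumes the OpenNLP 1.5.x doccat interface with the signature Collection<String> extractFeatures(String[] text). That interface is redeclared locally so the sketch compiles on its own. The class name TokenClassDocFeatureGenerator and the token classes (num, ac, ic, lc, other) are hypothetical. They are made up for illustration and are not taken from OpenNLP's TokenClassFeatureGenerator.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Local stand-in for opennlp.tools.doccat.FeatureGenerator (1.5.x signature assumed),
// declared here only so the sketch compiles without OpenNLP on the classpath.
interface FeatureGenerator {
    Collection<String> extractFeatures(String[] text);
}

// Hypothetical generator: emits one "tc=<class>" feature per token,
// loosely in the spirit of a token class feature generator.
class TokenClassDocFeatureGenerator implements FeatureGenerator {

    // Hypothetical class labels, checked in order; not OpenNLP's own set.
    static String tokenClass(String token) {
        if (token.matches("\\d+")) return "num";            // digits only
        if (token.matches("\\p{Lu}+")) return "ac";          // all caps
        if (token.matches("\\p{Lu}\\p{Ll}*")) return "ic";   // initial cap
        if (token.matches("\\p{Ll}+")) return "lc";          // lower case
        return "other";
    }

    @Override
    public Collection<String> extractFeatures(String[] text) {
        List<String> features = new ArrayList<>();
        for (String token : text) {
            features.add("tc=" + tokenClass(token));
        }
        return features;
    }
}

public class Main {
    public static void main(String[] args) {
        FeatureGenerator gen = new TokenClassDocFeatureGenerator();
        String[] tokens = {"Invoice", "ABC", "2013", "paid", "$5"};
        // prints [tc=ic, tc=ac, tc=num, tc=lc, tc=other]
        System.out.println(gen.extractFeatures(tokens));
    }
}
```

In practice these features would probably be combined with the bag-of-words features rather than replace them, since token classes alone discard the words themselves.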


However, when I was looking at how to implement a FeatureGenerator, I noticed
that the text passed to the extractFeatures method has already been split on
whitespace. So, is the FeatureGenerator the correct place to change how my
incoming training text is broken up into features? Or is there another step
I've missed that would be more appropriate?
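
One way a FeatureGenerator could change the splitting is to re-split each whitespace token on non-alphanumeric characters before emitting bag-of-words style features. The sketch below assumes the same 1.5.x interface, redeclared locally. ResplittingFeatureGenerator and the "bow=" prefix are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Local stand-in for opennlp.tools.doccat.FeatureGenerator (1.5.x signature assumed).
interface FeatureGenerator {
    Collection<String> extractFeatures(String[] text);
}

// Hypothetical generator that re-splits each whitespace token on
// non-letter/non-digit characters, then emits lower-cased "bow=" features.
class ResplittingFeatureGenerator implements FeatureGenerator {

    @Override
    public Collection<String> extractFeatures(String[] text) {
        List<String> features = new ArrayList<>();
        for (String token : text) {
            // e.g. "e-mail" -> ["e", "mail"]; a leading separator yields an empty piece, which is skipped
            for (String piece : token.split("[^\\p{L}\\p{N}]+")) {
                if (!piece.isEmpty()) {
                    features.add("bow=" + piece.toLowerCase());
                }
            }
        }
        return features;
    }
}

public class Main {
    public static void main(String[] args) {
        FeatureGenerator gen = new ResplittingFeatureGenerator();
        // simulate the whitespace-split tokens the generator receives
        String[] tokens = "Order#123 shipped, e-mail sent.".split("\\s+");
        // prints [bow=order, bow=123, bow=shipped, bow=e, bow=mail, bow=sent]
        System.out.println(gen.extractFeatures(tokens));
    }
}
```

This approach can only refine the whitespace tokens the generator receives. It cannot rejoin text that was already split across whitespace, so a change of that kind would have to happen before the categorizer sees the text.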

Thanks for any help you guys can provide. I've found OpenNLP very useful 
overall, but this part is really confusing me.

-Zach
