Hi! I’m writing a sentiment analysis application based on product reviews and am interested in using OpenNLP for tokenization and named entity recognition. The problem is that the standard models on the project homepage aren’t identifying nearly enough entities, and training a completely new model on my data is outside the scope of my project.
Both the training and test set texts have additional information available. Is there any way to augment (for instance) the person model so that it is more likely to identify Britney Spears as a person when the text is a product review of her CD (and it’s known beforehand that it’s "her" CD), or to identify Google as a company when it’s a review of one of their products (under the same conditions)? Is a pre-trained model approach incorrect? Should I use a regex-based model instead? Is there some other approach? Or is the idea unfeasible, and should I reconsider?

I appreciate any answer and whatever time you spent reading my email.

Sincerely,
Alexander
