Re: Question concerning named entity identification for reviews

Mark G Fri, 31 Oct 2014 14:03:02 -0700

Well, here are my thoughts. If you already know which name a product review
is associated with, do you still need to perform NER to get names out? If
not, here is one approach I have used for sentiment analysis in OpenNLP (I
run a fairly large-scale production app that performs sentiment analysis
with a model generated from millions of samples). Find some words or
phrases that you are certain indicate sentiment 99% of the time, like
"this sucks" or "awesome", and put them in a set. Read in your data in Java
(or whatever) and use regex or .contains to check each text against each
word or phrase. Save whatever gets a hit as an initial training set and
build a Doccat model from it. Then run more data through the model, pick
the top scorers, and add them to the samples. Repeat until you converge on
a decent model, and then use it.
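
The seeding step above can be sketched in plain Java. This is only an
illustration, not the production code mentioned: the class name, the seed
phrases, and the "positive"/"negative" labels are assumptions, and the
output assumes the usual Doccat training layout of one sample per line
with the category first, followed by the text. Reviews that hit seeds from
more than one category are skipped, so the initial set stays clean.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: labels raw reviews by seed phrase to bootstrap a training set.
class SeedSentimentLabeler {

    // Illustrative seed phrases (lower case) mapped to assumed category labels.
    static final Map<String, String> SEEDS = new LinkedHashMap<>();
    static {
        SEEDS.put("this sucks", "negative");
        SEEDS.put("awesome", "positive");
    }

    /** Returns the category whose seeds occur in the text, or null if none or several categories match. */
    static String label(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        String found = null;
        for (Map.Entry<String, String> seed : SEEDS.entrySet()) {
            if (lower.contains(seed.getKey())) {
                if (found != null && !found.equals(seed.getValue())) {
                    return null; // conflicting seeds: too ambiguous for a seed sample
                }
                found = seed.getValue();
            }
        }
        return found;
    }

    /** Builds one "category text" line for every review that hits a seed. */
    static List<String> buildTrainingLines(List<String> reviews) {
        List<String> lines = new ArrayList<>();
        for (String review : reviews) {
            String category = label(review);
            if (category != null) {
                // Collapse whitespace so each sample stays on a single line.
                lines.add(category + " " + review.replaceAll("\\s+", " ").trim());
            }
        }
        return lines;
    }
}
```

Writing the returned lines to a file gives the initial training set. The
later rounds (scoring new data with the trained model and adding the top
scorers) would append lines in the same layout.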
If you need to pull out names, do the same thing: start with a list of
known names, create training data in the NER file format by marking the
known names wherever they occur, train the model on those initial
sentences, then iterate. For iterative NER training there is an addon I
wrote, called modelbuilder-addon, that can help with that. It's like
semi-supervised learning.
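
A minimal sketch of the first step (marking known names to create NER
training sentences) might look like the following. It does not use
modelbuilder-addon or any OpenNLP class. The class and method names are
hypothetical, the input is assumed to be one whitespace-tokenized sentence
per call, and the output assumes the name finder training layout in which
entities are wrapped as <START:type> ... <END>.

```java
import java.util.List;

// Hypothetical helper: wraps known names in a tokenized sentence with NER training tags.
class KnownNameAnnotator {

    /** Returns the sentence with every occurrence of a known name tagged as the given entity type. */
    static String annotate(String tokenizedSentence, List<String> knownNames, String type) {
        String[] tokens = tokenizedSentence.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.length) {
            // Find the longest known name that starts at token i.
            int matchLen = 0;
            for (String name : knownNames) {
                String[] nameTokens = name.trim().split("\\s+");
                if (nameTokens.length > matchLen && matchesAt(tokens, i, nameTokens)) {
                    matchLen = nameTokens.length;
                }
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            if (matchLen > 0) {
                out.append("<START:").append(type).append('>');
                for (int k = 0; k < matchLen; k++) {
                    out.append(' ').append(tokens[i + k]);
                }
                out.append(" <END>");
                i += matchLen;
            } else {
                out.append(tokens[i]);
                i++;
            }
        }
        return out.toString();
    }

    /** True if nameTokens occur in tokens starting at index start (exact, case-sensitive match). */
    static boolean matchesAt(String[] tokens, int start, String[] nameTokens) {
        if (start + nameTokens.length > tokens.length) {
            return false;
        }
        for (int k = 0; k < nameTokens.length; k++) {
            if (!tokens[start + k].equals(nameTokens[k])) {
                return false;
            }
        }
        return true;
    }
}
```

For example, annotating "I love the new Britney Spears CD ." with the
known name "Britney Spears" and type "person" yields
"I love the new <START:person> Britney Spears <END> CD .". Matching on
tokens rather than raw substrings keeps the tags separated from
punctuation by whitespace.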



On Fri, Oct 31, 2014 at 4:43 PM, Alexander Wallin <
[email protected]> wrote:

> Hi!
>
> I’m writing a sentiment analysis application based on product reviews and
> am interested in using opennlp for identifying named entities and
> tokenization. The problem is that the standard models on the project
> homepage aren’t identifying nearly enough entities, and training a
> completely new model based on my data is outside the scope of my project.
>
> Both training and test set texts have additional information available; is
> there any way to augment (for instance) the person model to (be more likely
> to) properly identify Britney Spears as a person in case the text is a
> product review of her CD (and it’s known beforehand that it’s ”her” CD) or
> to identify Google as a company if it’s a review of one of their products
> (under the same conditions)?
>
> Is a (pre trained) model approach incorrect? Should I use a regex based
> model instead? Other approach? Unfeasible idea and I should reconsider?
>
>
> I appreciate any answer and whatever time you spent reading my email.
>
>
> Sincerely
>
> Alexander
