Thanks for the quick reply!

I was thinking of using a variation of the bag-of-words approach for the
actual model, so LIBSVM/LIBLINEAR is probably a better fit for my data than
OpenNLP, though I do appreciate advice on alternative approaches =)

The idea behind using NER was to make it easier to find correlations with
nearby words (i.e. once a named entity is found, replace it with a
predetermined token and use that token both for correlations and as input to
the model), rather than "just" extracting the entities. Since the product name
and company are uniquely linked to a product, I would prefer some kind of NER
that takes that into account, so that the information known beforehand can be
used to achieve higher precision and recall.
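
To make the token replacement idea concrete, here is a minimal sketch in Python. It assumes the entity names for each review are already known from the review metadata, so no NER model is involved. The function names and the ENTITY_PERSON placeholder are made up for illustration and are not part of OpenNLP or LIBLINEAR.

```python
import re
from collections import Counter


def mask_known_entities(text, known_entities):
    """Replace every mention of a known entity with its placeholder token.

    known_entities maps a surface form (e.g. "Britney Spears") to a
    placeholder (e.g. "ENTITY_PERSON"). Both are illustrative assumptions.
    """
    # Handle longer names first so "Britney Spears" is replaced as a whole
    # before the shorter form "Britney" is considered.
    for name in sorted(known_entities, key=len, reverse=True):
        # Lookarounds instead of \b so names ending in punctuation still match.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        text = re.sub(pattern, known_entities[name], text, flags=re.IGNORECASE)
    return text


def bag_of_words(text, placeholders):
    """Count tokens; ordinary words are lowercased, placeholders are kept."""
    tokens = re.findall(r"[A-Za-z_']+", text)
    return Counter(t if t in placeholders else t.lower() for t in tokens)


# Hypothetical review of a CD that is known beforehand to be "her" CD.
known = {"Britney Spears": "ENTITY_PERSON", "Britney": "ENTITY_PERSON"}
review = "I love the new Britney Spears CD, Britney never disappoints."

masked = mask_known_entities(review, known)
counts = bag_of_words(masked, set(known.values()))
```

The resulting counts could then be turned into sparse feature vectors for LIBSVM/LIBLINEAR, with the placeholder acting as one shared feature across all reviews regardless of which person or company it stands for.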



Sincerely



On 31 Oct 2014, at 22:00, Mark G <[email protected]> wrote:

> Well here are my thoughts... if you know the product review is associated
> with a name, do you still need to perform NER to get names out? If not, one
> approach I have done with sentiment analysis in OpenNLP (I run a pretty
> large scale production app that performs sentiment analysis with a model
> generated from millions of samples) is find some words or phrases that you
> are certain 99% of the time are indicators of sentiment, like "this sucks"
> or "awesome", put those words in a set, read in your data in java (or
> whatever) use regex or .contains and whatever gets a hit on each word or
> phrase, save that off as an initial training set and build a Doccat model
> from those. Then run more data through the model, pick the top scorers and
> add them to the samples etc... do this until you converge on a decent model
> and then use it.
> If you need to pull out names, do the same thing... start with a list of
> known names, create the NER file format by finding the known names, train
> the model with those initial sentences, then iterate. For iterative NER
> training there is an Addon I wrote that can help with that called
> modelbuilder-addon. It's like semi supervised teaching....
> 
> 
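
The first step of the seed-phrase approach described above (collect the texts that hit a high-confidence phrase and save them as an initial training set) might look roughly like this in Python. This is only a sketch: the phrase lists, labels and function names are illustrative assumptions, and the "label, space, text" line layout is meant to resemble the one-sample-per-line input of OpenNLP's Doccat trainer, which should be checked against the OpenNLP manual before relying on it.

```python
# Illustrative seed phrases; a real list would be longer and domain specific.
SEED_PHRASES = {
    "positive": {"awesome"},
    "negative": {"this sucks"},
}


def label_by_seed(text, seed_phrases):
    """Return the single label whose seed phrases occur in text, else None."""
    lowered = text.lower()
    hits = {
        label
        for label, phrases in seed_phrases.items()
        if any(phrase in lowered for phrase in phrases)
    }
    # Skip texts that match no label or more than one label.
    return hits.pop() if len(hits) == 1 else None


def build_seed_set(texts, seed_phrases):
    """Build 'label text' lines from the texts that get exactly one label."""
    samples = []
    for text in texts:
        label = label_by_seed(text, seed_phrases)
        if label is not None:
            # Collapse whitespace so every sample stays on one line.
            samples.append(f"{label} {' '.join(text.split())}")
    return samples


reviews = [
    "This CD is awesome!",
    "Honestly, this sucks.",
    "It arrived on Tuesday.",
    "Awesome cover, but this sucks as an album.",
]
seed_samples = build_seed_set(reviews, SEED_PHRASES)
```

Here only the first two reviews end up in the seed set. The later iterations (train a model, score more data, add the top scorers back into the samples) are not shown.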
> On Fri, Oct 31, 2014 at 4:43 PM, Alexander Wallin <
> [email protected]> wrote:
> 
>> Hi!
>> 
>> I’m writing a sentiment analysis application based on product reviews and
>> am interested in using OpenNLP for identifying named entities and
>> tokenization. The problem is that the standard models on the project
>> homepage aren’t identifying nearly enough entities, and training a completely
>> new model based on my data is outside the scope of my project.
>> 
>> Both training and test set texts have additional information available; is
>> there any way to augment (for instance) the person model to (be more likely
>> to) properly identify Britney Spears as a person in case the text is a
>> product review of her CD (and it’s known beforehand that it’s ”her” CD) or
>> to identify Google as a company if it’s a review of one of their products
>> (under the same conditions)?
>> 
>> Is a (pre-trained) model approach the wrong one? Should I use a regex-based
>> model instead? Some other approach? Or is the idea infeasible, so that I
>> should reconsider?
>> 
>> 
>> I appreciate any answer and whatever time you spent reading my email.
>> 
>> 
>> Sincerely
>> 
>> Alexander
