Improving OpenNLP doccat model accuracy and performance

Lahiru Sandakith Gallege Fri, 01 Aug 2014 07:34:31 -0700

Hi,

I have a doccat model that I trained programmatically with OpenNLP, and I am
wondering how I should approach improving its accuracy and performance. I
have around 70 labels and about 12,000 entries in total across my training
and test data. In my experiments, I randomly split the data into 90% for
training and 10% for testing. Currently my model's accuracy is around 60% to
70%.


Here are the questions that I have.

* Could dropping stop words improve the model accuracy? I tried it, and it
seems to help a little, but I did not see a significant improvement.
* Does the trained model get skewed if spaces or tabs are used irregularly
in the training or test data? E.g., "label" "This car
 is made around  2007"
* Does the spacing between the label and the text need to be consistent? (I
hope the doccat engine trim()s it, but I wanted to make sure.)
* Is there a way to configure training so that the model does not dump its
output to the console?
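The whitespace and stop-word questions can be tested empirically by normalizing every line before training and then evaluating with and without the filter. Below is a minimal sketch in plain Java, assuming each line has the form `label<whitespace>text`. It collapses any run of spaces or tabs to a single space and can optionally drop tokens found in a small, hypothetical stop list. For the console question, it also shows a JDK-only way to silence output. This assumes the trainer prints through `System.out` while it runs; if it keeps its own stream reference, this will not help. `DoccatPreprocess`, `normalizeLine`, `runQuietly` and `STOP_WORDS` are illustrative names, not OpenNLP API.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

public class DoccatPreprocess {

    // Hypothetical, deliberately tiny stop list; a real one would be larger
    // and tuned to the domain.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "this", "is", "are", "was", "of", "to", "in", "on"));

    /**
     * Splits a "label text" line on any run of whitespace (spaces or tabs),
     * keeps the first token as the label, optionally drops stop words from
     * the text, and rejoins everything with single spaces.
     */
    public static String normalizeLine(String line, boolean dropStopWords) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected '<label> <text>': " + line);
        }
        StringJoiner out = new StringJoiner(" ");
        out.add(tokens[0]); // the category label is kept unchanged
        for (int i = 1; i < tokens.length; i++) {
            String tok = tokens[i];
            // Stop-word matching is case-insensitive; kept tokens keep their case.
            if (dropStopWords && STOP_WORDS.contains(tok.toLowerCase(Locale.ROOT))) {
                continue;
            }
            out.add(tok);
        }
        return out.toString();
    }

    /** Runs the task with System.out temporarily redirected to a stream that discards everything. */
    public static void runQuietly(Runnable task) {
        PrintStream original = System.out;
        PrintStream discard = new PrintStream(new OutputStream() {
            @Override
            public void write(int b) {
                // drop every byte
            }
        });
        System.setOut(discard);
        try {
            task.run();
        } finally {
            System.setOut(original); // always restore the real console
        }
    }

    public static void main(String[] args) {
        String raw = "label \t This car  is made around  2007";
        System.out.println(normalizeLine(raw, false)); // label This car is made around 2007
        System.out.println(normalizeLine(raw, true));  // label car made around 2007
        runQuietly(() -> System.out.println("this line is swallowed"));
        System.out.println("console restored");
    }
}
```

Usage: pass each raw line through `normalizeLine` when writing the training and test files, and wrap whatever call trains the model in `runQuietly(...)`. Because `System.setOut` is process-wide, anything else printing at the same time is silenced too. If every text token of a line is a stop word, only the label remains, so such lines may need to be skipped.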

If possible, please let me know.
Thanks in advance.
Lahiru

-- 
Regards
Lahiru Sandakith Gallege
