On a slightly different note, what you're trying to do is NOT
unreasonable... I'm thinking about the wider topic of
'probabilistic-classifiers' where 'training' essentially boils down to
gathering a bunch of probabilities. I can't think of any technical
reason that would prevent you from training an already-trained model on
some extra data, other than the implementation of a given system. In
other words, if all you've got is a bunch of probabilities, why not be
able to add to them in the future? Nothing can stop you from doing that
but the implementation specifics.
To make things more concrete, consider an HMM POS-tagger. When you
'train' it, all you're doing is 'observing' which tag appears more
frequently before the tag you're currently looking at. From those
frequency counts you build a probability matrix, which you consult later
on in order to make predictions. Now, consider this... let's say you
choose to represent those counts as a HashMap, so the entire model is a
HashMap... there is absolutely no problem retraining it whenever you get
some more data, without losing the original training counts. Of course
now you're going to say that the tag-set might be different, but what if
the first time you trained you only had access to the first half of the
Brown corpus, and after a couple of months you managed to find the other
half? The tag-set doesn't change within the same corpus, so you can
retrain your HMM without introducing noise... you just 'merge-with +'
the count maps at the end and voila! You end up with exactly the same
model you would have had if you'd trained on the entire Brown corpus in
one go.
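Just to make that last merging step concrete, here's a minimal Clojure
sketch of the 'merge-with +' idea (the tag names, counts, and the
nested-map layout are all made up for illustration, not taken from any
real corpus):

;; Transition counts from the two halves of the corpus, as nested maps
;; of {previous-tag {current-tag count}} (hypothetical numbers).
(def counts-first-half  {"DT" {"NN" 120, "JJ" 30}})
(def counts-second-half {"DT" {"NN" 95,  "JJ" 41}, "NN" {"VB" 60}})

;; Summing the two count maps gives exactly what one pass over the whole
;; corpus would have produced; normalise to probabilities afterwards.
(def merged-counts
  (merge-with #(merge-with + %1 %2) counts-first-half counts-second-half))
;; => {"DT" {"NN" 215, "JJ" 71}, "NN" {"VB" 60}}

The only requirement is that the model keeps the raw counts around (or
can recover them), since it's the counts that add up cleanly, not the
normalised probabilities.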
Having said all that, I'll admit that I've never encountered any
machine-learning implementation that allows you to do this, and I'm
wondering why... It's easy to implement and provides a ton of
flexibility. I've proven to myself that it can be done by implementing
what I described in the previous paragraph (an HMM POS-tagger), and it
works, but generally speaking libraries and frameworks don't allow it...
I'm saying all that so you don't go thinking that what you tried is just
plain wrong... Well, it is the way you tried it, but your underlying
thought is perfectly valid.
Hope that helps :)
Jim
On 03/09/13 17:42, Jim - FooBar(); wrote:
On 03/09/13 17:25, Danica Damljanovic wrote:
I was trying to find the original opennlp corpora used for training, but
could not get anything apart from the binary model...
Does anyone have any idea whether it is possible to get this, and how?
If I'm not mistaken the original corpora cannot be re-distributed due
to licensing issues... However, don't take my word for it - someone
with the appropriate authority should answer this (someone from the
dev-team)...
Also, if I remember correctly, you can get a pretty decent
sentence-detecting model with fewer than 100 sentences, whereas for the
rest of the components (Tokenizer, POSTagger, NER, etc.) you need
thousands of sentences!
Jim