William, thank you for your advice.
>> In some POS taggers, default features are included (I think this is
>> the best name for them of all the ones I have read), while others don't
>> have them.

I should have been clearer about that: I don't mean only the taggers in OpenNLP, but many others, too. I am trying to understand the basic principles that indicate when a normalization default feature improves tagging quality, instead of relying on trial and error or guessing. Knowing how the model is computed from a mathematical point of view *could* help me understand when to use normalization features; however, if people with more experience in this area can explain it another way, I am happy with that, too :).

You also raised another topic: data quality. Are there any metrics that indicate whether your data is of good quality? For example, I tagged several thousand sentences from a specific wiki and measured a precision of around 90%+ and a recall of around 55-60%. There are several ways to tune these results (running more training iterations, tuning other parameters, tagging more sentences, etc.), but what should help me prioritize my options?

Kind regards,
Em

On 02.03.2012 20:35, William wrote:
> Hi, Em,
>
> The OpenNLP default context generator is designed to be portable between
> languages. You should try it and evaluate how your system performs. You can
> also evaluate your model using the tools provided, for example:
> bin/opennlp POSTaggerCrossValidator
>
> There is no formula for deciding whether to include new features. Compare
> the accuracy of other machine learning POS tagger implementations with yours.
> Researchers can usually create English POS tagger models with about 96.5%
> accuracy, but this depends on factors such as the quality and the size of
> the training data.
>
> You can extend the default context generator to include features that would
> improve your model's effectiveness, based on characteristics of the
> data you are working with.
>
> Regards
> William
>
> On Fri, Mar 2, 2012 at 1:34 PM, Em wrote:
>
>> Hello,
>>
>> I've read a little bit about POS tagging and the theory behind it.
>>
>> In some POS taggers, default features are included (I think this is
>> the best name for them of all the ones I have read), while others don't
>> have them.
>>
>> Is there an explanation somewhere of when to include default features
>> and when not to?
>>
>> Is there a formula one can consult when deciding whether or not to
>> include default features for normalization?
>>
>> Thank you.
>>
>> Em
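To make Em's figures (precision around 90%, recall around 55-60%) concrete, here is a minimal per-tag scoring sketch. It is a hypothetical helper, not part of OpenNLP, and it assumes gold and predicted tags are token-aligned lists. When every token receives exactly one tag, micro-averaged precision and recall both equal plain accuracy, so differing numbers like Em's are presumably per-tag (or per-entity) figures. High precision with low recall for a tag means the tagger seldom assigns it wrongly but misses many of its true occurrences. The tag names "NE" and "O" in the toy data are made up.

```python
from collections import Counter


def per_tag_scores(gold, predicted):
    """Return {tag: (precision, recall, f1)} for token-aligned tag lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    gold_count = Counter(gold)        # how often each tag is the correct answer
    pred_count = Counter(predicted)   # how often the tagger assigned each tag
    # True positives: tokens where the assigned tag matches the gold tag.
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    scores = {}
    for tag in gold_count.keys() | pred_count.keys():
        precision = true_pos[tag] / pred_count[tag] if pred_count[tag] else 0.0
        recall = true_pos[tag] / gold_count[tag] if gold_count[tag] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[tag] = (precision, recall, f1)
    return scores


# Toy example with hypothetical tags: the tagger labels only one of the
# three "NE" tokens, so "NE" gets precision 1.0 but recall 1/3.
gold = ["NE", "O", "NE", "NE", "O"]
predicted = ["NE", "O", "O", "O", "O"]
for tag, (p, r, f) in sorted(per_tag_scores(gold, predicted).items()):
    print(f"{tag}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Tracking F1 (or per-tag recall) across experiments gives a single number for comparing options such as more iterations or more tagged sentences, rather than judging precision and recall separately.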

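William's advice to decide on features by evaluating them, rather than by a formula, can be sketched as k-fold cross-validation. This is a generic illustration, not how OpenNLP's POSTaggerCrossValidator is implemented. The most-frequent-tag baseline, the function names, the `normalize` parameter and the toy data are all hypothetical. The sketch runs the same tagger with and without one normalization choice (lowercasing words) and compares mean held-out token accuracy.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(sentences, normalize):
    """Hypothetical baseline tagger: each normalized word gets the tag it
    carried most often in training; unseen words get the overall most
    frequent tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            counts[normalize(word)][tag] += 1
            all_tags[tag] += 1
    fallback = all_tags.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: lexicon.get(normalize(word), fallback)


def cross_validate(sentences, k=10, normalize=str.lower):
    """Mean token accuracy over k folds; sentence i is held out in fold i % k."""
    if not 2 <= k <= len(sentences):
        raise ValueError("need 2 <= k <= number of sentences")
    accuracies = []
    for fold in range(k):
        train = [s for i, s in enumerate(sentences) if i % k != fold]
        held_out = [s for i, s in enumerate(sentences) if i % k == fold]
        tagger = train_most_frequent_tag(train, normalize)
        correct = sum(tagger(w) == t for s in held_out for w, t in s)
        total = sum(len(s) for s in held_out)
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)


# Toy data: without lowercasing, "The" and "the" are different words,
# so the capitalized form falls back to the most frequent tag (NN).
data = [[("The", "DT"), ("dog", "NN"), ("cat", "NN")],
        [("the", "DT"), ("cat", "NN"), ("dog", "NN")]]
print(cross_validate(data, k=2, normalize=str.lower))
print(cross_validate(data, k=2, normalize=lambda w: w))
```

On real data the difference may be small or even reversed, which is exactly why the comparison has to be measured on held-out folds rather than decided in advance.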