Hi, Em,

The OpenNLP default context generator is designed to be portable across
languages. You should try it first and evaluate how your system performs.
You can also evaluate your model with the tools OpenNLP provides, for
example:
bin/opennlp POSTaggerCrossValidator
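To make the idea behind cross-validation concrete, here is a minimal,
self-contained Java sketch of k-fold cross-validation for a POS tagger. It
does not use OpenNLP: the "trainer" is a hypothetical most-frequent-tag
baseline, and the fallback tag "NN" for unknown words is an assumption, so
the example runs on its own. Accuracy is pooled over the tokens of all folds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of k-fold cross-validation; not OpenNLP code. */
public class CrossValidationSketch {

    /** A sentence: parallel arrays of tokens and gold tags. */
    record Sentence(String[] tokens, String[] tags) {}

    /** Hypothetical trainer: tags each known word with its most frequent tag. */
    static Function<String[], String[]> trainBaseline(List<Sentence> train) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Sentence s : train) {
            for (int i = 0; i < s.tokens().length; i++) {
                counts.computeIfAbsent(s.tokens()[i], w -> new HashMap<>())
                      .merge(s.tags()[i], 1, Integer::sum);
            }
        }
        return tokens -> {
            String[] out = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                Map<String, Integer> c = counts.get(tokens[i]);
                // Unknown words fall back to "NN" (an assumption for this sketch).
                out[i] = (c == null)
                        ? "NN"
                        : c.entrySet().stream()
                               .max(Map.Entry.comparingByValue())
                               .get().getKey();
            }
            return out;
        };
    }

    /** Token-level accuracy pooled over all k folds. */
    static double crossValidate(List<Sentence> data, int k) {
        long correct = 0, total = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Sentence> train = new ArrayList<>();
            List<Sentence> test = new ArrayList<>();
            for (int i = 0; i < data.size(); i++) {
                // Round-robin assignment: sentence i is held out in fold i % k.
                (i % k == fold ? test : train).add(data.get(i));
            }
            Function<String[], String[]> tagger = trainBaseline(train);
            for (Sentence s : test) {
                String[] predicted = tagger.apply(s.tokens());
                for (int i = 0; i < predicted.length; i++) {
                    if (predicted[i].equals(s.tags()[i])) correct++;
                    total++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Tiny toy corpus, for illustration only. */
    static List<Sentence> demoData() {
        return List.of(
            new Sentence(new String[] {"the", "dog", "runs"}, new String[] {"DT", "NN", "VBZ"}),
            new Sentence(new String[] {"the", "cat", "runs"}, new String[] {"DT", "NN", "VBZ"}),
            new Sentence(new String[] {"a", "dog", "barks"}, new String[] {"DT", "NN", "VBZ"}),
            new Sentence(new String[] {"a", "cat", "sleeps"}, new String[] {"DT", "NN", "VBZ"}));
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "accuracy = %.3f%n", crossValidate(demoData(), 2));
    }
}
```

On this toy corpus, 10 of 12 tokens are tagged correctly; both errors are
verbs ("barks", "sleeps") that never appear in their fold's training data
and receive the fallback tag. Errors like these are where additional
features tend to pay off.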

There is no formula for deciding whether to include new features. Instead,
compare the accuracy of your tagger with that of other machine-learning POS
tagger implementations. For English POS tagging, researchers usually build
models with around 96.5% accuracy, but this depends on factors such as the
quality and the size of the training data.

You can extend the default context generator with features that improve
your model's effectiveness, chosen by examining the characteristics of the
data you are working with.
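Which features are worth adding depends on your data, but the general
pattern is to keep the base features and append extra ones. The Java sketch
below is library-independent: `ContextGenerator`, `BASE` and
`withExtraFeatures` are hypothetical names, and `BASE` is a deliberately tiny
stand-in for OpenNLP's default features. In OpenNLP the same logic would go
into a subclass of the default context generator; check the javadoc for
your OpenNLP version for the exact method to override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: append data-specific features to a base context. */
public class ExtendedContextSketch {

    /** Stand-in for a context generator (assumption: features are plain strings). */
    interface ContextGenerator {
        String[] getContext(int index, String[] tokens, String[] prevTags);
    }

    /** Minimal base features: the current word and the previous tag. */
    static final ContextGenerator BASE = (index, tokens, prevTags) -> new String[] {
        "w=" + tokens[index],
        "t=" + (index > 0 ? prevTags[index - 1] : "BOS")
    };

    /** Wraps a base generator and adds features that often help with unknown words. */
    static ContextGenerator withExtraFeatures(ContextGenerator base) {
        return (index, tokens, prevTags) -> {
            List<String> features =
                new ArrayList<>(Arrays.asList(base.getContext(index, tokens, prevTags)));
            String w = tokens[index];
            // Suffixes of length 1..3 capture morphology such as "-ing" or "-ed".
            for (int n = 1; n <= 3 && n <= w.length(); n++) {
                features.add("suf" + n + "=" + w.substring(w.length() - n));
            }
            if (!w.isEmpty() && Character.isUpperCase(w.charAt(0))) {
                features.add("cap");   // capitalized word
            }
            if (w.chars().anyMatch(Character::isDigit)) {
                features.add("num");   // contains a digit
            }
            if (w.indexOf('-') >= 0) {
                features.add("hyph");  // contains a hyphen
            }
            return features.toArray(new String[0]);
        };
    }

    public static void main(String[] args) {
        String[] tokens = {"Walking", "is", "fun"};
        ContextGenerator gen = withExtraFeatures(BASE);
        // The first token has no previous tags.
        System.out.println(String.join(" ", gen.getContext(0, tokens, new String[0])));
    }
}
```

Running `main` prints `w=Walking t=BOS suf1=g suf2=ng suf3=ing cap`. Suffix
features such as `suf3=ing` let the model generalize to words it never saw
in training; whether they actually help on your data is something
cross-validation should tell you.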

Regards
William



On Fri, Mar 2, 2012 at 1:34 PM, Em <[email protected]> wrote:

> Hello,
>
> I've read a little bit about POS tagging and the theory behind it.
>
> Some POS taggers include default features (I think this is the best name
> for them among the ones I've read), while others don't.
>
> Do you explain anywhere when to include default features and when not to?
>
> Is there a formula one can consult when deciding whether or not to
> include default features for normalization?
>
> Thank you.
>
> Em
>
