William,

Thank you for your advice.

>> In some POS-taggers there are default-features included (I think this is
>> the best name for it from all the ones I read), while others didn't have
>> them.
I should have been clearer about that: I do not mean only the taggers in
OpenNLP, but many others, too.
I am trying to understand the basic principles that indicate when a
default feature for normalization improves tagging quality, instead of
relying on trial and error or guessing.
Knowing how the model is computed from a mathematical point of view
*could* help me understand when to use normalization features. However,
if people with more experience in these topics can explain it another
way, I am happy with that, too :).

You also mentioned another topic: data quality.
Are there any metrics that indicate whether training data is of good
quality?

For example, I tagged several thousand sentences from a specific wiki and
measured a precision of a little over 90% and a recall of around 55-60%.
There are several ways to tune these results: run more iterations on the
training data, tune other parameters, tag more sentences, and so on.
What would help me prioritize these options?
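
To compare tuning runs with a single number, precision and recall can be
combined into an F1 score (their harmonic mean). Below is a minimal sketch
in Python using the figures above. The function name f1_score is my own
illustration and not part of OpenNLP.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures from the evaluation described above; recall was reported as a range,
# so both ends of the range are evaluated.
precision = 0.90
for recall in (0.55, 0.60):
    score = f1_score(precision, recall)
    print(f"P={precision:.2f} R={recall:.2f} F1={score:.3f}")
```

Because the harmonic mean is pulled toward the smaller value, the score
here is limited mainly by recall, so a change that raises recall moves F1
more than an equal gain in precision.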

Kind regards,
Em





On 02.03.2012 at 20:35, [email protected] wrote:
> Hi, Em,
> 
> The OpenNLP default context generator is designed to be portable between
> languages. You should try it and evaluate how your system performs. You can
> also evaluate your model using the tools provided, for example:
> bin/opennlp POSTaggerCrossValidator
> 
> There is no formula to decide whether you should include new features.
> Compare the accuracy of other machine learning POS tagger implementations
> to yours. Usually researchers can create models with 96.5% accuracy for
> English POS tagging, but it depends on factors like the quality and size
> of the training data.
> 
> You can extend the default context generator with features that improve
> your model's effectiveness, based on the characteristics of the data you
> are working with.
> 
> Regards
> William
> 
> 
> 
> On Fri, Mar 2, 2012 at 1:34 PM, Em <[email protected]> wrote:
> 
>> Hello,
>>
>> I've read a little bit about POS-tagging and the theory behind that.
>>
>> In some POS-taggers there are default-features included (I think this is
>> the best name for it from all the ones I read), while others didn't have
>> them.
>>
>> Do you explain somewhere when to include default features and when not to?
>>
>> Is there a formula one can consult to decide whether to include default
>> features for normalization?
>>
>> Thank you.
>>
>> Em
>>
> 
