On 11/07/2013 10:58 AM, Jens Grivolla wrote:
I don't know specifically about NameFinderME, but with other statistical NER systems I noticed that they tend to give a lot of weight to whether a word has initial capitalization when making the decision, often so much that it is the only feature that matters.

This is because on cleanly written text (e.g. news articles) initial capitalization is an extremely reliable predictor. If you have other kinds of text, such as user-generated content (e.g. Twitter), you need to train a model on that kind of data and hope for the best. Accuracy will usually be far below what is achieved on news articles.


Exactly. It is mostly a question of the training data: the English SourceForge models are trained on news articles from the 90s. These don't contain lowercased or all-uppercase names.
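
As a rough illustration of the capitalization effect discussed above, here is a toy Python sketch. It is not OpenNLP code and not how NameFinderME works internally; the function names and the example sentences are made up for illustration. It mimics a model that has learned to rely on initial capitalization alone, which looks fine on news-style text and finds nothing once the same sentence is lowercased.

```python
def shape_features(token):
    """Hypothetical capitalization features of the kind a statistical NER model might use."""
    return {
        "init_cap": token[:1].isupper(),
        "all_caps": token.isupper(),
        "all_lower": token.islower(),
    }


def init_cap_only_tagger(tokens):
    """Toy tagger: mark a token as a name if it starts with an uppercase
    letter and is not the first token of the sentence."""
    return [i > 0 and shape_features(tok)["init_cap"] for i, tok in enumerate(tokens)]


# Made-up example sentences: news-style casing vs. lowercased user-generated text.
news = "Yesterday Angela Merkel visited Paris".split()
tweet = "yesterday angela merkel visited paris".split()

news_tags = init_cap_only_tagger(news)
tweet_tags = init_cap_only_tagger(tweet)

print(list(zip(news, news_tags)))
print(list(zip(tweet, tweet_tags)))
```

On the lowercased sentence the capitalization feature never fires, so a model that depends on it has nothing to go on. That is why training data with the same casing as the target text matters.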

Jörn
