Re: lower case name recognition

Jörn Kottmann Thu, 07 Nov 2013 02:08:07 -0800

On 11/07/2013 10:58 AM, Jens Grivolla wrote:

I don't know specifically about NameFinderME, but with other statistical NER systems I noticed that they tend to give a lot of weight to the fact that a word has initial capitalization when making the decision, often so much that it is the only feature that matters.
This is due to the fact that on cleanly written text (e.g. news articles) this is an extremely reliable predictor. If you have other kinds of text such as UGC (e.g. Twitter) you need to train a model using this kind of data and hope for the best. Accuracy will usually be far below what is achieved on news articles.

Exactly. It is mostly a question of the training data: the English SourceForge models are trained on news articles from the 90s. These don't contain lower-cased or all-upper-cased names.
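
To illustrate why capitalization stops helping on such text, here is a minimal Python sketch. It is not NameFinderME's actual feature generator; the function names `shape_features` and `init_cap_tokens` and the example sentences are hypothetical, chosen only to show how an initial-capital feature behaves on clean, lower-cased, and all-upper-cased input.

```python
def shape_features(token):
    """Return simple capitalization features for one token (illustrative only)."""
    return {
        "init_cap": token[:1].isupper(),   # first character is upper case
        "all_caps": token.isupper(),       # every cased character is upper case
        "all_lower": token.islower(),      # every cased character is lower case
    }


def init_cap_tokens(sentence):
    """List the whitespace-separated tokens whose first character is upper case."""
    return [t for t in sentence.split() if shape_features(t)["init_cap"]]


if __name__ == "__main__":
    for text in ("Jens met Maria in Barcelona",
                 "jens met maria in barcelona",
                 "JENS MET MARIA IN BARCELONA"):
        print(text, "->", init_cap_tokens(text))
```

On the clean sentence the feature fires only on the names. On the lower-cased sentence it never fires, and on the all-upper-cased sentence it fires on every token, so in both cases it carries no information about which tokens are names. A model that learned to rely on it from news text has little else to go on.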

Jörn
