Hi there,
Just one data point: I trained the NER on about 5,000 manually annotated German
sentences from Wikipedia. The resulting model performs ... not so well.
So I think more training data is required.
If you are interested, the data and the compiled model are freely available:
http://www.thomas-zastrow.de/nlp/
Best,
Tom
On 19.12.2014 at 10:32, Vihari Piratla wrote:
Thanks :)
On Fri, Dec 19, 2014 at 3:02 PM, Vihari Piratla <[email protected]>
wrote:
Useful insight on training an Entity Recogniser model from scratch.
---------- Forwarded message ----------
From: Rodrigo Agerri <[email protected]>
Date: Fri, Dec 19, 2014 at 2:52 PM
Subject: Re: Queries related to training Entitiy Recogniser.
To: "[email protected]" <[email protected]>
Hi,
On Fri, Dec 19, 2014 at 10:09 AM, Vihari Piratla
<[email protected]> wrote:
Thanks for the quick response.
Some follow-up questions:
Is it essential to annotate entities of the "misc" class too?
No, it is not. You choose which classes you want to annotate. The four
CoNLL classes are just one classification scheme; there are others.
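To illustrate that the class names are up to the annotator, here is a minimal sketch that renders one tokenized sentence in the `<START:type> ... <END>` markup used for OpenNLP name finder training data (one sentence per line). The helper name `to_opennlp_line`, the span layout and the example sentence are assumptions made for illustration; they are not part of OpenNLP.

```python
def to_opennlp_line(tokens, spans):
    """Render one tokenized sentence with <START:type> ... <END> markup.

    tokens: list of token strings.
    spans:  list of (start, end, label) tuples, end exclusive, non-overlapping.
            The labels are whatever classes you decided to annotate.
    """
    starts = {start: label for start, end, label in spans}
    ends = {end for start, end, label in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<START:{starts[i]}>")
        out.append(tok)
        if i + 1 in ends:
            out.append("<END>")
    return " ".join(out)


# Hypothetical example sentence with two self-chosen classes.
tokens = ["Angela", "Merkel", "besuchte", "Berlin", "."]
spans = [(0, 2, "person"), (3, 4, "location")]
line = to_opennlp_line(tokens, spans)
print(line)
```

A "misc" class is simply one more label here; leaving it out just means those spans are never marked.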
It is usually best to train your own models for the domain data you want
to annotate; otherwise the performance of the model suffers.
Isn't it hard to produce 15,000 accurately annotated sentences for every
domain in which I wish to recognise entities? (I just want to make sure
that I am not missing anything.)
Sure, domain adaptation is a well-known, hard and unsolved problem :)
You can try training with less data and see the results, or use models
trained on already available data, knowing that performance is not
going to be ideal. You can also add gazetteers (lists of entities,
perhaps related to the domain you want to annotate), and there are
other, more complex approaches that try to learn the classifiers
(almost from scratch) (http://www.aclweb.org/anthology/P10-1029).
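As a rough illustration of the gazetteer idea, the sketch below does a greedy longest-match lookup of token sequences against an entity list. The function name `gazetteer_matches`, the `max_len` parameter and the tiny gazetteer are hypothetical. In a real system such matches would usually be handed to the classifier as extra features rather than used as final labels.

```python
def gazetteer_matches(tokens, gazetteer, max_len=5):
    """Return (start, end, label) spans for the longest gazetteer hits.

    gazetteer: dict mapping lowercased, space-joined entity strings to labels.
    Scans left to right and prefers the longest match at each position.
    """
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Hypothetical domain list; a real one would be much larger.
gazetteer = {
    "angela merkel": "person",
    "new york": "location",
    "york": "location",
}
tokens = ["Angela", "Merkel", "flew", "to", "New", "York", "."]
hits = gazetteer_matches(tokens, gazetteer)
print(hits)
```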
In my opinion, the easiest would be to annotate some data and try it
out. If it does not work well, annotate some more and try again.
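To judge whether "it works well" after each annotation round, one common choice is entity-level precision, recall and F1 on held-out sentences. The sketch below assumes entities are represented as `(sentence_id, start, end, label)` tuples and counts only exact matches as correct. The function name `prf1` and the sample data are made up for illustration.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall and F1 with exact span and label match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical held-out annotations and model output.
gold = {(0, 0, 2, "person"), (0, 3, 4, "location"), (1, 1, 2, "organization")}
pred = {(0, 0, 2, "person"), (1, 1, 2, "location")}
p, r, f = prf1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Recording these numbers after each round of added annotations shows whether more data is still helping.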
OpenNLP also offers direct conversion from the Brat annotation tool
format to train the models.
HTH,
Rodrigo
--
V
--
Dr. Thomas Zastrow
Riedererstr. 13
85737 Ismaning
Tel.: 0162 422 8029
www.thomas-zastrow.de