Hi,

On Fri, Dec 19, 2014 at 10:09 AM, Vihari Piratla
<[email protected]> wrote:
> Thanks for the quick response.
> Some follow up questions
> Is it essential to annotate entities as "misc" class too?

No, it is not. You choose which classes you want to annotate. The 4
CoNLL classes are just one classification scheme; there are others.

> It is usually best to train your own models for the domain data you want to
> annotate,
> otherwise the performance of the model suffers.

> Isn't it hard to generate accurate 15,000 annotated sentences for every
> domain data that
> I wish to recognise? (just want to make sure that I am not missing anything)

Sure, domain adaptation is a well-known, hard, and unsolved problem :)
You can try training with less data and see the results, or use models
trained on already available data, knowing that performance is not
going to be ideal. You can also add gazetteers (lists of entities,
perhaps related to the domain you want to annotate), and there are
other, more complex approaches that try to learn the classifiers
almost from scratch (http://www.aclweb.org/anthology/P10-1029).
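
To make the gazetteer idea concrete, here is a rough sketch of a plain
longest-match lookup against an entity list. It is only an illustration
in Python, not OpenNLP's own mechanism: the function name, the list
contents and the labels are all made up, and in practice matches like
these are often fed to the classifier as extra features rather than
used as the final annotation.

```python
def tag_with_gazetteer(tokens, gazetteer, max_len=4):
    """Greedy longest-match lookup of token spans in a gazetteer.

    Returns a list of (start, end, label) spans, with end exclusive.
    """
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Hypothetical domain list; normally this would be loaded from a file.
GAZETTEER = {
    "new york": "location",
    "york": "location",
    "apache opennlp": "software",
}

tokens = "She moved to New York to work on Apache OpenNLP".split()
print(tag_with_gazetteer(tokens, GAZETTEER))
```

Trying the longest span first means "New York" is matched as one
entity instead of stopping at the shorter entry "York".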

In my opinion, the easiest approach would be to annotate some data and
try it out. If it does not work well, annotate some more and try again.
OpenNLP also offers direct conversion from the Brat annotation tool
format, so Brat annotations can be used to train the models.

HTH,

Rodrigo
