Thanks :)

On Fri, Dec 19, 2014 at 3:02 PM, Vihari Piratla <[email protected]> wrote:
>
> Useful insight on training an Entity Recogniser model from scratch.
>
> ---------- Forwarded message ----------
> From: Rodrigo Agerri <[email protected]>
> Date: Fri, Dec 19, 2014 at 2:52 PM
> Subject: Re: Queries related to training Entity Recogniser.
> To: "[email protected]" <[email protected]>
>
> Hi,
>
> On Fri, Dec 19, 2014 at 10:09 AM, Vihari Piratla
> <[email protected]> wrote:
> > Thanks for the quick response.
> > Some follow-up questions:
> > Is it essential to annotate entities as the "misc" class too?
>
> No, it is not. You choose which classes you want to annotate. The 4
> CoNLL classes are just one classification scheme; there are others.
>
> > > It is usually best to train your own models for the domain data you
> > > want to annotate, otherwise the performance of the model suffers.
>
> > Isn't it hard to generate 15,000 accurately annotated sentences for
> > every domain I wish to recognise? (Just want to make sure that I am
> > not missing anything.)
>
> Sure, domain adaptation is a well-known, hard and unsolved problem :)
> You can try training with less data and see the results, or use models
> trained on already available data, knowing that performance is not
> going to be ideal. You can also add gazetteers (lists of entities,
> perhaps related to the domain you want to annotate), and there are
> other, more complex approaches that try to learn the classifiers
> almost from scratch (http://www.aclweb.org/anthology/P10-1029).
>
> In my opinion, the easiest option would be to annotate some data and try
> it out. If it does not work well, annotate some more and try again.
> OpenNLP also offers direct conversion from the Brat annotation tool
> format to train the models.
>
> HTH,
>
> Rodrigo
>
>
> --
> V
>
-- V
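
A note for anyone landing on this thread and wondering what "annotate some
data" looks like in practice: OpenNLP's name finder training data is one
whitespace-tokenised sentence per line, with entities wrapped in
<START:type> ... <END> tags. A minimal sketch, with the "person" type and
file names below only as placeholders:

    <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
    Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .

Training from the command line then looks roughly like this (argument values
are examples; the exact options can vary between OpenNLP versions):

    $ opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en \
        -data en-ner-person.train -encoding UTF-8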
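
On the gazetteer suggestion: one simple way to plug a list of known domain
entities into OpenNLP is a DictionaryNameFinder, used alongside (or instead
of) a statistical model. A minimal sketch, with the entity names and tokens
made up for illustration:

    import java.util.Arrays;

    import opennlp.tools.dictionary.Dictionary;
    import opennlp.tools.namefind.DictionaryNameFinder;
    import opennlp.tools.util.Span;
    import opennlp.tools.util.StringList;

    public class GazetteerDemo {
        public static void main(String[] args) {
            // Build a tiny in-memory gazetteer; in practice this would be
            // loaded from a domain-specific entity list.
            Dictionary gazetteer = new Dictionary();
            gazetteer.put(new StringList("Rodrigo", "Agerri"));
            gazetteer.put(new StringList("Apache", "OpenNLP"));

            DictionaryNameFinder finder = new DictionaryNameFinder(gazetteer);

            // Tokens would normally come from an OpenNLP tokenizer.
            String[] tokens = {"Rodrigo", "Agerri", "replied", "on", "the",
                               "Apache", "OpenNLP", "users", "list", "."};

            // Each Span holds the start (inclusive) and end (exclusive)
            // token indices of a dictionary match.
            for (Span span : finder.find(tokens)) {
                String matched = String.join(" ", Arrays.copyOfRange(
                        tokens, span.getStart(), span.getEnd()));
                System.out.println(matched
                        + " [" + span.getStart() + ", " + span.getEnd() + ")");
            }
        }
    }

The resulting spans can be merged with the output of a statistical
NameFinderME, which is one cheap way to boost recall on a new domain.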
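
And on the Brat route Rodrigo mentions: Brat keeps annotations in stand-off
.ann files next to the .txt files, one annotation per line. The document,
entity types and character offsets below are invented for the example:

    # doc.txt
    Vihari Piratla asked about Apache OpenNLP.

    # doc.ann (tab-separated: id, "Type start end", surface text)
    T1	Person 0 14	Vihari Piratla
    T2	Organization 27 41	Apache OpenNLP

The entity types come from the Brat project's annotation configuration.
As Rodrigo says above, OpenNLP's name finder trainer can consume this format
directly, so the Brat output does not have to be converted to the
<START>/<END> format by hand; the exact command-line invocation is best
checked against the OpenNLP manual for the version in use.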
