Thanks :)

On Fri, Dec 19, 2014 at 3:02 PM, Vihari Piratla <[email protected]>
wrote:
>
> Useful insight on training an Entity Recogniser model from scratch.
>
> ---------- Forwarded message ----------
> From: Rodrigo Agerri <[email protected]>
> Date: Fri, Dec 19, 2014 at 2:52 PM
> Subject: Re: Queries related to training Entitiy Recogniser.
> To: "[email protected]" <[email protected]>
>
> Hi,
>
> On Fri, Dec 19, 2014 at 10:09 AM, Vihari Piratla
> <[email protected]> wrote:
> > Thanks for the quick response.
> > Some follow-up questions:
> > Is it essential to annotate entities as "misc" class too?
>
> No, it is not. You choose which classes you want to annotate. The 4
> CoNLL classes are just one classification scheme; there are others.
>
> > It is usually best to train your own models for the domain data you want
> > to annotate, otherwise the performance of the model suffers.
>
> > Isn't it hard to generate 15,000 accurately annotated sentences for every
> > domain that I wish to recognise? (Just want to make sure that I am not
> > missing anything.)
>
> Sure, domain adaptation is a well-known, hard and unsolved problem :)
> You can try training with less data and see the results, or use models
> trained on already available data, knowing that performance is not
> going to be ideal. You can also add gazetteers (lists of entities,
> perhaps related to the domain you want to annotate), and there are
> other more complex approaches trying to learn (almost from scratch)
> the classifiers (http://www.aclweb.org/anthology/P10-1029).
>
> In my opinion, the easiest would be to annotate some data and try it
> out. If it does not work well, annotate some more and try again.
> OpenNLP also offers direct conversion from the Brat annotation tool
> format to train the models.
>
> HTH,
>
> Rodrigo
>
>
> --
> V
>
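
For anyone trying the "annotate some data and try it out" route above, here is a minimal sketch of turning a small set of annotated sentences into the one-sentence-per-line format the OpenNLP name finder trainer reads, where entities are wrapped as <START:type> ... <END> and tokens are separated by spaces. The function name and the (start, end, type) span representation are my own assumptions for illustration, not part of OpenNLP, and the sketch assumes spans do not overlap.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def to_opennlp_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence and its entity spans as a training line.

    Assumes the spans do not overlap and lie within the sentence.
    """
    starts = {start: etype for start, _end, etype in spans}
    ends = {end for _start, end, _etype in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<START:{starts[i]}>")  # opening marker before the first entity token
        out.append(tok)
        if i + 1 in ends:
            out.append("<END>")  # closing marker after the last entity token
    return " ".join(out)


if __name__ == "__main__":
    sentence = ["Pierre", "Vinken", "joined", "Elsevier", "."]
    annotations = [(0, 2, "person"), (3, 4, "organization")]
    print(to_opennlp_line(sentence, annotations))
```

Writing one such line per sentence to a file gives something that can be fed to the trainer; since you choose the classes yourself, the type names here are only examples.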


-- 
V
