I have a few questions about creating my own training data for the name
finder. I would like to distinguish between people, organizations and
locations. The example in the documentation shows the tags to use for people, i.e.

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .

So would I use <START:organization> ... <END> and <START:location> ... <END>
for organizations and locations respectively? The named entity annotation
guidelines in the documentation, i.e.

https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.annotation_guides

seem to show different tags being used, which has left me slightly confused
as to which tags I should actually use.
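As an illustration of the annotation format only: a minimal stdlib-only Java sketch, assuming (as the manual's `<START:person>` example suggests) that the label after `START:` names the entity type and that tags are separated from tokens by whitespace. It pulls typed entity spans out of one annotated line, so you can see person, organization and location annotations sitting in the same training sentence. `AnnotatedLineParser`, `Entity` and the example sentence are hypothetical, not OpenNLP classes or data.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatedLineParser {

    // Hypothetical holder for one annotated entity: type label, token range [start, end), text.
    record Entity(String type, int start, int end, String text) {}

    // Parses one whitespace-tokenized line written in the <START:type> ... <END> style.
    static List<Entity> parse(String line) {
        List<Entity> entities = new ArrayList<>();
        List<String> tokens = new ArrayList<>(); // plain tokens, with the tags stripped out
        String openType = null;                  // type of the entity currently open, if any
        int openStart = -1;

        for (String tok : line.trim().split("\\s+")) {
            if (tok.startsWith("<START")) {
                if (openType != null) {
                    throw new IllegalArgumentException("Nested <START> tag: " + tok);
                }
                // "<START:organization>" -> "organization"; a bare "<START>" gets a
                // placeholder label chosen for this sketch.
                int colon = tok.indexOf(':');
                openType = colon >= 0 ? tok.substring(colon + 1, tok.length() - 1) : "default";
                openStart = tokens.size();
            } else if (tok.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without a matching <START>");
                }
                String text = String.join(" ", tokens.subList(openStart, tokens.size()));
                entities.add(new Entity(openType, openStart, tokens.size(), text));
                openType = null;
            } else {
                tokens.add(tok);
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("Unclosed <START:" + openType + ">");
        }
        return entities;
    }

    public static void main(String[] args) {
        // Invented example line mixing three entity types.
        String line = "<START:person> Jack <END> works for <START:organization> Acme Corp <END>"
                + " in <START:location> Dublin <END> .";
        for (Entity e : parse(line)) {
            System.out.println(e.type() + " [" + e.start() + "," + e.end() + ") " + e.text());
        }
    }
}
```

Token offsets are counted after the tags are removed, which is how the spans line up with the plain sentence a finder would later see.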

Also, I see the 15,000-line recommendation. Is there any performance hit if you
use many more lines?

If I create my plain text training file as outlined above, are there any other
parameters that are recommended beyond the basic ones, i.e.

opennlp TokenNameFinderTrainer -model OUTPUT_FILE.bin -lang en -data TRAINING_FILE.train -encoding UTF-8

For instance, what is the -params training parameters file used for? Is it
necessary? Should it list the named entities I am looking for (person,
organization and location), and if so, what format should it be in?

Sorry for the basic questions, but I couldn't find the answers in the
documentation or from a quick Google search.

Thanks,

Robert


> From: [email protected]
> Date: Mon, 18 Apr 2016 09:36:24 +0200
> Subject: Re: Name finder questions
> To: [email protected]
> 
> Hello,
> 
> Yes, that is the idea.
> 
> R
> 
> On Sun, Apr 17, 2016 at 9:10 PM, Robert Logue <[email protected]> wrote:
> > I am slightly confused about what I can use the data in those links for. So can I
> > use this data with the training tool like the following:
> >
> > opennlp TokenNameFinderTrainer -model OUTPUT_FILE_NAME -lang en -data DOWNLOADED_FILE_NAME -encoding UTF-8
> >
> > And would that give me a better model file for when I use the name finder?
> >
> > Thanks,
> >
> > Robert
> >
> >> From: [email protected]
> >> Date: Fri, 15 Apr 2016 17:12:20 +0200
> >> Subject: Re: Name finder questions
> >> To: [email protected]
> >>
> >> Hi Robert,
> >>
> >> On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I have just started using OpenNLP in a Java application. I am just
> >> > getting used to the software and have a couple of newbie questions.
> >> >
> >> > I see that for the name finder there are different models for people and
> >> > organizations (en-ner-organization.bin and en-ner-person.bin). Is there
> >> > any way to combine these into one file so I can do one search that will
> >> > give me back both person names and organization names? Or is this not
> >> > possible, and is it best to do two searches?
> >>
> >> This used to be experimental, but it is not anymore: you can train
> >> a single name finder model for more than one entity type. The available
> >> models were trained on rather old newswire data, so I would
> >> recommend training new models using OpenNLP:
> >>
> >> http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
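If you stay with the separate pre-trained models instead, the "two searches" option amounts to running each finder over the same tokens and merging the typed results. A stdlib-only Java sketch, assuming each finder's output has already been converted into the hypothetical `TypedSpan` record below (token offsets plus a type label); the overlap rule (keep the longer span, the earlier-starting one on a tie) is an arbitrary choice for illustration, not OpenNLP behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SpanMerger {

    // Hypothetical stand-in for one finder result: token range [start, end) plus entity type.
    record TypedSpan(int start, int end, String type) {
        int length() { return end - start; }
        boolean overlaps(TypedSpan o) { return start < o.end && o.start < end; }
    }

    // Merges the outputs of several single-type finders into one position-ordered list.
    // When spans overlap, the longer one wins (the earlier-starting one on a tie).
    @SafeVarargs
    static List<TypedSpan> merge(List<TypedSpan>... perFinderResults) {
        List<TypedSpan> all = new ArrayList<>();
        for (List<TypedSpan> r : perFinderResults) {
            all.addAll(r);
        }
        // Longest first, then by start, so preferred spans are kept before their rivals.
        all.sort(Comparator.comparingInt(TypedSpan::length).reversed()
                .thenComparingInt(TypedSpan::start));

        List<TypedSpan> kept = new ArrayList<>();
        for (TypedSpan candidate : all) {
            if (kept.stream().noneMatch(candidate::overlaps)) {
                kept.add(candidate);
            }
        }
        kept.sort(Comparator.comparingInt(TypedSpan::start)); // back to reading order
        return kept;
    }

    public static void main(String[] args) {
        // Invented finder outputs for the tokens "Jack works for Acme Corp in Dublin ."
        List<TypedSpan> persons = List.of(new TypedSpan(0, 1, "person"));
        List<TypedSpan> orgs = List.of(new TypedSpan(3, 5, "organization"),
                                       new TypedSpan(4, 5, "organization"));
        System.out.println(merge(persons, orgs));
    }
}
```

A single model trained on data annotated with several types avoids this merge step, since one pass already returns every type.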
> >>
> >> I suppose you do not have manually annotated training data, so I would
> >> recommend getting the OntoNotes corpus.
> >>
> >> https://catalog.ldc.upenn.edu/LDC2013T19
> >>
> >> https://github.com/ontonotes/conll-formatted-ontonotes-5.0
> >>
> >> Another option is to get a silver-standard corpus obtained
> >> automatically from Wikipedia:
> >>
> >> http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia
> >>
> >> For Dutch, Spanish, German and Italian (that I know of) there are free
> >> resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.
> >>
> >> > This question isn't related to the name finder, and I don't think it is
> >> > possible, but I thought I would ask anyway. If I had two sentences, say
> >> > 'Jack climbed the hill. He was very tired.', is there any way to know
> >> > that the pronoun "he" at the start of the second sentence actually
> >> > refers to Jack, the subject of the first sentence? I know in this simple
> >> > case it is obvious, but I am wondering if there is anything in the
> >> > OpenNLP software that will help with this.
> >>
> >> The example you mentioned is called "pronominal anaphora", and it
> >> generalizes to the coreference resolution problem. There used to be a
> >> coreference tool in OpenNLP, but it was moved to the Sandbox because many
> >> things need to be updated before it can be distributed.
> >>
> >> See http://conll.cemantix.org/2012/introduction.html for more details.
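To make the problem concrete, here is a deliberately naive Java heuristic. It is not OpenNLP's Sandbox coreference component and not a real resolver: it links a singular personal pronoun to the closest preceding person mention, assuming the person names have already been found (for example by a name finder) and are passed in as a hypothetical `Mention` list. It handles the Jack/He example, but it ignores gender, number and competing candidates, which is exactly why the general task needs a proper coreference system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NaivePronounLinker {

    // Hypothetical mention: token index of a person name found earlier (e.g. by a name finder).
    record Mention(int tokenIndex, String text) {}

    // Result: the pronoun's token index, the pronoun itself, and the name it was linked to.
    record Link(int pronounIndex, String pronoun, String antecedent) {}

    // Singular personal pronouns this toy heuristic looks for (gender is ignored).
    private static final Set<String> PRONOUNS = Set.of("he", "him", "his", "she", "her", "hers");

    // Links each pronoun to the closest person mention that precedes it, if there is one.
    static List<Link> link(List<String> tokens, List<Mention> persons) {
        List<Link> links = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String tok = tokens.get(i);
            if (!PRONOUNS.contains(tok.toLowerCase(Locale.ROOT))) {
                continue;
            }
            Mention best = null;
            for (Mention m : persons) {
                if (m.tokenIndex() < i && (best == null || m.tokenIndex() > best.tokenIndex())) {
                    best = m;
                }
            }
            if (best != null) {
                links.add(new Link(i, tok, best.text()));
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Jack", "climbed", "the", "hill", ".",
                                      "He", "was", "very", "tired", ".");
        List<Mention> persons = List.of(new Mention(0, "Jack"));
        for (Link l : link(tokens, persons)) {
            System.out.println(l.pronoun() + " -> " + l.antecedent());
        }
    }
}
```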
> >>
> >> HTH,
> >>
> >> R
> >
                                          
