I am slightly confused about what I can use the data in those links for. Can I use this data with the training tool, like the following?

opennlp TokenNameFinderTrainer -model OUTPUT_FILE_NAME -lang en -data DOWNLOADED_FILE_NAME -encoding UTF-8

And should that give me a better model file for when I use the name finder?

Thanks,
Robert

> From: [email protected]
> Date: Fri, 15 Apr 2016 17:12:20 +0200
> Subject: Re: Name finder questions
> To: [email protected]
>
> Hi Robert,
>
> On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <[email protected]> wrote:
> > Hello,
> >
> > I have just started using OpenNLP in a Java application. I am just
> > getting used to the software and have a couple of newbie questions.
> >
> > I see for the name finder there is different model data for people and
> > organizations (en-ner-organization.bin and en-ner-person.bin). Is there any
> > way to combine these into one file so I can do one search that will give me
> > back person names and organization names? Or is this not possible and is it
> > best to do two searches?
>
> This used to be experimental. It is not anymore; that is, you can train
> a name finder model for more than one entity type. The models
> available were trained with rather old newswire data, so I would
> recommend that you train new models using OpenNLP:
>
> http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
>
> I suppose you do not have manually annotated training data, so I would
> recommend getting the Ontonotes corpus.
>
> https://catalog.ldc.upenn.edu/LDC2013T19
>
> https://github.com/ontonotes/conll-formatted-ontonotes-5.0
>
> Another option is to get a silver standard corpus obtained
> automatically from Wikipedia:
>
> http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia
>
> For Dutch, Spanish, German and Italian (that I know of) there are free
> resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.
>
> > This question isn't related to the name finder and I don't think it is
> > possible, but I thought I would ask anyway. If I had two sentences, say 'Jack
> > climbed the hill. He was very tired.', is there any way to know that the
> > pronoun, he, at the start of the second sentence is actually about Jack, the
> > subject of the first sentence? I know in this simple case it is obvious, but
> > I am wondering if there is anything in the OpenNLP software that will help
> > with this?
>
> The example you mentioned is called "pronominal anaphora" and it
> generalizes to the coreference resolution problem. There used to be a
> coreference tool in OpenNLP, but it got moved to the Sandbox because many
> things need to be updated to be able to distribute it.
>
> See http://conll.cemantix.org/2012/introduction.html for more details.
>
> HTH,
>
> R
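
A note on the training command above: the name finder trainer reads one sentence per line, with entities marked inline as <START:type> ... <END>. A downloaded corpus may instead come as tokens paired with BIO tags, in which case it would probably need converting before it can be passed to -data. The following is only a minimal sketch of such a conversion in Java. The class name BioToOpenNlp, the method toTrainingLine, and the assumption that each sentence is already split into parallel token and tag arrays with well-formed tags ("O", "B-XXX", "I-XXX") are all hypothetical; the actual file layout of each corpus has to be checked separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP: turns one BIO-tagged sentence
// into a line in the <START:type> ... <END> training format.
final class BioToOpenNlp {

    static String toTrainingLine(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        List<String> out = new ArrayList<>();
        String open = null; // type of the entity currently open, if any
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            boolean outside = tag.equals("O");
            // Assumes tags look like "B-PER" or "I-PER"; the type is lower-cased.
            String type = outside ? null : tag.substring(2).toLowerCase(Locale.ROOT);
            boolean begins = tag.startsWith("B-");
            // Close the open entity when we leave it, a new one begins,
            // or the type changes.
            if (open != null && (outside || begins || !open.equals(type))) {
                out.add("<END>");
                open = null;
            }
            if (!outside && open == null) {
                out.add("<START:" + type + ">");
                open = type;
            }
            out.add(tokens[i]);
        }
        if (open != null) {
            out.add("<END>"); // entity ran to the end of the sentence
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = {"Jack", "Smith", "works", "at", "Acme", "Corp", "."};
        String[] tags = {"B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O"};
        System.out.println(toTrainingLine(tokens, tags));
    }
}
```

Writing one such line per sentence to a file would give something that can be used as DOWNLOADED_FILE_NAME in the command above. Because each entity keeps its own type in the START tag, a single file can carry both person and organization annotations.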
