Hi Robert,

On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <[email protected]> wrote:
> Hello,
>
> I have just started using OpenNLP in a Java application. I am just getting
> used to the software and have a couple of newbie questions.
>
> I see for the name finder there is different model data for people and
> organizations (en-ner-organization.bin and en-ner-person.bin). Is there any
> way to combine these into one file so I can do 1 search that will give me
> back person names and organization names? Or is this not possible and is it
> best to do two searches?

This used to be experimental, but it is not anymore: you can train a single name finder model for more than one entity type. The available models were trained on rather old newswire data, so I would recommend training new models with OpenNLP:

http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool

I suppose you do not have manually annotated training data, so I would recommend getting the OntoNotes corpus:

https://catalog.ldc.upenn.edu/LDC2013T19
https://github.com/ontonotes/conll-formatted-ontonotes-5.0

Another option is a silver-standard corpus obtained automatically from Wikipedia:

http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia

For Dutch, Spanish, German and Italian (that I know of) there are free resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.

> This question isn't related to the name finder and I don't think it is
> possible but thought I would ask anyway. If I had two sentences say 'Jack
> climbed the hill. He was very tired.' Is there any way to know that the
> pronoun, he, at the start of the second sentence is actually about Jack the
> subject of the first sentence? I know in this simple case it is obvious but I
> am wondering if there is anything in the OpenNLP software that will help with
> this?

The example you mentioned is called "pronominal anaphora", and it is a special case of the coreference resolution problem. There used to be a coreference tool in OpenNLP, but it was moved to the Sandbox because many things need to be updated before it can be distributed. See http://conll.cemantix.org/2012/introduction.html for more details.

HTH,

R
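
P.S. In case it helps with the multi-type training, here is a minimal sketch of how to prepare the training data. The name finder trainer reads one tokenized sentence per line, with entities marked as `<START:type> ... <END>`, and a single file can mix types such as person and organization. The class and method names below are hypothetical (they are not part of OpenNLP), and the sketch assumes you have already reduced your corpus to tokens plus B-/I-/O tags; the corpora linked above use their own column layouts, so you would need a small reader in front of this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: turns one sentence given as
// tokens plus BIO tags into a line of OpenNLP name finder training data.
final class NameSampleFormatter {

    // tags[i] is "O", "B-<type>" or "I-<type>", e.g. "B-person".
    static String toOpenNlpLine(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        List<String> out = new ArrayList<>();
        boolean open = false;   // are we currently inside an entity span?
        String openType = null; // type of the span that is currently open

        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            String type = tag.length() > 2 ? tag.substring(2) : null;
            boolean inside = tag.startsWith("I-");
            boolean continues = inside && open && openType.equals(type);

            // Close the running span unless this token continues it.
            if (open && !continues) {
                out.add("<END>");
                open = false;
            }
            // Open a span on "B-", or on a stray "I-" with no span to continue.
            if (tag.startsWith("B-") || (inside && !open)) {
                out.add("<START:" + type + ">");
                open = true;
                openType = type;
            }
            out.add(tokens[i]);
        }
        if (open) {
            out.add("<END>"); // sentence ended inside an entity
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = {"Jack", "works", "at", "Acme", "Corp", "."};
        String[] tags = {"B-person", "O", "O", "B-organization", "I-organization", "O"};
        System.out.println(toOpenNlpLine(tokens, tags));
    }
}
```

Writing one such line per sentence into a text file should give you something the training tool described in the manual linked above can consume, and the resulting model would then return both person and organization spans in one pass.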
