I've been using OpenNLP for a few years, and I find the best results occur
when the models are trained on samples of the data they will be run
against, which is one of the reasons I like the Maxent approach. I am not
sure that attempting to provide models will bear much fruit, other than that
users will no longer be afraid of the licensing issues associated with using
them in commercial systems. I do strongly think we should provide a
model-building framework (that calls the training API) and a default
implementation.
Coincidentally, I have been building a framework and implementation over the
last few months that creates models by seeding an iterative process with
known entities and then looping over a set of supplied sentences: create
annotations, write them, train a Maxent model, load the model, create more
annotations based on the results (there is a validation object involved),
and so on. With this method I was able to create an NER model for people's
names against a 200K-sentence corpus that returns acceptable results just by
starting with a list of five highly unambiguous names. I will propose the
framework in more detail in the coming days and supply my implementation if
everyone is interested.
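To make the loop concrete, here is a minimal Java sketch of the idea. It is
not the framework itself and it does not call OpenNLP: the "model" is a
stand-in that only remembers which words precede known names, the validation
rule is a made-up capitalization check, and every class and method name
(BootstrapSketch, annotate, train, propose, isValid, bootstrap) is
hypothetical. Names are assumed to be single tokens. The only thing borrowed
from OpenNLP is the <START:person> ... <END> markup used in name finder
training data.

```java
import java.util.*;

public class BootstrapSketch {

    /** Marks every known (single-token) name with OpenNLP-style training markup. */
    static String annotate(String sentence, Set<String> known) {
        StringBuilder out = new StringBuilder();
        for (String tok : sentence.split(" ")) {
            if (out.length() > 0) out.append(' ');
            out.append(known.contains(tok) ? "<START:person> " + tok + " <END>" : tok);
        }
        return out.toString();
    }

    /** Stand-in for training: collects the words that directly precede an annotated name. */
    static Set<String> train(List<String> annotated) {
        Set<String> contexts = new HashSet<>();
        for (String s : annotated) {
            String[] t = s.split(" ");
            for (int i = 1; i < t.length; i++) {
                if (t[i].equals("<START:person>") && !t[i - 1].equals("<END>")) {
                    contexts.add(t[i - 1].toLowerCase());
                }
            }
        }
        return contexts;
    }

    /** Stand-in for tagging: proposes every token that follows a learned context word. */
    static Set<String> propose(List<String> sentences, Set<String> contexts) {
        Set<String> candidates = new HashSet<>();
        for (String s : sentences) {
            String[] t = s.split(" ");
            for (int i = 1; i < t.length; i++) {
                if (contexts.contains(t[i - 1].toLowerCase())) candidates.add(t[i]);
            }
        }
        return candidates;
    }

    /** Stand-in for the validation object: accept capitalized alphabetic tokens only. */
    static boolean isValid(String candidate) {
        return candidate.matches("[A-Z][a-z]+");
    }

    /** Seeds the loop with known names and repeats until no new names are accepted. */
    static Set<String> bootstrap(Set<String> seeds, List<String> sentences, int maxRounds) {
        Set<String> known = new TreeSet<>(seeds);
        for (int round = 0; round < maxRounds; round++) {
            List<String> annotated = new ArrayList<>();
            for (String s : sentences) annotated.add(annotate(s, known));
            Set<String> contexts = train(annotated);
            int before = known.size();
            for (String c : propose(sentences, contexts)) {
                if (isValid(c)) known.add(c);
            }
            if (known.size() == before) break; // converged: nothing new was learned
        }
        return known;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "Yesterday Dr. Alice spoke",
            "Then Dr. Bob replied",
            "Later Professor Bob left",
            "Professor Carol arrived",
            "the dr. office closed");
        Set<String> names = bootstrap(new HashSet<>(Arrays.asList("Alice")), sentences, 10);
        System.out.println(names);
        for (String s : sentences) System.out.println(annotate(s, names));
    }
}
```

In a real setup the train and propose steps would presumably be replaced by
writing the annotated sentences to a file, training a name finder model on
it, and running that model over the corpus. The validation step is what keeps
a bad guess (here, "office" after "dr.") from polluting the next round.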
As for the initial question, I would like to see OpenNLP provide a
framework for rapidly/semi-automatically building models out of user data,
and also performing entity resolution across documents, in order to assign
a probability to whether the "Bob" in one document is the same as "Bob" in
another.
MG


On Tue, Oct 1, 2013 at 11:01 AM, Michael Schmitz
<[email protected]> wrote:

> Hi, I've used OpenNLP for a few years--in particular the chunker, POS
> tagger, and tokenizer.  We're grateful for a high performance library
> with an Apache license, but one of our greatest complaints is the
> quality of the models.  Yes--we're aware we can train our own--but
> most people are looking for something that is good enough out of the
> box (we aim for this with our products).  I'm not surprised that
> volunteer engineers don't want to spend their time annotating data ;-)
>
> I'm curious what other people see as the biggest shortcomings for
> OpenNLP or the most important next steps for OpenNLP.  I may have an
> opportunity to contribute to the project and I'm trying to figure out
> where the community thinks the biggest impact could be made.
>
> Peace.
> Michael Schmitz
>
