In an effective coreference module? There's extreme interest from my research group. But it's a hard problem and we're dissatisfied with many of the existing systems.
Peace.
Michael

On Tue, Oct 1, 2013 at 6:59 PM, John Stewart <[email protected]> wrote:

> Do we know if there's live interest in using the coref module -- which
> seems like abandonware? (I've asked this before, but I still don't have a
> sense of the level of interest.)
>
> jds
>
> On Tue, Oct 1, 2013 at 8:06 PM, Mark G <[email protected]> wrote:
>
>> I've been using OpenNLP for a few years, and I find the best results
>> occur when the models are generated from samples of the data they will
>> be run against -- one of the reasons I like the maxent approach. I am
>> not sure that attempting to provide models will bear much fruit, other
>> than that users will no longer be afraid of the licensing issues
>> associated with using them in commercial systems. I do strongly think we
>> should provide a model-building framework (that calls the training API)
>> and a default implementation.
>>
>> Coincidentally, I have been building a framework and implementation over
>> the last few months that creates models by seeding an iterative process
>> with known entities: it iterates through a set of supplied sentences to
>> recursively create annotations, writes them, creates a maxent model,
>> loads the model, creates more annotations based on the results (there is
>> a validation object involved), and so on. With this method I was able to
>> create an NER model for people's names against a 200K-sentence corpus
>> that returns acceptable results, starting from a list of just five
>> highly unambiguous names. I will propose the framework in more detail in
>> the coming days and supply my implementation if everyone is interested.
>>
>> As for the initial question, I would like to see OpenNLP provide a
>> framework for rapidly and semi-automatically building models from user
>> data, and also for performing entity resolution across documents, in
>> order to assign a probability to whether the "Bob" in one document is
>> the same as the "Bob" in another.
>> MG
>>
>> On Tue, Oct 1, 2013 at 11:01 AM, Michael Schmitz
>> <[email protected]> wrote:
>>
>>> Hi, I've used OpenNLP for a few years -- in particular the chunker, POS
>>> tagger, and tokenizer. We're grateful for a high-performance library
>>> with an Apache license, but one of our biggest complaints is the
>>> quality of the models. Yes, we're aware we can train our own, but most
>>> people are looking for something that is good enough out of the box (we
>>> aim for this with our products). I'm not surprised that volunteer
>>> engineers don't want to spend their time annotating data ;-)
>>>
>>> I'm curious what other people see as the biggest shortcomings of
>>> OpenNLP, or the most important next steps for it. I may have an
>>> opportunity to contribute to the project, and I'm trying to figure out
>>> where the community thinks the biggest impact could be made.
>>>
>>> Peace.
>>> Michael Schmitz
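The seeded, iterative model-building loop Mark describes (annotate with known entities, train a model, re-apply it, validate, and repeat) can be sketched roughly as below. This is a toy illustration, not OpenNLP's actual API: a simple context-counting step stands in for training the maxent model, and the function name, the support threshold, and the capitalization check in the validation step are all illustrative assumptions.

```python
from collections import Counter

def bootstrap_ner(sentences, seed_names, rounds=3, min_support=2):
    """Toy sketch of seeded, iterative NER model building.

    sentences:   list of tokenized sentences (lists of strings)
    seed_names:  a few highly unambiguous entity names to start from
    Each round: "train" by collecting contexts of known names, keep
    contexts with enough support (the validation step), then label
    new candidate names that occur in those contexts.
    """
    names = set(seed_names)
    for _ in range(rounds):
        # "Train": count (prev, next) token contexts around known names.
        contexts = Counter()
        for toks in sentences:
            for i, t in enumerate(toks):
                if t in names:
                    prev = toks[i - 1] if i > 0 else "<s>"
                    nxt = toks[i + 1] if i + 1 < len(toks) else "</s>"
                    contexts[(prev, nxt)] += 1
        # Validate: keep only contexts seen often enough to trust.
        good = {c for c, n in contexts.items() if n >= min_support}
        # "Apply the model": capitalized unknown tokens in a trusted
        # context become newly discovered names.
        new = set()
        for toks in sentences:
            for i, t in enumerate(toks):
                prev = toks[i - 1] if i > 0 else "<s>"
                nxt = toks[i + 1] if i + 1 < len(toks) else "</s>"
                if (prev, nxt) in good and t[0].isupper() and t not in names:
                    new.add(t)
        if not new:  # converged: no new annotations were produced
            break
        names |= new
    return names
```

For example, seeding with "Alice" and "Bob" over sentences of the form "my name is X ." lets the loop learn the ("is", ".") context and then pick up "Carol" in the next round. A real implementation would replace the context counter with the OpenNLP training API and persist the model between rounds, as Mark describes.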
