In an effective coreference module? There's extreme interest from my research group. But it's a hard problem and we're dissatisfied with many of the existing systems.
Peace.
Michael

On Tue, Oct 1, 2013 at 6:59 PM, John Stewart <[email protected]> wrote:

> Do we know if there's live interest in using the coref module -- which
> seems like abandonware? (I've asked this before, but I still don't have a
> sense of the level of interest.)
>
> jds
>
> On Tue, Oct 1, 2013 at 8:06 PM, Mark G <[email protected]> wrote:
>
>> I've been using OpenNLP for a few years, and I find the best results
>> occur when the models are generated from samples of the data they will
>> be run against -- one of the reasons I like the maxent approach. I am
>> not sure that attempting to provide models will bear much fruit, other
>> than that users will no longer be afraid of the licensing issues
>> associated with using them in commercial systems. I do strongly think we
>> should provide a model-building framework (that calls the training API)
>> and a default implementation.
>>
>> Coincidentally, I have been building a framework and implementation over
>> the last few months that creates models by seeding an iterative process
>> with known entities: it iterates through a set of supplied sentences to
>> recursively create annotations, writes them, creates a maxent model,
>> loads the model, creates more annotations based on the results (there is
>> a validation object involved), and so on. With this method I was able to
>> create an NER model for people's names against a 200K-sentence corpus
>> that returns acceptable results, starting from a list of just five
>> highly unambiguous names. I will propose the framework in more detail in
>> the coming days and supply my implementation if everyone is interested.
>>
>> As for the initial question, I would like to see OpenNLP provide a
>> framework for rapidly and semi-automatically building models from user
>> data, and also for performing entity resolution across documents, in
>> order to assign a probability to whether the "Bob" in one document is
>> the same as the "Bob" in another.
>> MG
>>
>> On Tue, Oct 1, 2013 at 11:01 AM, Michael Schmitz
>> <[email protected]> wrote:
>>
>>> Hi, I've used OpenNLP for a few years -- in particular the chunker, POS
>>> tagger, and tokenizer. We're grateful for a high-performance library
>>> with an Apache license, but one of our biggest complaints is the
>>> quality of the models. Yes, we're aware we can train our own, but most
>>> people are looking for something that is good enough out of the box (we
>>> aim for this with our products). I'm not surprised that volunteer
>>> engineers don't want to spend their time annotating data ;-)
>>>
>>> I'm curious what other people see as the biggest shortcomings of
>>> OpenNLP, or the most important next steps for it. I may have an
>>> opportunity to contribute to the project, and I'm trying to figure out
>>> where the community thinks the biggest impact could be made.
>>>
>>> Peace.
>>> Michael Schmitz
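The seeded, iterative model-building loop Mark describes (annotate with known entities, train a model, re-apply it, validate, and repeat) can be sketched roughly as below. This is a toy illustration, not OpenNLP's actual API: a simple context-counting step stands in for training the maxent model, and the function name, the support threshold, and the capitalization check in the validation step are all illustrative assumptions.

```python
from collections import Counter

def bootstrap_ner(sentences, seed_names, rounds=3, min_support=2):
    """Toy sketch of seeded, iterative NER model building.

    sentences:   list of tokenized sentences (lists of strings)
    seed_names:  a few highly unambiguous entity names to start from
    Each round: "train" by collecting contexts of known names, keep
    contexts with enough support (the validation step), then label
    new candidate names that occur in those contexts.
    """
    names = set(seed_names)
    for _ in range(rounds):
        # "Train": count (prev, next) token contexts around known names.
        contexts = Counter()
        for toks in sentences:
            for i, t in enumerate(toks):
                if t in names:
                    prev = toks[i - 1] if i > 0 else "<s>"
                    nxt = toks[i + 1] if i + 1 < len(toks) else "</s>"
                    contexts[(prev, nxt)] += 1
        # Validate: keep only contexts seen often enough to trust.
        good = {c for c, n in contexts.items() if n >= min_support}
        # "Apply the model": capitalized unknown tokens in a trusted
        # context become newly discovered names.
        new = set()
        for toks in sentences:
            for i, t in enumerate(toks):
                prev = toks[i - 1] if i > 0 else "<s>"
                nxt = toks[i + 1] if i + 1 < len(toks) else "</s>"
                if (prev, nxt) in good and t[0].isupper() and t not in names:
                    new.add(t)
        if not new:  # converged: no new annotations were produced
            break
        names |= new
    return names
```

For example, seeding with "Alice" and "Bob" over sentences of the form "my name is X ." lets the loop learn the ("is", ".") context and then pick up "Carol" in the next round. A real implementation would replace the context counter with the OpenNLP training API and persist the model between rounds, as Mark describes.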
