OK, so per https://issues.apache.org/jira/browse/OPENNLP-54 you're saying that results may improve with the CoNLL training set, yes? That definitely seems worth trying to me. Now, what policies, if any, are there about dependencies between OpenNLP modules? I ask because the coref task might benefit from the NE output -- perhaps they are already linked!
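To make the idea concrete, here is a minimal sketch of what "coref benefiting from NE output" could look like: NE spans become mention candidates, and a naive exact-string pass links them into chains. This is purely illustrative -- it is not OpenNLP's actual coref implementation or API, and all the function names and data shapes below are my own invention.

```python
# Hypothetical sketch, NOT OpenNLP's real API: use named-entity spans
# as mention candidates, then link them with naive exact-string matching.

def ne_spans_to_mentions(tokens, spans):
    """Turn (start, end, type) NE spans into simple mention dicts."""
    return [{"text": " ".join(tokens[s:e]), "type": t, "span": (s, e)}
            for (s, e, t) in spans]

def link_mentions(mentions):
    """Greedily attach each mention to the first earlier chain whose
    head mention has the same surface form and entity type."""
    chains = []
    for m in mentions:
        for chain in chains:
            if chain[0]["text"] == m["text"] and chain[0]["type"] == m["type"]:
                chain.append(m)
                break
        else:
            chains.append([m])
    return chains

tokens = "John Smith met Mary . John Smith left .".split()
spans = [(0, 2, "person"), (3, 4, "person"), (5, 7, "person")]
chains = link_mentions(ne_spans_to_mentions(tokens, spans))
# Yields two chains: one with both "John Smith" mentions, one with "Mary".
```

A real coref pass would of course need head matching, pronoun resolution, and the language knowledge discussed below; the point is only that NE spans are a natural source of high-precision mention candidates, which is why a module dependency might make sense.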
jds

On Tue, Jul 17, 2012 at 8:04 AM, Jörn Kottmann <[email protected]> wrote:
> On 07/17/2012 01:55 PM, John Stewart wrote:
>>
>> Well, my sense is that before much more work on packaging steps is
>> done, the quality of the output needs to improve. I'm not sure it's
>> just a matter of training -- but at this point I'm not at all sure of
>> what I'm saying. My *impression* is that the module needs to
>> incorporate a bit more knowledge of language in order to increase
>> recall without over-generating. Does that make sense? Also, is there
>> any documentation on how it works currently? I would be interested in
>> helping, time permitting as always.
>
> We do not have documentation. There are some posts on our
> mailing list speaking about it, and there is a thesis by Thomas Morton
> which has a chapter about the coref component.
>
> I would like to at least provide very basic documentation for
> the next release.
>
> Do you want to propose some changes, or do you have ideas about what
> we can do to improve the quality of the output?
>
> The coref component was implemented by Tom and we have only maintained
> it a bit here, and do not have good knowledge about it. Anyway, that
> is something that should be changed, and I actually did read and work on
> the code while looking into how to add training support to it.
>
> Do you think OntoNotes is a good data set to continue the development?
>
> Jörn
