On 07/17/2012 01:55 PM, John Stewart wrote:
Well, my sense is that before much more work on packaging steps is done, the quality of the output needs to improve. I'm not sure it's just a matter of training -- but at this point I'm not at all sure of what I'm saying. My *impression* is that the module needs to incorporate a bit more knowledge of language in order to increase recall without over-generating. Does that make sense? Also, is there any documentation on how it works currently? I would be interested in helping, time permitting as always.
We do not have documentation. There are some posts on our mailing list about it, and there is a thesis by Thomas Morton with a chapter about the coref component. I would like to provide at least very basic documentation for the next release.

Do you want to propose some changes, or do you have ideas about what we can do to improve the quality of the output?

The coref component was implemented by Tom, and we have only maintained it a little bit, so we do not have good knowledge of it. That is something that should change, and I did read and work on the code while looking into how to add training support to it.

Do you think OntoNotes is a good data set to continue the development with?

Jörn
