On 07/17/2012 01:55 PM, John Stewart wrote:
> Well, my sense is that before much more work on packaging steps is
> done, the quality of the output needs to improve. I'm not sure it's
> just a matter of training, but at this point I'm not at all sure of
> what I'm saying. My *impression* is that the module needs to
> incorporate a bit more knowledge of language in order to increase
> recall without over-generating. Does that make sense? Also, is there
> any documentation on how it works currently? I would be interested in
> helping, time permitting as always.

We do not have documentation. There are some posts on our
mailing list discussing it, and there is a thesis by Thomas Morton
that has a chapter on the coref component.

I would like to at least provide very basic documentation for
the next release.

Do you want to propose some changes, or do you have ideas about what
we can do to improve the quality of the output?

The coref component was implemented by Tom; we have only maintained
it a little here and do not have good knowledge of it. Anyway, that
is something that should change, and I did read and work on the
code while looking into how to add training support to it.

Do you think OntoNotes is a good data set to continue the development?

Jörn
