On 07/17/2012 05:03 PM, John Stewart wrote:
OK, so per this: https://issues.apache.org/jira/browse/OPENNLP-54 you're saying that results may improve with the CoNLL training set, yes? That definitely seems worth trying to me. Now, what policies, if any, are there about dependencies between OpenNLP modules? I ask because the coref task might benefit from the NE output -- perhaps they are already linked!
The input for coref is this:

- Full or shallow parse (depending on how the model was trained)
- NER output

All this information is encoded into Parse objects, so no direct link between the components is necessary. You can see this nicely when you run the command-line demo.

Yes, we need a corpus to train it on. Maybe OntoNotes would be a good candidate; it's affordable for everyone. What do you think?

Jörn
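P.S. For anyone curious what "encoded into Parse objects" looks like in code, here is a minimal sketch against the 1.5-era opennlp-coref API (essentially what the command-line demo in TreebankLinker's main method does). The "models/coref" path is a placeholder, and exact class locations may differ between releases:

import java.util.ArrayList;
import java.util.List;

import opennlp.tools.coref.DiscourseEntity;
import opennlp.tools.coref.Linker;
import opennlp.tools.coref.LinkerMode;
import opennlp.tools.coref.mention.DefaultParse;
import opennlp.tools.coref.mention.Mention;
import opennlp.tools.lang.english.TreebankLinker;
import opennlp.tools.parser.Parse;

public class CorefSketch {

  public static void main(String[] args) throws Exception {
    // "models/coref" is a placeholder for the directory holding the coref models.
    Linker linker = new TreebankLinker("models/coref", LinkerMode.TEST);

    // One parsed sentence in Penn Treebank bracket format; NER output is
    // assumed to already be merged into the parse nodes.
    Parse parse = Parse.parseParse(
        "(TOP (S (NP (NNP John)) (VP (VBD saw) (NP (PRP himself)))))");

    // Extract candidate mentions from the parse; 0 is the sentence index.
    List<Mention> document = new ArrayList<Mention>();
    Mention[] mentions =
        linker.getMentionFinder().getMentions(new DefaultParse(parse, 0));
    for (Mention m : mentions) {
      document.add(m);
    }

    // Resolve the collected mentions into discourse entities (coref chains).
    DiscourseEntity[] entities =
        linker.getEntities(document.toArray(new Mention[document.size()]));
    System.out.println(entities.length + " entities found");
  }
}

In a real pipeline you would loop over all sentences of a document, incrementing the sentence index passed to DefaultParse, before calling getEntities once per document.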
