Please use unencumbered training data for all future OpenNLP projects. What exactly does a coref training dataset have to include? What kind of tagging or cross-referencing?
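As a concrete example of the tagging (MUC-style, from memory -- other corpora differ): each mention gets an SGML element with an ID, and anaphors cross-reference their antecedent's ID via REF, roughly:

    <COREF ID="1" MIN="Smith">John Smith</COREF> retired last year.
    <COREF ID="2" TYPE="IDENT" REF="1">He</COREF> had led the group since 1998.

CoNLL/OntoNotes carry the same chains in a token-per-line column format, alongside the parse and NE columns that coref training would consume.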
On Tue, Jul 17, 2012 at 10:59 AM, John Stewart <[email protected]> wrote:
> Ah good, I was going to ask about parses too -- so this is done. I'll
> start reading the code tonight.
>
> OntoNotes is smallish, yes? Is the English bit larger than the CoNLL
> data set? In terms of cost, isn't it free?
>
> Thanks,
>
> jds
>
> On Tue, Jul 17, 2012 at 11:09 AM, Jörn Kottmann <[email protected]> wrote:
>> On 07/17/2012 05:03 PM, John Stewart wrote:
>>>
>>> OK so per this: https://issues.apache.org/jira/browse/OPENNLP-54
>>>
>>> you're saying that results may improve with the CoNLL training set,
>>> yes? That definitely seems worth trying to me. Now, what, if any,
>>> policies are there about dependencies between OpenNLP modules? I ask
>>> because the coref task might benefit from the NE output -- perhaps
>>> they are already linked!
>>
>> The input for coref is this:
>> - Full or shallow parse (depends on how the model was trained)
>> - NER output
>>
>> All this information is encoded into Parse objects, and therefore no
>> direct link between the components is necessary.
>> You can see this nicely when you run the command line demo.
>>
>> Yes, we need a corpus to train it on. Maybe OntoNotes would be a good
>> candidate; it's affordable to everyone.
>>
>> What do you think?
>>
>> Jörn

--
Lance Norskog
[email protected]
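For reference, a minimal sketch of how the pieces Jörn lists fit together, based on my reading of the 1.5-era opennlp.tools.coref API (TreebankLinker, DefaultParse, and friends -- treat the exact signatures and the model path as approximate, not gospel):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import opennlp.tools.coref.DiscourseEntity;
    import opennlp.tools.coref.Linker;
    import opennlp.tools.coref.LinkerMode;
    import opennlp.tools.coref.TreebankLinker;
    import opennlp.tools.coref.mention.DefaultParse;
    import opennlp.tools.coref.mention.Mention;
    import opennlp.tools.parser.Parse;

    public class CorefSketch {
      // sentences: one top-level Parse per sentence, with the NER spans
      // already folded into the tree, as the command line demo produces.
      static DiscourseEntity[] resolve(Parse[] sentences) throws Exception {
        // "models/coref" is a hypothetical path to the coref model directory
        Linker linker = new TreebankLinker("models/coref", LinkerMode.TEST);

        List<Mention> mentions = new ArrayList<Mention>();
        for (int i = 0; i < sentences.length; i++) {
          // DefaultParse adapts a parser Parse for the mention finder
          Mention[] found = linker.getMentionFinder()
              .getMentions(new DefaultParse(sentences[i], i));
          mentions.addAll(Arrays.asList(found));
        }

        // Each DiscourseEntity groups the mentions judged coreferent
        return linker.getEntities(mentions.toArray(new Mention[mentions.size()]));
      }
    }

The point being: the linker only ever sees Parse objects, so the parser and NER outputs travel together and no compile-time dependency between the modules is needed.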
