Hi all,

I would like to figure out how the coref component can be trained
on MUC 6 and 7 data. Does anybody know how to do that?

After searching for information in the forum and doing quite a bit of
reverse engineering, I think the process is something like this:

1. Load the data via the MUC plugin into wordfreak. Getting wordfreak to work is a bit tricky; there seem to be a few jar files which all have the 2.2 version in the name but are quite different. I now use a self-compiled head version.
2. Perform Named Entity Recognition via the opennlp plugin (I use it with opennlp 1.4.3).
3. Do chunking or parsing (parsing still causes a stack overflow in my setup, so I only did chunking).
4. Save the file to disk. (Make sure it is named correctly; wordfreak appends a .txt extension which must be removed.)
5. Do the training with the coref opennlp wordfreak plugin via its main method.
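For step 5 I mean an invocation roughly like the following. Note that the jar names and the trainer class name below are placeholders, not the real ones from any particular wordfreak release, so adjust them to whatever your build produces:

```shell
# Invocation sketch only -- the class name and jar names are placeholders,
# not verified against a specific wordfreak/opennlp coref plugin release.
java -cp wordfreak.jar:wordfreak-opennlp.jar:opennlp-tools-1.4.3.jar \
  SomeCorefTrainerMainClass \
  /path/to/annotated/muc/files \
  /path/to/coref/model/output
```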

But I still have a couple of issues.

Wordfreak saves the linked mentions as "mention" annotations, which cannot then be retrieved by the coref code (it only looks for noun phrases, and a mention is not a noun phrase). I am not sure how this is supposed to work; do I have to write some code to merge the mentions
with the added noun phrases? Or is there some kind of trick I don't know yet?
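To make concrete what I mean by merging, here is a minimal self-contained sketch (plain Java with made-up span types, not the actual wordfreak or opennlp API): it copies each linked mention's coref id onto the chunker noun phrase that covers it, so the coref code would see noun phrases carrying the ids.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: align "mention" annotations (carrying coref ids) with noun-phrase
// chunk spans by containment. The Span record is invented for illustration.
public class MentionNpMerge {

    // start inclusive, end exclusive; id < 0 means "no coref id"
    record Span(int start, int end, int id) {}

    /** Return the NPs, each tagged with the id of a mention it contains (if any). */
    static List<Span> merge(List<Span> mentions, List<Span> nps) {
        List<Span> out = new ArrayList<>();
        for (Span np : nps) {
            int id = -1;
            for (Span m : mentions) {
                // the mention span lies inside the noun-phrase span
                if (m.start() >= np.start() && m.end() <= np.end()) {
                    id = m.id();
                    break;
                }
            }
            out.add(new Span(np.start(), np.end(), id));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> mentions = List.of(new Span(0, 7, 1), new Span(20, 23, 1));
        List<Span> nps = List.of(new Span(0, 7, -1), new Span(18, 23, -1),
                                 new Span(30, 40, -1));
        System.out.println(merge(mentions, nps));
    }
}
```

Whether the real fix is something like this, or whether the coref trainer can be pointed at the mention annotations directly, is exactly what I am trying to find out.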

Parsing in wordfreak does not work because of a stack overflow.

It also looks like there is no utility to do the actual coref resolution when only a shallow parse
was used for training.

Any hints are very welcome.

Jörn
