Hi all,

I would like to figure out how the coref component can be trained
on MUC 6 and 7 data. Does anybody know how to do that?

After searching for information in the forum and doing quite a bit of
reverse engineering, I think the process is something like this:

1. Load the data via the MUC plugin into wordfreak. Getting wordfreak to work is a bit tricky; there seem to be a few jar files which all have the 2.2 version in the name but are quite different. I now use a self-compiled head version.
2. Perform Named Entity Recognition via the opennlp plugin (I use it with opennlp 1.4.3).
3. Do chunking or parsing (parsing still causes a stack overflow in my setup, so I only did chunking).
4. Save the file to disk. (Make sure it is named correctly; wordfreak appends a .txt extension which must be removed.)
5. Do the training with the coref opennlp wordfreak plugin via its main method.
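For step 5 I mean an invocation roughly like the following. Note that the jar names and the trainer class name below are placeholders, not the real ones from any particular wordfreak release, so adjust them to whatever your build produces:

```shell
# Invocation sketch only -- the class name and jar names are placeholders,
# not verified against a specific wordfreak/opennlp coref plugin release.
java -cp wordfreak.jar:wordfreak-opennlp.jar:opennlp-tools-1.4.3.jar \
  SomeCorefTrainerMainClass \
  /path/to/annotated/muc/files \
  /path/to/coref/model/output
```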

But I still have a couple of issues.

Wordfreak saves the linked mentions as "mention" annotations, which cannot then be retrieved by the coref code (it only looks for noun phrases, and a mention is not a noun phrase). I am not sure how this is supposed to work; do I have to write some code to merge the mentions
with the added noun phrases? Or is there some kind of trick I don't know yet?
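To make concrete what I mean by merging, here is a minimal self-contained sketch (plain Java with made-up span types, not the actual wordfreak or opennlp API): it copies each linked mention's coref id onto the chunker noun phrase that covers it, so the coref code would see noun phrases carrying the ids.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: align "mention" annotations (carrying coref ids) with noun-phrase
// chunk spans by containment. The Span record is invented for illustration.
public class MentionNpMerge {

    // start inclusive, end exclusive; id < 0 means "no coref id"
    record Span(int start, int end, int id) {}

    /** Return the NPs, each tagged with the id of a mention it contains (if any). */
    static List<Span> merge(List<Span> mentions, List<Span> nps) {
        List<Span> out = new ArrayList<>();
        for (Span np : nps) {
            int id = -1;
            for (Span m : mentions) {
                // the mention span lies inside the noun-phrase span
                if (m.start() >= np.start() && m.end() <= np.end()) {
                    id = m.id();
                    break;
                }
            }
            out.add(new Span(np.start(), np.end(), id));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> mentions = List.of(new Span(0, 7, 1), new Span(20, 23, 1));
        List<Span> nps = List.of(new Span(0, 7, -1), new Span(18, 23, -1),
                                 new Span(30, 40, -1));
        System.out.println(merge(mentions, nps));
    }
}
```

Whether the real fix is something like this, or whether the coref trainer can be pointed at the mention annotations directly, is exactly what I am trying to find out.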

Parsing in wordfreak does not work because of a stack overflow.

It also looks like there is no utility to do the actual coref resolution when only a shallow parse
was used for training.

Any hints are very welcome.

Jörn
