Re: NE Training + Dictionary?

Jörn Kottmann Thu, 10 Oct 2013 05:41:20 -0700

On 10/10/2013 11:58 AM, Thomas Zastrow wrote:

Hello,
There seems to be no free German NE model available, so I started to think about creating one - just using free resources like Wikipedia etc.
I still have some questions:
Somewhere in the documentation, I read about a dictionary-driven NE recognizer in OpenNLP, but I didn't find any further information about it. Anyway, would it be possible to combine the statistical approach with dictionaries? For example, having a list of country names would be useful.

Yes, that is possible. We have a DictionaryFeatureGenerator which can look up names in a dictionary and produce features for them. There is an XML file you can create to describe how the feature generation should be set up for training; the file is then stored in the model so that the exact same feature generation can be reproduced when the model is loaded later.


See our documentation:
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.featuregen
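
To make the idea of dictionary-based features a bit more concrete, here is a minimal sketch in plain Java. It is not the OpenNLP DictionaryFeatureGenerator and does not use the OpenNLP API; the class name, the "indict" feature string and the single-token lookup are all assumptions made for illustration. It only shows the principle: a token that matches a dictionary entry gets an extra feature which the statistical model can then weigh.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual sketch only: NOT OpenNLP's DictionaryFeatureGenerator.
// All names here are hypothetical.
final class DictionaryFeatureSketch {

    private final Set<String> entries = new HashSet<>();
    private final String prefix;

    DictionaryFeatureSketch(String prefix, List<String> names) {
        this.prefix = prefix;
        // Store entries lowercased so the lookup is case-insensitive.
        for (String name : names) {
            entries.add(name.toLowerCase(Locale.ROOT));
        }
    }

    // Returns the extra features for the token at the given index.
    // Only single-token dictionary entries are handled in this sketch.
    List<String> featuresFor(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        if (entries.contains(tokens[index].toLowerCase(Locale.ROOT))) {
            features.add(prefix + ":indict");
        }
        return features;
    }

    public static void main(String[] args) {
        DictionaryFeatureSketch countries = new DictionaryFeatureSketch(
                "country", Arrays.asList("Deutschland", "Frankreich"));
        String[] tokens = {"Er", "lebt", "in", "Deutschland", "."};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " " + countries.featuresFor(tokens, i));
        }
    }
}
```

Multi-token names (for example "Vereinigte Staaten") would need a span lookup over several tokens, which this sketch leaves out.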

What are the features you would like to generate via the dictionary?

The Name Finder can be extended with custom feature generators, in case you have some ideas or just want to experiment a bit.

As far as I understood, the name finder is at the moment only stable for one property, like person names. I would like to have the traditional division into persons, locations, organizations and misc. When manually creating the training data, would it be OK to add all four kinds to the text right away and then maybe later create four models for the different properties?

The name finder trainer by default trains a model for all name types occurring in the training data; the -nameTypes option can reduce the used types to one or several. I often use this, it works great.
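
To illustrate what restricting the name types means for data that is annotated with all four kinds, here is a small plain-Java sketch. It assumes the training data uses the <START:type> ... <END> markup from the OpenNLP manual, with tokens separated by single spaces. The class and method names are hypothetical, and this is not how the trainer implements the option; it only shows the effect of keeping some types and treating the others as ordinary text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch only: mimics the effect of restricting name types
// on one line of training data. Not OpenNLP code.
final class NameTypeFilterSketch {

    // Matches one annotated span, e.g. "<START:person> Angela Merkel <END>".
    private static final Pattern SPAN =
            Pattern.compile("<START:([^>]+)> (.+?) <END>");

    // Keeps the markup for the given types and strips it for all others.
    static String keepTypes(String line, Set<String> keep) {
        Matcher m = SPAN.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = keep.contains(m.group(1)) ? m.group() : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<START:person> Angela Merkel <END> besuchte "
                + "<START:location> Paris <END> .";
        Set<String> keep = new HashSet<>(Arrays.asList("person"));
        System.out.println(keepTypes(line, keep));
    }
}
```

So annotating all four kinds in one pass loses nothing: the same file can later serve for one combined model or for several type-specific ones.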

The name finder uses as input sentences and tokens. Would it be OK to also have POS tags assigned to the training data? That would make it much easier to manually annotate the data when e.g. NEs are already marked by the POS tagger.

Passing in POS tags is currently not supported by our API. The easiest way to get around that limitation is probably to run the POS tagger as part of the name finder feature generation.
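
As a rough sketch of that workaround, the plain-Java class below wraps a tagger inside a feature generator and emits the tags as features. Everything here is an assumption for illustration: the Tagger interface, the class name and the feature strings are hypothetical, and a real custom feature generator would have to implement the corresponding OpenNLP interface and call a real POS tagger instead.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only: a feature generator that runs a POS tagger
// itself. The Tagger interface is hypothetical, not an OpenNLP type.
final class PosFeatureSketch {

    interface Tagger {
        String[] tag(String[] tokens);
    }

    private final Tagger tagger;
    private String[] cachedTokens;
    private String[] cachedTags;

    PosFeatureSketch(Tagger tagger) {
        this.tagger = tagger;
    }

    // Returns POS-based features for the token at the given index.
    List<String> featuresFor(String[] tokens, int index) {
        // Tag each sentence only once, then reuse the tags per token.
        if (tokens != cachedTokens) {
            cachedTags = tagger.tag(tokens);
            cachedTokens = tokens;
        }
        List<String> features = new ArrayList<>();
        features.add("pos=" + cachedTags[index]);
        if (index > 0) {
            features.add("prevpos=" + cachedTags[index - 1]);
        }
        return features;
    }

    public static void main(String[] args) {
        // Toy tagger: capitalized tokens get "NE", everything else "X".
        Tagger toy = tokens -> {
            String[] tags = new String[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                tags[i] = Character.isUpperCase(tokens[i].charAt(0)) ? "NE" : "X";
            }
            return tags;
        };
        PosFeatureSketch generator = new PosFeatureSketch(toy);
        String[] sentence = {"Angela", "lebt", "in", "Berlin"};
        for (int i = 0; i < sentence.length; i++) {
            System.out.println(sentence[i] + " " + generator.featuresFor(sentence, i));
        }
    }
}
```

Because the tagger runs inside feature generation, the same tags are produced at training time and at run time, and the training data itself stays free of POS annotations.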

There is German CONLL training data you could use to train a name finder model:

http://www.cnts.ua.ac.be/conll2003/ner/

The OpenNLP Name Finder can be directly trained on the CONLL2003 data.

HTH,
Jörn
