Hi Tom,

if you're looking for a data set, you should have a look at the one created by Chris Biemann in the context of the WebAnno project. AFAIK it's not publicly available yet, but they plan to release it soon and without any licensing issues.

Best,
Nils

On 10.10.2013, at 11:58, Thomas Zastrow <[email protected]> wrote:

> Hello,
>
> There seems to be no free German NE model available, so I started to think
> about creating one - just using free resources like Wikipedia etc.
>
> I still have some questions:
>
> Somewhere in the documentation, I read about a dictionary driven NE
> recognizer in OpenNLP. But I didn't find any further information about it.
> Anyway, would it be possible to combine the statistical approach with
> dictionaries? For example, having a list of country names would be useful.
>
> As far as I understood, the name finder is at the moment only stable for one
> property, like person names. I would like to have the traditional division
> into persons, locations, organizations and misc. When manually creating the
> training data, would it be OK to add all four kinds already to the text and
> then maybe create 4 models later for the different properties?
>
> The name finder uses sentences and tokens as input. Would it be OK to also
> have POS tags assigned to the training data? That would make it much easier
> to manually annotate the data when e.g. NEs are already marked by the POS
> tagger.
>
> That's it for the moment, I'm quite sure I will come back later with more
> questions :-)
>
> Best,
>
> Tom
>
> --
> Dr. Thomas Zastrow
> Riemerfeldring 7a
>
> 85748 Garching
> Tel.: 0162 422 8029
> www.thomas-zastrow.de
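
On the question above about annotating all four entity types in one pass and creating separate models later: a minimal sketch of how that split could be done, assuming the corpus uses the OpenNLP name finder training markup (`<START:type> ... <END>`, one sentence per line, whitespace-separated tokens). The type names `person`, `location`, `organization`, `misc`, the helper names `keep_only` and `split_by_type`, and the sample sentence are made up for illustration and are not taken from any existing corpus.

```python
import re

# One annotated span in the OpenNLP name finder training format,
# e.g. "<START:person> Anna Schmidt <END>".
SPAN = re.compile(r"<START:(?P<type>[^>\s]+)>\s+(?P<tokens>.*?)\s+<END>")


def keep_only(line, entity_type):
    """Keep the markup for spans of entity_type; reduce all other spans to their tokens."""
    def replace(match):
        if match.group("type") == entity_type:
            return match.group(0)       # leave this annotation untouched
        return match.group("tokens")    # drop the markup, keep the words
    return SPAN.sub(replace, line)


def split_by_type(lines, entity_types):
    """Build one single-type training corpus per entity type (hypothetical helper)."""
    return {t: [keep_only(line, t) for line in lines] for t in entity_types}


# Invented sample sentence, annotated with two of the four types.
sample = "<START:person> Anna Schmidt <END> besuchte <START:location> Garching <END> ."
corpora = split_by_type([sample], ["person", "location", "organization", "misc"])

for entity_type, corpus in corpora.items():
    print(entity_type, "->", corpus[0])
```

Each list in `corpora` could then be written to its own file and used to train one model per type. Annotating everything once and filtering afterwards keeps the manual work in a single pass, so the choice between one combined model and four separate ones can be made later.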
