Dear all,
I have now created a named entity model for German. It is trained on 5,000
manually annotated sentences. Its performance is not perfect, but it is
already usable. I will continue with more texts.
I used only texts from Wikipedia and Wikinews, so in my view it
shouldn't be a problem to distribute the model. But I'm not sure which
license would be a good choice: OpenNLP uses the Apache license, but
Wikipedia is licensed under Creative Commons. On the other hand, because
I have the "raw" training data, it would be easy to train other NE
detectors with the data.
The OpenNLP page doesn't say anything about the licenses of the models
that are already available there.
So, what do you think would be the best license for
a) a trained model
and
b) the raw data, which is entirely Wikipedia content?
Thanks in advance and best regards,
Tom
--
Dr. Thomas Zastrow
Riemerfeldring 7a
85748 Garching
Tel.: 0162 422 8029
www.thomas-zastrow.de