Dear all,

Thanks for the information.


On 30.10.2013 13:20, Jörn Kottmann wrote:
On 10/30/2013 12:03 PM, Nils Reiter wrote:
I guess the question is whether a trained model is an “adaptation” of the work according to the license. If that’s the case you’re bound to using creative commons, I think.


I want to publish both: the binary model and the raw, manually annotated texts. The latter is a derivative work of Wikipedia: you can still read the articles, with some annotations in between. So those files will be under the original Wikipedia license.


The model does not contain the original texts; it contains the words and bigrams,
but that is nothing the original author has a copyright on.


Hmm, that's the point: I know from other contexts that models trained on treebanks have to be released under the same conditions as the original treebank. So I'm not sure whether I'm free to use another license for the binary file. And I don't know about the other models on the OpenNLP page: I used the German tokenizer and sentence-detector models, together with the OpenNLP tools. At the least, my binary model is a mixture of CC, the Apache License, and whatever is used for the already existing models.


Any interest in contributing your work back to OpenNLP? It would really be a great start for us to finally have some annotated data as proper Open Source as well. The Wikipedia effort can probably
easily be replicated for other languages.

Yes, of course. I built this model for my own hobby project, but I always had in mind to release it freely. I also implemented a graphical user interface for manual NE annotation. All the OpenNLP tools are integrated, so it can now be seen as a generic graphical user interface for OpenNLP. The tool is far from perfect, but I think I will publish a "beta of a pre-alpha version" in the next few days :-)

I also found out that the tokenizer and sentence models for German are ... not the best ones. I don't know who built them, but they fail to handle some very common features of German texts.

Last but not least, I'm working on some converters for the OpenNLP formats, because I need the output to be in TCF. I still haven't found the place in the code where that would fit, if it fits at all.

Best,

Tom


--
Dr. Thomas Zastrow
Riemerfeldring 7a

85748 Garching
Tel.: 0162 422 8029
www.thomas-zastrow.de
