Dear all,
Thanks for the information.
On 30.10.2013 at 13:20, Jörn Kottmann wrote:
On 10/30/2013 12:03 PM, Nils Reiter wrote:
I guess the question is whether a trained model is an “adaptation” of
the work according to the license. If that’s the case you’re bound to
using creative commons, I think.
I want to publish both: the binary model and the raw, manually annotated
texts. The latter is a derivative work of Wikipedia: you can still read
the articles, they just have some annotations in between. So those
files will be under the original Wikipedia license.
The model does not contain the original texts; it contains the words
and bigrams, but that is nothing the original author holds a copyright
on.
Hmm, that's the point: I know from other contexts that models trained
on treebanks also have to be released under the same conditions as the
original treebank. So I'm not sure whether I'm free to use another
license for the binary file. And I don't know about the other models on
the OpenNLP page: I used the German tokenizer and sentence-detector
models, together with the OpenNLP tools. At the least, my binary model
is a mixture of CC, the Apache License and whatever is used for the
already existing models.
Any interest in contributing your work back to OpenNLP? It would really
be a great start for us to finally have some annotated data as proper
Open Source as well. The Wikipedia effort can probably easily be
replicated for other languages.
Yes, of course. I built this model for my own hobby project, but I
always had in mind to release it freely. I also implemented a graphical
user interface for manual NE annotation ... all the OpenNLP tools are
integrated, and it can now be seen as a generic graphical user interface
for OpenNLP. That tool is far from being perfect, but I think I will
publish a "beta of a pre-alpha version" in the next few days :-)
I also found out that the tokenizer and sentence model for German are
... not the best ones. I don't know who did them, but they are lacking
some very common features of German texts.
Last but not least, I'm working on some converters for the OpenNLP
formats, because I need the output to be TCF. I still haven't found out
whether and where in the code that would fit.
Best,
Tom
--
Dr. Thomas Zastrow
Riemerfeldring 7a
85748 Garching
Tel.: 0162 422 8029
www.thomas-zastrow.de