Hi Jörn,

Are all of the original training corpora from MUC? Would you mind listing which MUC corpora you used, or did you use all of them? If you didn't make any custom changes to those corpora, I'm thinking of getting them from MUC directly.
Best,
Yuan

On Mon, Aug 20, 2012 at 3:44 AM, Jörn Kottmann <[email protected]> wrote:
> On 08/17/2012 08:15 AM, Sam Li wrote:
>>
>> Right now I'm using the English sentence model provided on SourceForge. I
>> would like to append additional data to it.
>> But this means I need the original source of the model, right? If so, how
>> do I get that?
>
> The original data is copyright protected; it's data from the MUC corpus, so
> we cannot distribute it with OpenNLP. But you can use other English resources
> for training. You need data which is sentence segmented, such as CONLL2000
> for example.
>
> Jörn
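As a rough illustration of the CONLL2000 suggestion, here is a minimal Python sketch that turns CoNLL-2000 chunking data into one sentence per line, which is the layout the OpenNLP manual describes for sentence detector training data. It assumes the usual CoNLL-2000 column layout (word, POS tag, chunk tag, with a blank line between sentences). The function names and the punctuation-gluing rules are hypothetical and only approximate the original untokenized text.

```python
from typing import Iterable, List

# Hypothetical heuristic: tokens glued onto the previous token (no space before).
ATTACH_LEFT = {".", ",", ";", ":", "!", "?", ")", "]", "}", "%",
               "'s", "n't", "'re", "'ve", "'ll", "'d", "'m"}
# Hypothetical heuristic: tokens that glue onto the following token.
ATTACH_RIGHT = {"(", "[", "{", "$"}


def detokenize(tokens: List[str]) -> str:
    """Naively rejoin Penn-Treebank-style tokens into running text."""
    out = ""
    glue_next = False
    for tok in tokens:
        if not out:
            out = tok
        elif tok in ATTACH_LEFT or glue_next:
            out += tok
        else:
            out += " " + tok
        glue_next = tok in ATTACH_RIGHT
    return out


def conll_to_sentences(lines: Iterable[str]) -> List[str]:
    """Collect the first column of CoNLL-2000 rows; blank lines end a sentence."""
    sentences: List[str] = []
    tokens: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:
                sentences.append(detokenize(tokens))
                tokens = []
            continue
        tokens.append(line.split()[0])  # column 1 is the word itself
    if tokens:  # file may not end with a blank line
        sentences.append(detokenize(tokens))
    return sentences


if __name__ == "__main__":
    sample = """Confidence NN B-NP
in IN B-PP
the DT B-NP
pound NN I-NP
is VBZ B-VP
widely RB I-VP
expected VBN I-VP
to TO I-VP
take VB I-VP
another DT B-NP
sharp JJ I-NP
dive NN I-NP
. . O

He PRP B-NP
said VBD B-VP
so RB B-ADVP
. . O
"""
    # One sentence per line, ready to be written to a training file.
    print("\n".join(conll_to_sentences(sample.splitlines())))
```

Since the detokenization is only a heuristic, it is worth spot-checking the output before training, because stray spaces before punctuation could teach the model the wrong end-of-sentence cues.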
