I see. I would love to see a feature that allows additive training without requiring the original corpus. That way, if users later want to add new special cases to the model, it would be easier. What do you think?
Is something like this in the works?

-Sam

On Aug 20, 2012, at 3:44 PM, Jörn Kottmann <[email protected]> wrote:

> On 08/17/2012 08:15 AM, Sam Li wrote:
>> Right now I'm using the English sentence model provided on SourceForge. I
>> would like to append additional data to it.
>> But this means I need the original source of the model, right? If so, how do
>> I get that?
>
> The original data is copyright protected. It is data from the MUC corpus, so we
> cannot distribute it with OpenNLP. But you can use other English resources
> for training.
> You need data which is sentence segmented, such as CONLL2000 for example.
>
> Jörn
