Thanks Michael, that is very helpful. I'll discuss with my employers whether we can create a subset training set for public distribution.
On Wed, Mar 13, 2013 at 4:34 PM, Michael Schmitz <[email protected]> wrote:

> I ran the OpenNLP stock models on Wikipedia text at one time. You may be
> able to use this.
>
> https://gist.github.com/3291931
>
> If you create superior models without licensing restrictions, please share.
>
> Peace. Michael
>
> On Wed, Mar 13, 2013 at 12:25 PM, John Helmsen <[email protected]> wrote:
>
> > Gentlemen and Ladies,
> >
> > Currently, my group is undertaking a project that involves performing
> > English understanding of sentence fragments. While the Apache parser with
> > the pre-trained binary is very good, we anticipate the need to retrain
> > the parser eventually on our own data sets to handle special terms and
> > idiosyncrasies that may arise in our particular context.
> >
> > The best way to retrain the parser is to mix our parsing solutions in
> > with the existing parser training set, so that we enhance the already
> > good performance of the parser in the direction of our particular input.
> >
> > Unfortunately, it seems that the online documentation for
> > en-parser-chunking.bin does not include links to the training sets that
> > were used. Do any of you good people know what these might be? Thanks!
> >
> > John Helmsen
