+1 This question is unanswered for me as well. It would be a great help to get it answered.
-aditya

On Apr 1, 2014 12:38 AM, "Stuart Robinson" <[email protected]> wrote:

> I've tried using the tokenizer model for English provided by OpenNLP:
>
> http://opennlp.sourceforge.net/models-1.5/en-token.bin
>
> It's listed here, where it's described as "Trained on opennlp training
> data":
>
> http://opennlp.sourceforge.net/models-1.5/
>
> It works pretty well, but I'm working on some social media text that has
> some non-standard punctuation. For example, it's not uncommon for words to
> be separated by a series of punctuation characters, like so:
>
> oooh,,,,go away fever and flu
>
> I want to train a new model using text like this but don't want to start
> entirely from scratch. Is the training data for this model available from
> OpenNLP? If so, I could experiment with supplementing its training data. It
> seems like sharing training data, and not just trained models, could be a
> great service.
>
> Thanks,
> Stuart Robinson
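For anyone who lands on this thread: whether or not the original training data is released, you can train a custom tokenizer from your own annotated text. Below is a minimal sketch using the OpenNLP 1.5.x command-line trainer; it assumes OpenNLP is installed and on the PATH, and the file names (`tokenizer.train`, `en-token-custom.bin`) are hypothetical.

```shell
# tokenizer.train format: one sentence per line. Whitespace already implies
# a token boundary; the <SPLIT> tag marks boundaries that occur WITHOUT
# whitespace. E.g., to make the punctuation run its own token:
#
#   oooh<SPLIT>,,,,<SPLIT>go away fever and flu

# Train a new tokenizer model from the annotated data:
opennlp TokenizerTrainer -lang en -encoding UTF-8 \
  -data tokenizer.train -model en-token-custom.bin
```

The resulting `en-token-custom.bin` can then be loaded the same way as the distributed `en-token.bin`. Note this trains from scratch on your data; combining it with the original model's training material would still require that data to be published, which is exactly the open question above.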
