On 05/23/2013 02:56 PM, Яков Керанчук wrote:
Thanks for the suggestion to use my own model, I'll try that.

I use the standard en-token.bin model, and the text contains words in mixed
upper and lower case.

For the English model you should use the SimpleTokenizer; the token output
from the en-token.bin model is not compatible with the training data.
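
To illustrate why the two tokenizers can disagree, here is a rough plain-Java
sketch of character-class tokenization, the general idea behind a rule-based
tokenizer such as SimpleTokenizer. The class and method names are hypothetical
and this is not OpenNLP's implementation; with the library on the classpath
one would call opennlp.tools.tokenize.SimpleTokenizer.INSTANCE.tokenize(text)
instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT OpenNLP code: starts a new token whenever the
// character class (letter, digit, other) changes, and drops whitespace.
final class CharClassTokenizerSketch {

    private enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                          // start of current token, -1 if none
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass next = classify(text.charAt(i));
            if (next != current) {
                // Class changed: close the open token, if any.
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (next == CharClass.WHITESPACE) ? -1 : i;
                current = next;
            }
        }
        if (start >= 0) {
            tokens.add(text.substring(start));   // flush the trailing token
        }
        return tokens;
    }
}
```

Letter case plays no role in where this sketch splits, so mixed upper and
lower case words are handled the same way as any other word. A statistical
model such as en-token.bin decides split points from learned features, which
is why its output can differ from what a downstream model saw in training.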

Jörn
