Hello,

The Tokenizer context generator receives only the token and an index
pointing to a character inside the token. Wouldn't it be more effective if
it could use a larger context? It would be useful, for example, to know
whether it is the first or the last token candidate of a sentence.

Does anyone know why it was implemented this way?

I am trying to figure out how to pass additional information without
breaking compatibility. I don't want to branch either.
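
To make the question concrete, here is a rough sketch of one possible
direction. It is only an illustration: every name in it
(TokenContextGenerator as written here, PositionAwareContextGenerator,
SimpleGenerator, the tokenPos and tokenCount parameters) is a hypothetical
stand-in and not the actual API. The idea is to add an overload with a
default implementation, so existing implementations of the two-argument
method would keep compiling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContextSketch {

    // Hypothetical stand-in for the existing contract:
    // only the token and a character index inside it.
    interface TokenContextGenerator {
        String[] getContext(String token, int index);
    }

    // Hypothetical extension: a default overload that also receives the
    // position of the token candidate within the sentence.
    interface PositionAwareContextGenerator extends TokenContextGenerator {
        default String[] getContext(String token, int index,
                                    int tokenPos, int tokenCount) {
            // Start from the features of the original two-argument method.
            List<String> feats =
                new ArrayList<>(Arrays.asList(getContext(token, index)));
            if (tokenPos == 0) {
                feats.add("first");   // first token candidate of the sentence
            }
            if (tokenPos == tokenCount - 1) {
                feats.add("last");    // last token candidate of the sentence
            }
            return feats.toArray(new String[0]);
        }
    }

    // Toy implementation that only defines the old two-argument method.
    static class SimpleGenerator implements PositionAwareContextGenerator {
        @Override
        public String[] getContext(String token, int index) {
            return new String[] {"c=" + token.charAt(index)};
        }
    }

    public static void main(String[] args) {
        PositionAwareContextGenerator g = new SimpleGenerator();
        // Old call still works; new call adds sentence-position features.
        System.out.println(Arrays.toString(g.getContext("Hello", 0)));
        System.out.println(Arrays.toString(g.getContext("Hello", 0, 0, 3)));
    }
}
```

Callers that know the sentence position would use the four-argument
overload, and everything else would stay untouched. Whether this fits the
real interfaces is exactly what I am unsure about.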

Thanks,
William
