Shouldn't Tokenizer Context Generator use more context?

Fri, 23 Mar 2012 21:18:48 -0700

Hello,

The Tokenizer context generator receives just the token and an index
pointing to a character inside the token. Wouldn't it be more effective if
it could use a larger context? For example, it would be useful to know
whether the token is the first token candidate of a sentence, the last one, etc.
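A minimal sketch of what a wider context could offer, assuming the generator were handed all whitespace-separated token candidates of the sentence plus the position of the candidate being split. The class name, method signature and feature strings are hypothetical and are not OpenNLP's API; the first three features use only the token, as the current design allows, while the rest need sentence-level information.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical context generator that sees every whitespace-separated token
 * candidate of the sentence, not only the one being split.
 */
class SentenceAwareTokenContextGenerator {

    /**
     * Returns features for a possible split before character {@code charIndex}
     * of {@code candidates[tokenIndex]}; charIndex is expected to lie in (0, token length).
     */
    String[] getContext(String[] candidates, int tokenIndex, int charIndex) {
        String token = candidates[tokenIndex];
        List<String> features = new ArrayList<>();

        // Features computable from the token alone.
        features.add("p=" + token.substring(0, charIndex)); // text before the split
        features.add("s=" + token.substring(charIndex));    // text after the split
        features.add("c=" + token.charAt(charIndex));       // character right after the split

        // Features that need sentence-level context.
        if (tokenIndex == 0) {
            features.add("firstTok");
        }
        if (tokenIndex == candidates.length - 1) {
            features.add("lastTok");
        }
        if (tokenIndex > 0) {
            features.add("prev=" + candidates[tokenIndex - 1]);
        }
        if (tokenIndex < candidates.length - 1) {
            features.add("next=" + candidates[tokenIndex + 1]);
        }
        return features.toArray(new String[0]);
    }
}
```

For the candidates {"Hello", "world."} with tokenIndex 1 and charIndex 5, the split between "world" and "." also receives "lastTok" and "prev=Hello", which a model could use to learn that a period at the end of the last candidate is usually a separate token.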

Does anyone know why it was implemented this way?

I am trying to figure out how to pass additional information to the context
generator without breaking compatibility, and I would rather not create a branch either.
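One possible way to add context without breaking existing generators, sketched under the assumption that an optional sub-interface is acceptable: the caller checks whether a generator supports the richer call and otherwise falls back to the token-only call. All names here are made up for illustration; the real OpenNLP interface and its exact signature may differ.

```java
/** Simplified, hypothetical stand-in for the existing token-only generator contract. */
interface TokenOnlyContextGenerator {
    String[] getContext(String token, int charIndex);
}

/**
 * Hypothetical optional extension. Existing implementations keep compiling
 * unchanged; only generators that want more context implement it.
 */
interface SentenceContextGenerator extends TokenOnlyContextGenerator {
    String[] getContext(String[] candidates, int tokenIndex, int charIndex);
}

/** Chooses the richer call when the generator supports it, else falls back. */
final class ContextDispatch {
    private ContextDispatch() {}

    static String[] contextFor(TokenOnlyContextGenerator cg, String[] candidates,
                               int tokenIndex, int charIndex) {
        if (cg instanceof SentenceContextGenerator) {
            return ((SentenceContextGenerator) cg).getContext(candidates, tokenIndex, charIndex);
        }
        // Old generators still receive exactly the token and character index.
        return cg.getContext(candidates[tokenIndex], charIndex);
    }
}
```

With this design only the calling code in the tokenizer changes; existing context generators and the models trained with them would keep seeing the same inputs.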

Thanks,
William
