Hi,

When tokenizing a string of text, is there a way to also track the index in the original text where each token begins?

For example, "Mary didn't kiss John" would give:

[(Mary, 0), (did, 5), (n't, 8), (kiss, 12), (John, 17)]

If the offsets 0, 5, 8, 12 and 17 can be extracted from somewhere, that would be great. I cannot rely on whitespace, since the tokenizer sometimes splits a single word into several tokens.

Thanks,
Adam
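To make the desired output concrete, here is a rough sketch of one possible workaround: scan the original text left to right and look up each token starting from where the previous one ended. The function name `token_offsets` is hypothetical, and the sketch assumes every token appears verbatim in the original text; it would fail if the tokenizer rewrites characters (for example, normalizing quotation marks).

```python
from typing import List, Tuple


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[str, int]]:
    """Pair each token with its start index in `text`.

    Hypothetical helper. Assumes tokens occur in order and appear
    verbatim in `text` (no character rewriting by the tokenizer).
    """
    offsets = []
    cursor = 0  # search resumes where the previous token ended
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after index {cursor}")
        offsets.append((token, start))
        cursor = start + len(token)
    return offsets


# Token list written out by hand here; in practice it would come from the tokenizer.
tokens = ["Mary", "did", "n't", "kiss", "John"]
print(token_offsets("Mary didn't kiss John", tokens))
```

Because the search always resumes at the end of the previous token, repeated words map to successive occurrences rather than all to the first one, and no assumption about whitespace is needed.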
