On 14/03/13 12:00, Andreas Niekler wrote:
> You tokenized an example of my already tokenized training data for the maxent tokenizer of OpenNLP.
The sample you posted was a single string, not tokenised text; otherwise you would have posted a collection of strings (tokens).
> I asked about transforming those texts into input for the train method of the OpenNLP tokenizer.
Yes, I know; I just thought that tokenising might be a more pressing matter than training a maxent model. If you absolutely *have to* train a model, then my reply was in vain indeed...
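If that's the case, the conversion I had in mind goes roughly like this: rebuild the surface text from each list of tokens, record each token's character span, wrap those in TokenSample objects, and hand them to TokenizerME.train. Below is an untested sketch against the OpenNLP 1.5.x API; the class name, the output file name and especially the token-joining heuristic are my own inventions, so treat it as a starting point only:

    import java.io.FileOutputStream;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import opennlp.tools.tokenize.TokenSample;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.ObjectStreamUtils;
    import opennlp.tools.util.Span;
    import opennlp.tools.util.TrainingParameters;

    public class TokenizerTrainingSketch {

        // Rebuild the surface text of one tokenised sentence and record each
        // token's character span. TokenizerME learns from boundaries that are
        // NOT whitespace, so the join heuristic (here: no space before pure
        // punctuation) is the crucial, and deliberately naive, part.
        static TokenSample toSample(List<String> tokens) {
            StringBuilder text = new StringBuilder();
            List<Span> spans = new ArrayList<Span>();
            for (String tok : tokens) {
                if (text.length() > 0 && !tok.matches("\\p{Punct}+")) {
                    text.append(' ');
                }
                int start = text.length();
                text.append(tok);
                spans.add(new Span(start, text.length()));
            }
            return new TokenSample(text.toString(),
                    spans.toArray(new Span[spans.size()]));
        }

        public static void main(String[] args) throws Exception {
            // Two made-up sentences standing in for the real training data.
            List<TokenSample> samples = new ArrayList<TokenSample>();
            samples.add(toSample(Arrays.asList("He", "said", "hello", ".")));
            samples.add(toSample(Arrays.asList("This", "works", ",", "right", "?")));

            ObjectStream<TokenSample> stream = ObjectStreamUtils
                    .createObjectStream(samples.toArray(new TokenSample[0]));

            TokenizerModel model = TokenizerME.train("en", stream,
                    true, TrainingParameters.defaultParams());

            FileOutputStream out = new FileOutputStream("custom-token.bin");
            model.serialize(out);
            out.close();
        }
    }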
Basically, I just noticed that you were having problems converting the data, and I thought that maybe you don't really have to (if there is a regex pattern that does a decent job)... anyway, I guess that wasn't of much help...
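For what it's worth, the kind of regex tokenizer I meant is no more than this (the pattern itself is just a guess and would need tuning to your data):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexTokenizerSketch {
        // Hypothetical pattern: numbers (with decimal separators), words
        // (with internal apostrophes), or any single non-whitespace
        // character, so punctuation comes out as its own token.
        private static final Pattern TOKEN =
                Pattern.compile("\\d+(?:[.,]\\d+)*|\\w+(?:'\\w+)*|\\S");

        public static List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<String>();
            Matcher m = TOKEN.matcher(text);
            while (m.find()) {
                tokens.add(m.group());
            }
            return tokens;
        }

        public static void main(String[] args) {
            System.out.println(tokenize("He said \"hello\", didn't he?"));
            // -> [He, said, ", hello, ", ,, didn't, he, ?]
        }
    }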
Jim
