Hi Claus, thank you for responding. The problem we are currently seeing is that, if we provide a regex to the tokenizer to detect token delimiters, the tokenizer inserts the regex literal itself into the payload, replacing the actual delimiters it matched. I think you will agree that modifying the original payload in any way other than splitting it into chunks is not desirable behavior.
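To make the expected behavior concrete, here is a minimal sketch (not Camel's actual tokenizer code; the class and method names are hypothetical) of a regex-based split that keeps the matched delimiter text with each token, so that grouping N tokens per chunk preserves the original separators instead of substituting the regex literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: split a payload on a delimiter regex
// while keeping the actual delimiter text that was matched, then group
// N tokens per chunk so the original separators survive.
public class DelimiterPreservingTokenizer {

    static List<String> split(String payload, String delimiterRegex, int group) {
        Matcher m = Pattern.compile(delimiterRegex).matcher(payload);
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (m.find()) {
            // keep the token together with the concrete delimiter that followed it
            tokens.add(payload.substring(start, m.start()) + m.group());
            start = m.end();
        }
        if (start < payload.length()) {
            tokens.add(payload.substring(start)); // trailing token with no delimiter
        }
        // group N tokens per chunk, leaving the original separators untouched
        List<String> chunks = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i));
            if ((i + 1) % group == 0) {
                chunks.add(sb.toString());
                sb.setLength(0);
            }
        }
        if (sb.length() > 0) {
            chunks.add(sb.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String payload = "a\r\nb\nc\r\nd";
        List<String> chunks = split(payload, "\r?\n", 2);
        // concatenating the chunks restores the original payload exactly,
        // mixed \r\n and \n line endings included
        System.out.println(String.join("", chunks).equals(payload)); // prints true
    }
}
```

The key point is that `m.group()` carries the concrete delimiter that was matched (e.g. `\r\n` vs. `\n`), which is exactly what gets lost when the expression literal is substituted instead.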
I think the most natural and logical way forward would be to correct the existing tokenizer functionality to:

1) correctly identify the individual tokens by matching the delimiters using the provided regular expression (as is done today, indeed); and
2) ensure that the resulting exchange message body (a group of N tokens) retains the original token separators, rather than having them replaced by the regex literal.

Also, for what it's worth, perhaps it would be helpful to slightly adjust the terminology in the API documentation. What is currently described as the "token" argument (or "token expression") to the tokenize() method is actually the token /delimiter/ expression - the expression that matches the delimiters separating the tokens in the payload. So, in the case of a file being split into lines or groups of lines, a token represents a line, obviously, not the separator/delimiter. ;)

--
View this message in context: http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773322.html
Sent from the Camel - Users mailing list archive at Nabble.com.