Hi Claus,
thank you for responding. The problem we are currently seeing is that, when
we provide a regex to the tokenizer to detect token delimiters, the
tokenizer inserts that regex literal into the payload itself, replacing the
actual delimiters matched by the expression. I think you will agree that
modifying the original payload in any way other than splitting it into
chunks is not desirable behavior.

I think the most natural and logical way would be to correct the existing
tokenizer functionality to:

   1) Correctly identify the individual tokens by matching the delimiters
against the provided regular expression (as is done today);
   2) Ensure that the resulting exchange message body (a group of N tokens)
retains the original token separators, rather than having them replaced by
the regex literal.
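To illustrate the behavior we'd expect, here is a rough sketch (not Camel's actual implementation; the class and method names are hypothetical) of how a tokenizer could group N tokens while preserving the original delimiter text instead of substituting the regex literal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterPreservingTokenizer {

    // Split the payload into groups of `groupSize` tokens, keeping the
    // original delimiter text between tokens inside each group.
    static List<String> tokenize(String payload, String delimiterRegex, int groupSize) {
        Matcher m = Pattern.compile(delimiterRegex).matcher(payload);
        List<String> tokens = new ArrayList<>();
        List<String> delims = new ArrayList<>();
        int last = 0;
        while (m.find()) {
            tokens.add(payload.substring(last, m.start()));
            delims.add(m.group());           // remember the actual matched delimiter
            last = m.end();
        }
        tokens.add(payload.substring(last)); // trailing token

        List<String> groups = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += groupSize) {
            StringBuilder sb = new StringBuilder();
            int end = Math.min(i + groupSize, tokens.size());
            for (int j = i; j < end; j++) {
                sb.append(tokens.get(j));
                if (j < end - 1) {
                    // re-insert the original separator, not the regex literal
                    sb.append(delims.get(j));
                }
            }
            groups.add(sb.toString());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Mixed line endings: each group keeps exactly the separators it had.
        List<String> groups = tokenize("a\r\nb\nc\r\nd", "\r?\n", 2);
        for (String g : groups) {
            System.out.println(g.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

The key point is that the matched delimiter text is captured per occurrence (via Matcher.group()) rather than discarded, so mixed delimiters such as "\r\n" and "\n" survive the grouping intact.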

Also, for what it's worth, perhaps it would be helpful to slightly adjust
the terminology in the API documentation. What is currently described as
the "token" argument (or "token expression") to the tokenize() method is
actually the token /delimiter/ expression: the expression that matches the
delimiters separating the tokens in the payload. So, when a file is split
into lines or groups of lines, a token represents a line, obviously, not
the separator/delimiter. ;)

--
View this message in context: 
http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773322.html
Sent from the Camel - Users mailing list archive at Nabble.com.
