I have played with it some more, and it seems clear that the tokenizer does
NOT support regular expressions as advertised. Moreover, there seems to be no
way to write a system-agnostic file splitter that groups lines! I may be
wrong, and if so, can anyone PLEASE show me the proper way to do it?
Perhaps some real examples could be added to the documentation?

Most examples seem to advocate system-specific tokenizing, e.g. using just
"\n" as a plain String to match the token separator in the payload. But what
if the file is processed on Unix but was created on Windows? What if the file
has only "\r" separators? (My application has to deal with all three types:
\n, \r\n, and \r.)
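To make the problem concrete, here is a minimal plain-Java sketch (not Camel code; the class and method names are hypothetical) of what splitting on a literal "\n" does to each of the three conventions. A Windows file leaves a trailing '\r' on every line, and a file that uses only '\r' is not split at all.

```java
import java.util.Arrays;

public class LiteralNewlineSplit {
    // Splits on "\n" only, as in the system-specific examples. String.split takes
    // a regex, but a lone newline character has no special meaning in a regex,
    // so this is effectively a plain literal match.
    static String[] splitOnLf(String payload) {
        return payload.split("\n");
    }

    public static void main(String[] args) {
        // Unix (LF): two clean lines.
        System.out.println(Arrays.toString(splitOnLf("line1\nline2"))); // [line1, line2]
        // Windows (CRLF): two lines, but the first still ends with '\r'.
        System.out.println(splitOnLf("line1\r\nline2")[0].endsWith("\r")); // true
        // CR only: no '\n' at all, so the whole payload comes back as one "line".
        System.out.println(splitOnLf("line1\rline2").length); // 1
    }
}
```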

I have seen examples online suggesting that the regex string to use might be
"\n|\r\n" or "\n|\r\n|\r". I tried it, and it didn't work: Camel creates bogus
lines that contain the '|' characters and inserts those lines into the
exchange. Thank you very much.

There just has to be an easy way to specify a list of possible token
delimiters, and the best way to do it might be via regular expressions. The
API documentation indicates that this is indeed supported. However, as I have
described, it didn't work for me at all. Any regex-specific characters, such
as '|', '[', ']', etc., end up being inserted into the exchange as part of a
junk line "read" from the file.
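When several delimiters are listed in one regex, the order of the alternatives and the choice of quantifier both matter. The sketch below (plain java.util.regex again; the helper name is hypothetical) splits a Windows-style payload containing one blank line with four candidate patterns. The `\R` construct requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterListDemo {
    // Hypothetical helper: split a payload with an arbitrary delimiter regex.
    static String[] split(String payload, String delimiterRegex) {
        return Pattern.compile(delimiterRegex).split(payload);
    }

    public static void main(String[] args) {
        // Two lines separated by one blank line, Windows (CRLF) style.
        String payload = "first\r\n\r\nsecond";

        // CRLF tried before bare CR/LF: the blank line is preserved.
        System.out.println(Arrays.toString(split(payload, "\r\n|\r|\n"))); // [first, , second]
        // \R (Java 8+) matches any line break and treats CRLF as a single break.
        System.out.println(Arrays.toString(split(payload, "\\R"))); // [first, , second]
        // Bare CR tried first: each CRLF is cut in two, adding spurious empty lines.
        System.out.println(Arrays.toString(split(payload, "\r|\r\n|\n"))); // [first, , , , second]
        // Greedy character class: runs of breaks collapse, so the blank line is lost.
        System.out.println(Arrays.toString(split(payload, "[\r\n]+"))); // [first, second]
    }
}
```

In short, make sure "\r\n" is tried before a bare "\r" (or use `\R`), and avoid a `+` quantifier if blank lines must survive.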



--
View this message in context: 
http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773207.html
Sent from the Camel - Users mailing list archive at Nabble.com.
