I have played with it some more, and it seems clear that the tokenizer does NOT support regular expressions as advertised. Moreover, there seems to be no way to write a system-agnostic file splitter that groups lines! I may be wrong, and if so, can anyone PLEASE show me the proper way to do it? Perhaps some real examples could be added to the documentation?
Most examples seem to advocate system-specific tokenizing, e.g. using just "\n" as a plain String to match the token separator in the payload. But what if the file is processed on Unix but was created on Windows? What if the file has only "\r" separators? (My application has to deal with all three types: \n, \r\n, \r.)

I have seen examples online suggesting that the regex string to use might be "\n|\r\n" or "\n|\r\n|\r". I tried it and it didn't work: Camel creates bogus lines that contain the '|' characters and inserts those lines into the exchange.

There just has to be an easy way to specify a list of possible token delimiters, and the best way to do it might be via regular expressions. The API documentation indicates that regular expressions are indeed supported. However, as I have described, that didn't work for me at all: any regex metacharacters, such as '|', '[', or ']', end up being inserted into the exchange as part of a junk line "read" from the file.

Thank you very much.

--
View this message in context: http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773207.html
Sent from the Camel - Users mailing list archive at Nabble.com.
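To make the desired behavior concrete, here is a minimal plain-Java sketch of a system-agnostic splitter that groups lines. It uses only java.util.regex and is NOT Camel API: the class name LineGrouper and the methods splitLines and groupLines are hypothetical names chosen for illustration. It assumes the whole payload fits in memory as a String and that a regex alternation over the three delimiters is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Camel.
class LineGrouper {
    // "\r\n" comes first so a Windows line ending is consumed as ONE delimiter
    // rather than as a "\r" followed by a separate "\n".
    static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\n|\\r");

    // Splits the payload on any of the three line-ending styles.
    static List<String> splitLines(String payload) {
        // Limit -1 keeps trailing empty strings, so blank lines inside the file survive.
        List<String> lines = new ArrayList<>();
        for (String part : LINE_BREAK.split(payload, -1)) {
            lines.add(part);
        }
        // A final line break leaves one empty trailing element; drop it.
        if (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
            lines.remove(lines.size() - 1);
        }
        return lines;
    }

    // Joins every groupSize consecutive lines into one chunk, normalized to "\n".
    static List<String> groupLines(String payload, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        List<String> lines = splitLines(payload);
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += groupSize) {
            int end = Math.min(i + groupSize, lines.size());
            groups.add(String.join("\n", lines.subList(i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Mixed Windows, Unix, and old Mac line endings in one payload.
        String mixed = "a\r\nb\nc\rd\r\n";
        System.out.println(splitLines(mixed)); // prints [a, b, c, d]
        System.out.println(groupLines(mixed, 3).size()); // prints 2
    }
}
```

Whether and how logic like this can be plugged into a Camel route (for example through a custom bean or expression) is a separate question that this sketch does not answer; it only shows that the delimiter regex itself behaves as expected in plain Java.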