OK, here's the workaround I implemented to get past the issue above.

Some MyConstants.java file:

    public static final String SYSTEM_AGNOSTIC_NEWLINE_REGEX = "\r|\r\n|\n";

Splitter route configuration in a RouteBuilder implementation:

    TokenizerExpression tokenizerExpression = new TokenizerExpression();
    tokenizerExpression.setToken(MyConstants.SYSTEM_AGNOSTIC_NEWLINE_REGEX); // tokenize by line separators
    tokenizerExpression.setGroup(readerConfig.getLinesPerChunk()); // group this many lines into one exchange
    tokenizerExpression.setRegex(true); // the token is a regular expression, not a simple string match

    from(FILE_SPLITTER_ENDPOINT).routeId("fileSplitterRoute").
        split(tokenizerExpression).
        streaming(). // stream the file instead of reading it all into memory
        parallelProcessing(readerConfig.isParallelProcessing()). // on/off concurrent processing of multiple chunks
        stopOnException(). // stop processing the file if a system exception occurs (handled by onException clause)
        bean(new TokenizerCharRemover()). // THE WORKAROUND: cleans junk chars inserted by Camel's tokenizer due to bug(?)
        unmarshal().csv(). // unmarshal each chunk to Java (list of String lists) using Camel's CSV component
        bean(csvHandler). // hand each unmarshalled list of lines/fields to the bean that parses and validates line content
        bean(importProcessor). // process codes for import (depending on operational mode and errors in exchange)
        to(AGGREGATE_ERRORS_ENDPOINT). // delegate to nested route to update the error report
    end();

TokenizerCharRemover.java:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.camel.Exchange;
    import org.apache.camel.Handler;

    public class TokenizerCharRemover {

        /**
         * Pre-compiled pattern that matches the character sequence of the regular expression itself, which
         * Camel's splitter's tokenizer inserts between the file lines in the body of the exchange. The input
         * string that specifies the pattern is treated as a sequence of literal characters thanks to the
         * {@link Pattern#LITERAL} flag.
         */
        private static final Pattern REPLACE_JUNK_PATTERN =
                Pattern.compile(MyConstants.SYSTEM_AGNOSTIC_NEWLINE_REGEX, Pattern.LITERAL);

        /**
         * Replaces every instance of the {@link MyConstants#SYSTEM_AGNOSTIC_NEWLINE_REGEX} character sequence
         * in the exchange body with a simple '\n' line separator.
         */
        @SuppressWarnings("MethodMayBeStatic")
        @Handler
        public void cleanupLineSeparators(Exchange exchange) {
            String newBody = REPLACE_JUNK_PATTERN.matcher(exchange.getIn().getBody(String.class))
                    .replaceAll(Matcher.quoteReplacement("\n"));
            exchange.getIn().setBody(newBody);
        }
    }

If there is a better solution, or if I have missed some obvious, simple way to use the tokenizer so that it does not replace the matched line separators with the regex character sequence itself, please let me know! I'd very much appreciate it.

--
View this message in context: http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192p5773221.html
Sent from the Camel - Users mailing list archive at Nabble.com.
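P.S. For anyone who wants to try the cleanup step without a Camel route, here is a minimal standalone sketch of what TokenizerCharRemover does. It assumes, as described in this post, that the grouped body arrives with the lines joined by the regex text itself. The class name JunkSeparatorDemo and the sample CSV lines are made up for illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Camel or of the route in this post.
class JunkSeparatorDemo {

    // Same value as MyConstants.SYSTEM_AGNOSTIC_NEWLINE_REGEX in the post.
    static final String NEWLINE_REGEX = "\r|\r\n|\n";

    // Pattern.LITERAL makes the pattern match the regex text itself
    // (CR, '|', CR, LF, '|', LF), not individual line breaks.
    static final Pattern JUNK = Pattern.compile(NEWLINE_REGEX, Pattern.LITERAL);

    // Replaces each occurrence of the junk sequence with a single '\n'.
    static String cleanup(String body) {
        return JUNK.matcher(body).replaceAll(Matcher.quoteReplacement("\n"));
    }

    public static void main(String[] args) {
        // Assumed shape of a grouped chunk: lines glued together by the regex text.
        String grouped = "a,b" + NEWLINE_REGEX + "c,d" + NEWLINE_REGEX + "e,f";
        System.out.println(cleanup(grouped).equals("a,b\nc,d\ne,f")); // prints "true"
    }
}
```

A body that contains only ordinary '\n' separators passes through unchanged, because the literal pattern only matches the full six-character junk sequence.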