correct way to provide regex in TokenizerExpression?

furchess123 Thu, 29 Oct 2015 06:13:55 -0700

What is the correct way to supply the regular expression in
TokenizerExpression?


Per Claus's advice, I have tried the following to tokenize a file by lines
while grouping lines, using a regex to support more than one type of line
separator:

        TokenizerExpression tokenizerExpression = new TokenizerExpression();
        // tokenize by line separators (system-agnostic)
        tokenizerExpression.setToken("\n|\r\n|\r");
        // group so many lines into one exchange
        tokenizerExpression.setGroup(500);
        tokenizerExpression.setRegex(true);

        ...
        split(tokenizerExpression)...

The file gets split into lines that are grouped by 500, except every other
line in the group is not an actual line from the file, but *a line with the
single character '|'*.  The regex seems correct, but the Camel tokenizer
misinterprets it and adds a bogus line for every '|', which is part of the
regex language. 
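
To check the regex on its own, outside Camel, here is a minimal sketch that
applies the same pattern with plain java.util.regex and groups the resulting
lines. The class name LineSeparatorRegexCheck, the method splitAndGroup and
the sample string are hypothetical, made up for illustration. It does not use
or reproduce the Camel tokenizer; it only shows how the JDK regex engine
treats this pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LineSeparatorRegexCheck {

    // Same pattern string as passed to setToken(...) above.
    static final Pattern LINE_SEPARATOR = Pattern.compile("\n|\r\n|\r");

    // Splits text on the pattern, then joins every groupSize lines with "\n".
    // Hypothetical helper for illustration only; this is not Camel code.
    static List<String> splitAndGroup(String text, int groupSize) {
        String[] lines = LINE_SEPARATOR.split(text);
        List<String> groups = new ArrayList<>();
        for (int i = 0; i < lines.length; i += groupSize) {
            int end = Math.min(i + groupSize, lines.length);
            groups.add(String.join("\n", Arrays.copyOfRange(lines, i, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // One sample line per separator style: \n, \r\n and \r.
        String sample = "one\ntwo\r\nthree\rfour";
        for (String group : splitAndGroup(sample, 2)) {
            System.out.println("--- group ---");
            System.out.println(group);
        }
    }
}
```

With plain java.util.regex, no group contains a '|' line, so the pattern
itself behaves as intended there.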

I have tried various ways to write a regex, but the tokenizer always seems
to not parse it correctly and adds lines to the exchange that contain
nothing but regex language characters.

How do I provide a regex so that the tokenizer interprets it properly,
specifically the regex I am trying to use above?

Thanks!



--
View this message in context: 
http://camel.465427.n5.nabble.com/correct-way-to-provide-regex-in-TokenizerExpression-tp5773192.html
Sent from the Camel - Users mailing list archive at Nabble.com.
