Hi Vincent,
The log (regex) plugin uses newlines as the delimiter between records and so it
cannot currently handle newlines within a record. That is, the plugin really
only works for single-line messages, or cases in which we want to ignore all
but the header line (say).
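
To see why a pattern that works in an online tester fails here, consider this
small Python sketch. It is only an illustration of newline-delimited reading,
not the plugin's actual code: the reader hands the regex one line at a time,
with the line terminator already removed, so "\r\n" never appears in the
text being matched.

```python
import re

# Vincent's first pattern, written as a plain regex (no JSON escaping).
pattern = re.compile(r"(\[.+\])(.+\r\n)(.+)")

sample = (
    "[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\r\n"
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\r\n"
)

# Matching against the whole text succeeds, as in an online regex tester.
whole_text_match = pattern.match(sample)

# A newline-delimited reader sees each line separately, without "\r\n",
# so the same pattern cannot match any individual line.
per_line_matches = [pattern.match(line) for line in sample.splitlines()]

print(whole_text_match is not None)
print(per_line_matches)
```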
If you are up for a Java coding effort, you could modify the plugin to take
another config parameter which is the record delimiter. (The text (CSV) plugin
already does this.) You would need a unique marker that gives a context-free
record split. The project would welcome such a contribution. If you made such
an enhancement, you could have the plugin look for, say, double-newline as the
record delimiter.
Recall that Drill works with HDFS files. Each scan operator may be given a
block of a file. When reading the second or later block of a file, the reader
must scan forward to find the start of the next record using the record
delimiter.
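
A rough Python sketch of that scan-forward step, to show why the delimiter
must be context-free. The function name and the shortened sample data are
made up for illustration; this is not Drill's reader code.

```python
def first_record_start(data: bytes, block_start: int, delimiter: bytes) -> int:
    """Return the offset of the first record starting at or after block_start.

    Hypothetical helper: the first block begins on a record boundary; any
    later block skips the partial record it lands in.
    """
    if block_start == 0:
        return 0
    # A record that starts exactly at block_start is preceded by a delimiter,
    # so begin the search len(delimiter) bytes earlier.
    search_from = max(0, block_start - len(delimiter))
    pos = data.find(delimiter, search_from)
    if pos == -1:
        return len(data)  # no further record starts in this file
    return pos + len(delimiter)


# Two-line records separated by a double newline (shortened sample).
data = b"[t1]Info(1)\nbody one\n\n[t2]Info(2)\nbody two\n\n"

# With a double-newline delimiter, a block starting mid-record finds the
# next real record.
good = first_record_start(data, 5, b"\n\n")
print(data[good:good + 4])

# With a single newline, the same scan lands on the body line of a record,
# which is not a record start.
bad = first_record_start(data, 5, b"\n")
print(data[bad:bad + 8])
```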
For now, I'd suggest transforming your file to replace newlines with some other
character, and replace any existing record delimiter with newline. Then you can
use the log (regex) plugin.
That is:
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]
Becomes:
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)|External [GLOBAL] macro [@PHASE_INPUT] registered OK
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)|Reading Application Definition For [ACTUAL]
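
The transformation could be scripted roughly as follows. This is a minimal
Python sketch that assumes every record starts with a bracketed header such
as "[Thu May 2 00:17:50 2019]" and that no continuation line starts that way.
The function name and the "|" joiner are just illustrative choices; pick a
joiner that never appears in your data.

```python
import re

# Assumption: a record begins with a bracketed header such as
# "[Thu May 2 00:17:50 2019]"; all other lines are continuation lines.
HEADER = re.compile(r"^\[[^\]]+\]")


def flatten_records(lines, joiner="|"):
    """Yield one line per record, joining continuation lines with `joiner`."""
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if HEADER.match(line) and current:
            # A new header closes the record collected so far.
            yield joiner.join(current)
            current = []
        current.append(line)
    if current:
        yield joiner.join(current)


sample = [
    "[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)\r\n",
    "External [GLOBAL] macro [@PHASE_INPUT] registered OK\r\n",
    "[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)\r\n",
    "Reading Application Definition For [ACTUAL]\r\n",
]
flattened = list(flatten_records(sample))
for record in flattened:
    print(record)

# Each flattened line can then be matched by a single-line pattern.
single_line = re.compile(r"(\[.+?\])(.+?)\|(.+)")
fields = single_line.match(flattened[0]).groups()
```

To process a real file, pass the open file object as `lines` and write each
yielded record followed by a newline. A pattern along the lines of
`(\[.+?\])(.+?)\|(.+)` should then work in the log (regex) plugin, though I
have only checked it in the sketch above, not in Drill.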
Maybe Charles has a better idea?
Thanks,
- Paul
On Tuesday, July 23, 2019, 05:11:26 AM PDT, Vincent BENATIER
<[email protected]> wrote:
Hi all,
I was wondering whether the logfile plugin can handle multiline parsing?
When I try my regex syntax online, it works well, but it seems that the
"\\r\\n" are not recognized when trying to configure a logfile plugin in
Apache Drill.
Or perhaps there is another way to do this, but I could not find anything in
the documentation or in the "Learning Apache Drill" book.
Could someone help?
Vincent
Regex syntaxes I tried
--------------------------
"(\\[.+\\])(.+\\r\\n)(.+)"
"(\\[.+\\])(.+)(\\r\\n.+)"
"(\\[.+\\])(.+) \\r\\n (.+)"
File sample
--------------
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1200450)
External [GLOBAL] macro [@PHASE_INPUT] registered OK
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019008)
Reading Application Definition For [ACTUAL]
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019009)
Reading Database Definition For [Actual]
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019021)
Reading Database Mapping For [ACTUAL]
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019010)
Writing Application Definition For [ACTUAL]
[Thu May 2 00:17:50 2019]Local/ACTUAL///1/Info(1019011)
Writing Database Definition For [Actual]