Re: Processing multiple lines per flowfile with ExtractGrok

Bryan Bende Tue, 26 Sep 2017 06:38:38 -0700

Adam,

I'm only a little bit familiar with Grok, but the ExtractGrok processor
reads the entire content of the flow file into memory and then
performs the match with the grok expression against the entire
content, so it seems like this processor wasn't intended to perform
the match line-by-line.
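
As a rough illustration of the difference, here is a minimal Python
sketch. It is not NiFi or Grok code: the regex is a hypothetical
stand-in for a grok expression, and the sample content and function
names are assumptions made up for the example. It only shows why a
single match against the whole content yields one set of values, while
a line-by-line match yields one set per line.

```python
import re

# Hypothetical stand-in for a grok expression such as
# "%{LOGLEVEL:level} %{GREEDYDATA:message}". Plain Python regex,
# not what ExtractGrok runs internally.
LOG_PATTERN = re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<message>.*)")

# Assumed sample content: three log messages in one flow file.
content = "INFO started\nWARN disk almost full\nERROR write failed\n"


def match_whole_content(text):
    """Apply the pattern once to the entire content: one set of values."""
    m = LOG_PATTERN.search(text)
    return m.groupdict() if m else {}


def match_each_line(text):
    """Apply the pattern to every line separately: one set of values per line."""
    results = []
    for line in text.splitlines():
        m = LOG_PATTERN.search(line)
        if m:
            results.append(m.groupdict())
    return results


whole = match_whole_content(content)   # values from the first message only
per_line = match_each_line(content)    # one dict for each of the three lines
```

With the whole-content match there is a single "level" value to route
on; with the per-line match there are three, which is the ambiguity
described in the next paragraph.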

The likely reason is that when you extract information into flow file
attributes, you are typically doing so in order to make some kind of
routing decision. For example, if you have a log message in the flow
file content, you might extract the log level (warn, error, etc.) and
then route all the logs of a given level somewhere. This works when
there is one log message per flow file, but doesn't really work when
there are thousands of log messages per flow file, because then you
would get thousands of flow file attributes and it would be unclear
which one to route on.

What Joe pointed out with the record processors and the GrokReader is
a different approach where you should be able to avoid splitting up
your data. For example, in the above scenario you could use
PartitionRecord with a GrokReader to separate a flow file of log
messages into a flow file per log-level, without having to split into
thousands of individual flow files.
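
Conceptually, the partitioning step amounts to grouping parsed records
by one field. The Python sketch below is only an analogy, not how
PartitionRecord or GrokReader are implemented or configured (in NiFi
that is done through processor and controller service properties, not
code). The regex, sample data, and function name are all assumptions
for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the grok expression a reader would use.
LOG_PATTERN = re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<message>.*)")


def partition_by_level(text):
    """Group log lines by their parsed level, one group per distinct level."""
    groups = defaultdict(list)
    for line in text.splitlines():
        m = LOG_PATTERN.match(line)
        if m:  # lines that do not match the pattern are skipped in this sketch
            groups[m.group("level")].append(line)
    return dict(groups)


# Assumed sample content: one input holding messages of mixed levels.
content = "INFO a\nERROR b\nINFO c\n"
partitions = partition_by_level(content)
```

Each key in the result corresponds to one outgoing flow file in the
analogy: all INFO messages together and all ERROR messages together,
rather than one flow file per message.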

Hopefully that helps. Let us know if you have any other questions.

-Bryan

On Mon, Sep 25, 2017 at 9:05 PM, Joe Witt <joe.w...@gmail.com> wrote:
> Adam,
>
> I'm not very familiar with that specific processor but I think you'll
> find your case is probably far better handled using the Record
> reader/writer processors anyway.  There is a GrokReader which you can
> use to read each line of a given input as grok expressions to parse
> out key fields against your desired schema.  Then there are writers
> for csv, json, avro, etc.  They provide processors to partition based
> on like records, validate records match expected structure, merge
> records, convert, transfer to/from Kafka, split records, etc.
>
> Thanks
> Joe
>
> On Mon, Sep 25, 2017 at 6:46 PM, Adam Lamar <adamond...@gmail.com> wrote:
>> Hi there,
>>
>> I've been playing with the ExtractGrok processor and noticed I was missing
>> some data that I expected to be extracted. After some investigation, it
>> seems that ExtractGrok extracts only the first line of the flowfile content,
>> and ignores the rest.
>>
>> Is this expected behavior? I should be able to use SplitText to break up the
>> records, but it surprised me because other grok tools I've used have been
>> line-oriented by default (at least from the perspective of the user).
>>
>> Cheers,
>> Adam
