Adam,

I'm only a little familiar with Grok, but the ExtractGrok processor reads the entire content of the flow file into memory and then matches the grok expression against that entire content, so it seems this processor wasn't intended to perform the match line by line.
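To illustrate the difference, here is a minimal sketch in plain Python with the standard `re` module (not NiFi code; the pattern and sample log lines are made up for illustration):

```python
import re

# A simplified grok-like pattern with named groups, roughly analogous to
# %{LOGLEVEL:level} %{GREEDYDATA:message}.
pattern = re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<message>.+)")

content = "INFO starting up\nERROR disk full\nWARN low memory"

# Matching the whole content once, which is effectively what happens when
# the entire flow file content is treated as one input: only the first
# line's fields are extracted, since "." does not cross newlines.
whole = pattern.match(content)
print(whole.group("level"))  # prints "INFO"

# Matching line by line, which is what a line-oriented grok tool does:
levels = [pattern.match(line).group("level") for line in content.splitlines()]
print(levels)  # prints ['INFO', 'ERROR', 'WARN']
```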
The likely reason is that when you extract information into flow file attributes, you typically do so to make some kind of routing decision. For example, if you have a log message in the flow file content, you might extract the log level (warn, error, etc.) and then route all logs of a given level somewhere. This works when there is one log message per flow file, but it breaks down when there are thousands of log messages per flow file, because you would end up with thousands of flow file attributes and it would be unclear what to route on.

What Joe pointed out with the record processors and the GrokReader is a different approach that should let you avoid splitting up your data. For example, in the above scenario you could use PartitionRecord with a GrokReader to separate a flow file of log messages into one flow file per log level, without having to split into thousands of individual flow files.

Hopefully that helps. Let us know if you have any other questions.

-Bryan

On Mon, Sep 25, 2017 at 9:05 PM, Joe Witt <joe.w...@gmail.com> wrote:
> Adam,
>
> I'm not very familiar with that specific processor, but I think you'll
> find your case is probably far better handled using the Record
> reader/writer processors anyway. There is a GrokReader which you can
> use to read each line of a given input as a grok expression to parse
> out key fields against your desired schema. Then there are writers
> for CSV, JSON, Avro, etc. They provide processors to partition based
> on like records, validate that records match the expected structure,
> merge records, convert, transfer to/from Kafka, split records, etc.
>
> Thanks
> Joe
>
> On Mon, Sep 25, 2017 at 6:46 PM, Adam Lamar <adamond...@gmail.com> wrote:
>> Hi there,
>>
>> I've been playing with the ExtractGrok processor and noticed I was
>> missing some data that I expected to be extracted.
>> After some investigation, it seems that ExtractGrok extracts only the
>> first line of the flowfile content and ignores the rest.
>>
>> Is this expected behavior? I should be able to use SplitText to break
>> up the records, but it surprised me because other grok tools I've used
>> have been line-oriented by default (at least from the perspective of
>> the user).
>>
>> Cheers,
>> Adam
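For reference, the PartitionRecord + GrokReader idea described above can be sketched in plain Python (standard library only, not NiFi code; the pattern and log lines are hypothetical): each distinct log level ends up with one output group, rather than one output per record.

```python
import re
from collections import defaultdict

# Grok-like pattern with named groups, standing in for something like
# %{LOGLEVEL:level} %{GREEDYDATA:msg} configured on a GrokReader.
pattern = re.compile(r"(?P<level>INFO|WARN|ERROR) (?P<msg>.+)")

flowfile_content = "\n".join([
    "ERROR disk full",
    "INFO request handled",
    "ERROR timeout talking to db",
    "WARN cache nearly full",
])

# Conceptually what PartitionRecord does: group records by the value of a
# field, producing one output "flow file" per distinct value instead of
# one per record.
partitions = defaultdict(list)
for line in flowfile_content.splitlines():
    m = pattern.match(line)
    if m:
        partitions[m.group("level")].append(line)

for level, lines in sorted(partitions.items()):
    print(level, "->", len(lines), "record(s)")
```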