Hello. I have a requirement to scan for multiple regex patterns in very
large flowfiles. Given that my flowfiles can be very large, I think my best
approach is to employ an ExecuteGroovyScript processor and a script using a
BufferedReader to scan the file one line at a time.
I am concerned that I
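The line-at-a-time scan for multiple patterns could look roughly like the sketch below. It is written in Java rather than Groovy, though an ExecuteGroovyScript script can call the same java.io and java.util.regex classes directly. Assumptions: the pattern names and regexes are hypothetical placeholders, and a StringReader stands in for a reader over the flowfile content. Only the current line is held in memory, so a single extremely long line remains the limiting case.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiPatternScan {

    /**
     * Reads one line at a time and collects every match of every pattern,
     * keyed by pattern name. Only the current line is held in memory.
     */
    public static Map<String, List<String>> scan(BufferedReader reader, Map<String, Pattern> patterns) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        patterns.keySet().forEach(name -> hits.put(name, new ArrayList<>()));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Apply every pattern to the current line, collecting all matches.
                for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                    Matcher m = e.getValue().matcher(line);
                    while (m.find()) {
                        hits.get(e.getKey()).add(m.group());
                    }
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical patterns; substitute the real ones.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("isoDate", Pattern.compile("\\d{4}-\\d{2}-\\d{2}"));
        patterns.put("usDate", Pattern.compile("\\d{2}/\\d{2}/\\d{4}"));

        // StringReader stands in for a reader over the flowfile content (assumption).
        String sample = "start 2023-06-05 and 06/01/2023\nend 2022-12-31\n";
        try (BufferedReader r = new BufferedReader(new StringReader(sample))) {
            System.out.println(scan(r, patterns));
            // prints {isoDate=[2023-06-05, 2022-12-31], usDate=[06/01/2023]}
        }
    }
}
```

Compiling the patterns once, outside the read loop, avoids recompiling them for every line.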
Hi James,
In case NiFi processors such as ExtractText, ReplaceText and
RouteOnContent (possibly several in a row or in parallel) do not fit your
use case, I'd definitely go with a buffered reader and line-wise
processing. AFAIK you can get it as easily as
new File("/path/to/my/file").ea
Jim,
Take a look at RouteText.
Thanks
-Mark
> On Jun 5, 2023, at 8:09 AM, James McMahon wrote:
>
> Hello. I have a requirement to scan for multiple regex patterns in very large
> flowfiles. Given that my flowfiles can be very large, I think my best
> approach is to employ an ExecuteGroovySc
Thank you very much, Mark and Lars. Ideally I do prefer to employ standard
"out of the box" processors. In this case my requirement is to identify
bounding dates across all content in the flowfile. As I match my DT
patterns, I'll add the tokens to a Groovy list that I can later sort and
use to ident
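For the bounding-date requirement, one variation on collecting tokens into a list and sorting it is to parse each token as it is matched and keep only a running earliest and latest date. The Java sketch below assumes two hypothetical date formats (ISO yyyy-MM-dd and US MM/dd/yyyy); the real DT patterns will differ. Both formatters use strict resolution, so tokens that match a regex but are not valid calendar dates (such as 2023-13-45 or 02/30/2023) are skipped.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundingDates {

    // Hypothetical pattern/format pairs; real data will need its own list.
    private static final Pattern ISO = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");
    private static final Pattern US = Pattern.compile("\\b\\d{2}/\\d{2}/\\d{4}\\b");
    private static final DateTimeFormatter US_FMT =
            DateTimeFormatter.ofPattern("MM/dd/uuuu").withResolverStyle(ResolverStyle.STRICT);

    private LocalDate earliest;
    private LocalDate latest;

    /** Parses every date token on one line and updates the running bounds. */
    public void acceptLine(String line) {
        Matcher iso = ISO.matcher(line);
        while (iso.find()) {
            update(iso.group(), DateTimeFormatter.ISO_LOCAL_DATE); // ISO_LOCAL_DATE is strict
        }
        Matcher us = US.matcher(line);
        while (us.find()) {
            update(us.group(), US_FMT);
        }
    }

    private void update(String token, DateTimeFormatter fmt) {
        LocalDate d;
        try {
            d = LocalDate.parse(token, fmt);
        } catch (DateTimeParseException e) {
            return; // matched the regex but is not a real calendar date
        }
        if (earliest == null || d.isBefore(earliest)) earliest = d;
        if (latest == null || d.isAfter(latest)) latest = d;
    }

    public Optional<LocalDate> earliest() { return Optional.ofNullable(earliest); }

    public Optional<LocalDate> latest() { return Optional.ofNullable(latest); }

    public static void main(String[] args) {
        BoundingDates b = new BoundingDates();
        for (String line : List.of("opened 2023-06-05", "ref 06/01/2023 and 2022-12-31", "bad 2023-13-45")) {
            b.acceptLine(line);
        }
        System.out.println(b.earliest().orElse(null) + " .. " + b.latest().orElse(null));
        // prints 2022-12-31 .. 2023-06-05
    }
}
```

Keeping only two values means memory no longer grows with the number of matches; a list that is sorted at the end gives the same bounds and is perfectly reasonable when the number of tokens is modest.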
Hi Jim,
RouteText works in a line-by-line fashion, so it shouldn't exhaust
memory (unless the lines are /very/ long). Other processors such as
ReplaceText have the option to choose whether you want to stream lines,
or slurp the whole file at once.
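The line-wise versus whole-file choice matters for more than memory: a regex evaluated line by line cannot match text that spans a line break. The Java sketch below compares the two behaviors using hypothetical sample text and a hypothetical pattern. It splits an in-memory string purely to show both semantics side by side; it is not a streaming implementation.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LineVsWholeText {

    /** True if the pattern matches somewhere in the text evaluated as one string. */
    public static boolean matchesWholeText(String text, Pattern p) {
        return p.matcher(text).find();
    }

    /** True if the pattern matches within at least one individual line. */
    public static boolean matchesAnyLine(String text, Pattern p) {
        // \R matches any line break sequence (\n, \r\n, ...).
        return Arrays.stream(text.split("\\R")).anyMatch(line -> p.matcher(line).find());
    }

    public static void main(String[] args) {
        String text = "Start: 2023-06-05\nEnd: 2023-06-30\n";
        // Hypothetical pattern that only matches across the line break.
        Pattern span = Pattern.compile("2023-06-05\\s+End");
        System.out.println(matchesWholeText(text, span)); // true
        System.out.println(matchesAnyLine(text, span));   // false
    }
}
```

If none of the patterns need to cross line boundaries, line-wise evaluation gives the same matches while keeping memory use bounded.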
Best,
Lars
On 23-06-05 14:49, James McMahon w