
There's a really good blog by the man himself [1] :) I highly recommend the
official blog in general; it has lots of great posts, and many are record-oriented.



On Wed, Feb 24, 2021 at 5:57 PM Greene (US), Geoffrey N wrote:

> Thank you for the fast response Mark.
> Hrm, record processing does sound useful.
> Are there any good blogs / documentation on this?  I’d really like to
> learn more.  I’ve been doing mostly text processing, as you’ve observed.
> My use case is something like this
> 1)      Use server API to get list of sensors
> 2)      Use server API to get list of jobs
> 3)      For each job, get count of frames/job.  There are up to 6 sets of
> frames/job, depending on which set you query for.
> 4)      Frames can only be queried 50 at a time, so for each 50, get set
> of time stamps of a frame
> 5)      For each time stamp, query for all sensor values at that time.
> These have to be queried one at a time, because of the way the API works –
> one sensor value can only be given for one time.
> 6)      Glue all the data associated with that job (frame times, sensor
> readings, etc.) together and paste it into one big JSON. (There are more
> steps after that.)
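
For reference, steps 3 to 6 above can be sketched in plain Python. This is only a rough illustration, not NiFi code: `get_timestamps` and `get_sensor_value` are hypothetical stand-ins for the server API calls described in the list, and only the 50-frame page size comes from the thread.

```python
from typing import Callable, Dict, Iterator, List, Tuple

PAGE_SIZE = 50  # the thread says frames can only be queried 50 at a time


def page_ranges(total: int, page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (offset, limit) pairs that cover `total` items page by page."""
    for offset in range(0, total, page_size):
        yield offset, min(page_size, total - offset)


def collect_job(
    job_id: str,
    sensors: List[str],
    frame_count: int,
    get_timestamps: Callable[[str, int, int], List[float]],  # hypothetical API call
    get_sensor_value: Callable[[str, float], float],         # hypothetical API call
) -> Dict:
    """Build one combined document for a job (steps 3-6, for a single frame set)."""
    frames = []
    for offset, limit in page_ranges(frame_count):
        for ts in get_timestamps(job_id, offset, limit):
            # One query per sensor per timestamp, as the API requires.
            readings = {s: get_sensor_value(s, ts) for s in sensors}
            frames.append({"timestamp": ts, "readings": readings})
    return {"job": job_id, "frames": frames}


# Stub callables standing in for the real server API (assumptions for the demo).
def fake_timestamps(job_id: str, offset: int, limit: int) -> List[float]:
    return [float(i) for i in range(offset, offset + limit)]


def fake_sensor_value(sensor: str, ts: float) -> float:
    return len(sensor) + ts


result = collect_job("job-1", ["temp", "rpm"], 120, fake_timestamps, fake_sensor_value)
```

Written this way, the nesting (job, page, timestamp, sensor) is explicit, which may help when deciding which levels really need a split in the flow and which could stay inside a single record.
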
> Geoffrey Greene
> Associate Technical Fellow/Senior Software Ninjaneer
> (703) 414 2421
> The Boeing Company
> *From:* Mark Payne
> *Sent:* Wednesday, February 24, 2021 5:20 PM
> *To:*
> *Subject:* [EXTERNAL] Re: some questions about splits
> Geoffrey,
> At a high level, if you’re splitting multiple times and then trying to
> re-assemble everything, then yes I think your thought process is correct.
> But you’ve no doubt seen how complex and cumbersome this approach can be.
> It can also result in extremely poor performance. So much so that when I
> began creating a series of YouTube videos on NiFi Anti-Patterns, the first
> anti-pattern that I covered was the splitting and re-merging of data [1].
> Generally, this should be an absolute last resort, and Record-oriented
> processors should be used instead of splitting the data up and re-merging
> it. If you need to perform REST calls, you could do that with LookupRecord,
> and either use the RESTLookupService or if that doesn’t fit the bill
> exactly you could actually use the ScriptedLookupService and write a small
> script in Groovy or Python that would perform the REST call for you and
> return the results. Or perhaps the ScriptedTransformRecord would be more
> appropriate - hard to tell without knowing the exact use case.
> Obviously, your mileage may vary, but switching the data flow to use
> record-oriented processors, if possible, would typically yield a flow that
> is much simpler and throughput that is at least an order of magnitude
> better.
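
To make the record-oriented idea concrete, here is a small plain-Python sketch. It is not the NiFi scripting API, and `enrich_records` and the `lookup` callable are hypothetical names; the point is only that every record in a batch is enriched in place, so nothing has to be split apart and merged back together afterwards.

```python
from typing import Callable, Dict, Iterable, List


def enrich_records(
    records: Iterable[Dict],
    lookup: Callable[[Dict], Dict],  # hypothetical stand-in for a REST lookup
    result_field: str = "lookup",
) -> List[Dict]:
    """Return copies of the records, each with its lookup result attached."""
    enriched = []
    for record in records:
        out = dict(record)                  # leave the input record untouched
        out[result_field] = lookup(record)  # one lookup per record, same batch
        enriched.append(out)
    return enriched


batch = [{"timestamp": 1.0}, {"timestamp": 2.0}]
enriched = enrich_records(batch, lambda r: {"value": r["timestamp"] * 10})
```

The batch goes in as one unit and comes out as one unit, which is roughly what a lookup over records does inside a single FlowFile.
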
> But if for whatever reason you do end up being stuck with the split/merge
> approach - the key would likely be to consider backpressure heavily. If you
> have backpressure set to 10,000 FlowFiles (the default) and then you’re
> trying to merge together data, but the data comes from many different
> upstream splits, you can certainly end up in a situation like this, where
> you don’t have all of the data from a given ‘split’ queued up for
> MergeContent.
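
The backpressure situation described above can be illustrated with a toy model. This is a deliberately simplified sketch and not how NiFi implements queues or MergeContent: it assumes a queue that holds at most `capacity` fragments and a merge step that can only finish a group once every fragment of that group is in the queue. The function name and the tuple layout are made up for the example.

```python
from collections import Counter
from typing import List, Tuple

Fragment = Tuple[str, int]  # (fragment identifier, fragment count for that identifier)


def mergeable_groups(arrivals: List[Fragment], capacity: int) -> List[str]:
    """Return identifiers whose fragments all fit in a queue of `capacity` items."""
    queued = arrivals[:capacity]  # backpressure holds back anything beyond this
    seen = Counter(frag_id for frag_id, _ in queued)
    expected = dict(queued)       # identifier -> expected fragment count
    return sorted(fid for fid, n in seen.items() if n == expected[fid])


# Three upstream splits (A, B, C) of four fragments each.
interleaved = [(fid, 4) for _ in range(4) for fid in "ABC"]  # A, B, C, A, B, C, ...
grouped = [(fid, 4) for fid in "ABC" for _ in range(4)]      # A, A, A, A, B, ...

none_ready = mergeable_groups(interleaved, capacity=6)
all_ready = mergeable_groups(interleaved, capacity=12)
one_ready = mergeable_groups(grouped, capacity=6)
```

With interleaved arrivals and a queue of 6, no group is complete, so the merge step has nothing it can finish and nothing drains; with room for all 12 fragments, or with fragments arriving grouped, at least one group can be merged.
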
> Hope this helps!
> -Mark
> [1]
> On Feb 24, 2021, at 4:59 PM, Greene (US), Geoffrey N wrote:
> I’m having some trouble with multiple splits/merges.  Here’s the idea:
> Big data -> split 1 -> save all the fragment.* attributes into variables ->
> split 2 -> save all the fragment.* attributes
>     |
> Split 1
>    |
> Save fragment.* attributes into split1.fragment.*
> |
> Split 2
> |
> Save fragment.* attributes into split2.fragment.* attributes
> |
> (More processing)
> |
> Split 3
> |
> Save fragment.* attributes into split3.fragment.* attributes
> |
> (other stuff)
> |
> Restore split3.fragment.* attributes to fragment.*
> |
> Merge 3 using defragment strategy
> |
> Restore split2.fragment.* attributes to fragment.*
> |
> Merge 2 using defragment strategy
> |
> Restore split1.fragment.* attributes to fragment.*
> |
> Merge 1 using defragment strategy
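
The save/restore bookkeeping in the diagram above can be modelled with plain dictionaries. This is a sketch of the idea only, not NiFi configuration: `stash` and `restore` are hypothetical helpers, and the attribute names follow the `fragment.*` and `splitN.fragment.*` naming used in the diagram.

```python
from typing import Dict

PREFIX = "fragment."


def stash(attrs: Dict[str, str], level: int) -> Dict[str, str]:
    """Copy fragment.* attributes to split<level>.fragment.* (originals are kept)."""
    out = dict(attrs)
    for key, value in attrs.items():
        if key.startswith(PREFIX):
            out[f"split{level}.{key}"] = value
    return out


def restore(attrs: Dict[str, str], level: int) -> Dict[str, str]:
    """Overwrite fragment.* with the values saved under split<level>.fragment.*."""
    level_prefix = f"split{level}."
    out = dict(attrs)
    for key, value in attrs.items():
        if key.startswith(level_prefix + PREFIX):
            out[key[len(level_prefix):]] = value
    return out


# After split 1: save its fragment attributes.
after_split1 = stash(
    {"fragment.identifier": "a", "fragment.index": "0", "fragment.count": "3"}, 1
)
# Split 2 overwrites fragment.*; save those too.
after_split2 = stash(
    {**after_split1, "fragment.identifier": "b", "fragment.index": "5",
     "fragment.count": "50"},
    2,
)
# Before Merge 1: bring back the attributes saved after split 1.
before_merge1 = restore(after_split2, 1)
```

Each merge level needs its own saved copy because every later split overwrites `fragment.identifier`, `fragment.index`, and `fragment.count`.
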
> Am I thinking about this correctly?  It seems like sometimes NiFi is
> unable to do a merge on some of the split data (errors like “there are 50
> fragments, but we only found one”).  Is it possible that I need to do some
> prioritization in the queues? I have noticed that things do back up and
> the queues seem to fill up as it’s going through (several of the splits need
> to perform REST calls and processing, which can take time).  Maybe the issue
> is that one fragment “slips” through before the others have even been
> processed far enough.  Is there an approved way to do this?
> Thanks for the help!
