Geoffrey, There's a really good blog by the man himself [1] :) I highly recommend the official blog in general, lots of great posts and many are record-oriented [2]
Regards, Matt [1] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi [2] https://blogs.apache.org/nifi/ On Wed, Feb 24, 2021 at 5:57 PM Greene (US), Geoffrey N < geoffrey.n.gre...@boeing.com> wrote: > Thank you for the fast response Mark. > > > > Hrm, record processing does sound useful. > > > > Are there any good blogs / documentation on this? I’d really like to > learn more. I’ve been doing mostly text processing, as you’ve observed. > > > > My use case is something like this > > 1) Use server API to get list of sensors > > 2) Use server API to get list of jobs > > 3) For each job, get count of frames/job. There are up to 6 sets of > frames/job, depending on which set you query for. > > 4) Frames can only be queried 50 at a time, so for each 50, get set > of time stamps of a frame > > 5) For each time stamp, query for all sensor values at that time. > These have to be queried one-at-a-time., because of the way the API works – > one sensor value can only be given for one time. > > 6) Glue all the data associated with that job (frame times, sensor > readings,etc) together and paste in a big json. (There are more steps after > that. > > > > > > Geoffrey Greene > > Associate Technical Fellow/Senior Software Ninjaneer > > (703) 414 2421 > > The Boeing Company > > > > *From:* Mark Payne [mailto:marka...@hotmail.com] > *Sent:* Wednesday, February 24, 2021 5:20 PM > *To:* users@nifi.apache.org > *Subject:* [EXTERNAL] Re: some questions about splits > > > > EXT email: be mindful of links/attachments. > > > > > Geoffrey, > > > > At a high level, if you’re splitting multiple times and then trying to > re-assemble everything, then yes I think your thought process is correct. > But you’ve no doubt seen how complex and cumbersome this approach can be. > It can also result in extremely poor performance. So much so that when I > began creating a series of YouTube videos on NiFi Anti-Patterns, the first > anti-pattern that I covered was the splitting and re-merging of data [1]. > > > > Generally, this should be an absolute last resort, and Record-oriented > processors should be used instead of splitting the data up and re-merging > it. If you need to perform REST calls, you could do that with LookupRecord, > and either use the RESTLookupService or if that doesn’t fit the bill > exactly you could actually use the ScriptedLookupService and write a small > script in Groovy or Python that would perform the REST call for you and > return the results. Or perhaps the ScriptedTransformRecord would be more > appropriate - hard to tell without knowing the exact use case. > > > > Obviously, your mileage may vary, but switching the data flow to use > record-oriented processors, if possible, would typically yield a flow that > is much simpler and yield throughput that is at least an order of magnitude > better. > > > > But if for whatever reason you do end up being stuck with the split/merge > approach - the key would likely be to consider backpressure heavily. If you > have backpressure set to 10,000 FlowFiles (the default) and then you’re > trying to merge together data, but the data comes from many different > upstream splits, you can certainly end up in a situation like this, where > you don’t have all of the data from a given ’split’ queued up. for > MergeContent. > > > > Hope this helps! > > -Mark > > > > [1] https://www.youtube.com/watch?v=RjWstt7nRVY > > > > > > On Feb 24, 2021, at 4:59 PM, Greene (US), Geoffrey N < > geoffrey.n.gre...@boeing.com> wrote: > > > > Im having some trouble with multiple splits/merges. Here’s the idea: > > > > > > Big data -> split 1->Save all the fragment.*attributes into variables -> > split 2-> save all the fragment.* attributes > > | > > Split 1 > > | > > Save fragment.* attributes into split1.fragment.* > > | > > Split 2 > > | > > Save fragment.* attributes into split2.fragment.* attributes > > | > > (More processing) > > | > > Split 3 > > | > > Save fragment.* attributes into split3.fragment.* attributes > > | > > (other stuff) > > | > > Restore split3.fragment.* attributes to fragment.* > > | > > Merge3, using defragment strategy > > | > > Restore split2.fragment.* attributes to fragment.* > > | > > Merge 2 using defragment strategy > > | > > Restore split1.frragment.* attributes to fragment.* > > | > > Merge 1 using defragment strategy > > > > Am I thinking about this correctly? It seems like sometimes, nifi is > unable to do a merge on some of the split data (errors like “there are 50 > fragments, but we only found one). Is it possible that I need to do some > prioritization in the queues? I have noticed that my things do back up and > the queues seem to fill up as its going through (several of the splits need > to perform rest calls and processing, which can take time. Maybe the issue > is that one fragment “slips” through, before the others have even been > processed far enough. Is there an approved way to do this? > > > > Thanks for the help! > > >