Well, in my experience Dead Letter is more of a global storage mechanism for undeliverable units of "something" (i.e., messages). While I do see how "failed" FlowFiles may resemble such undeliverable/unprocessable units of work, the nature of Data Flow makes it a bit different, since there is really no one-stop solution for that. Some may be satisfied with auto-termination, while others may try to reprocess or route to a different sub-flow, which may be the actual delegation to some storage where things could be reviewed. And for that case, as you've already noticed, NiFi provides all the moving parts. So PutS3 or PutFile followed by PutSlack or SendEmail are just different implementations of the "store and notify" pattern that you are essentially presenting here.
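[Editorial note: the "store and notify" pattern above could be sketched in plain Python as follows. This is an illustration only, not NiFi API; the in-memory dict stands in for PutS3/PutFile and the callback stands in for PutSlack/SendEmail.]

```python
# Minimal sketch of the "store and notify" dead-letter pattern.
# `store` stands in for PutS3/PutFile, `notify` for PutSlack/SendEmail.
# All names here are illustrative assumptions, not NiFi processor APIs.

def dead_letter(flowfile, store, notify):
    """Persist an unprocessable unit of work, then alert a human."""
    key = f"deadletter/{flowfile['uuid']}"
    store[key] = flowfile["content"]                         # "store" half
    notify(f"flowfile {flowfile['uuid']} stored at {key}")   # "notify" half
    return key

# Usage: a dict as the store, a list as the notification channel.
store = {}
alerts = []
key = dead_letter({"uuid": "abc-123", "content": b"bad record"},
                  store, alerts.append)
```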
Also, keep in mind that once failed, there is no one-stop re-processing solution. You may have certain FlowFiles that can be automatically fixed and re-sent, while others may need manual intervention, and for that you really need to build and customize the Dead Letter component, which may include some internal splitting and routing of FlowFiles that could be fixed vs. the ones that cannot be.

Cheers
Oleg

On Mar 1, 2017, at 3:38 PM, Nick Carenza <nick.care...@thecontrolgroup.com> wrote:

Sorry for the confusion, I meant to put emphasis on the _you_, as in 'you all' or other users of NiFi. I am looking to get insight into solutions others have implemented to deal with failures.

- Nick

On Wed, Mar 1, 2017 at 12:29 PM, Oleg Zhurakousky <ozhurakou...@hortonworks.com> wrote:

Nick

Since you've already designed a Process Group (PG) that is specific to failed flow files, I am not sure I understand your last question ". . . How do you manage failure relationships? . . .". I am assuming that within your global flow all failure relationships are sent to this PG, which essentially is a Dead Letter Storage. Are you asking about how to get more information from the failed FlowFiles (i.e., failure location, reason, etc.)?

Cheers
Oleg

On Mar 1, 2017, at 3:21 PM, Nick Carenza <nick.care...@thecontrolgroup.com> wrote:

I have a lot of processors in my flow, all of which can, and do, route flowfiles to their failure relationships at some point. In the first iteration of my flow, I routed every failure relationship to an inactive DebugFlow, but monitoring these was difficult: I wouldn't get notifications when something started to fail, and if the queue got filled up it would apply backpressure and prevent new, good flowfiles from being processed.
Not only was that just not a good way to handle failures, but my flow was littered with all of these do-nothing processors and was an eyesore. So then I tried routing processor failure relationships into themselves, which tidied up my flow but caused NiFi to go berserk when a failure occurred, because the failure relationship is not penalized (nor should it be) and most processors don't provide a 'Retry' relationship (InvokeHttp being a notable exception). But really, most processors wouldn't conceivably succeed if they were tried again. I mostly just wanted the flowfiles to sit there until I had a chance to check out why they failed and fix them manually. This leads me to https://issues.apache.org/jira/browse/NIFI-3351.

I think I need a way to store failed flowfiles, fix them, and reprocess them. The process group I am currently considering implementing everywhere is:

Input Port [Failed Flowfile] --> PutS3 deadletter/<failure location>/<failure reason>/${uuid} --> PutSlack
ListS3 deadletter/<failure location>/<failure reason>/ --> FetchS3 --> Output Port [Fixed]

This gives me storage of failed messages, logically grouped and in a place that won't block up my flow, since S3 never goes down, err... wait. Configurable process groups or templates like https://issues.apache.org/jira/browse/NIFI-1096 would make this easier to reuse.

How do you manage failure relationships?

- Nick
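[Editorial note: the dead-letter keying and replay loop Nick describes — PutS3 under `deadletter/<failure location>/<failure reason>/${uuid}`, then ListS3/FetchS3 over a failure grouping — could be sketched roughly as below. A dict stands in for the S3 bucket and a `fix` callable stands in for the manual or automatic repair step; the attribute names are illustrative, not actual NiFi attributes.]

```python
# Hypothetical sketch of the dead-letter store-and-replay flow, with an
# in-memory dict in place of S3. Nothing here is NiFi or boto3 API.

def deadletter_key(failure_location, failure_reason, uuid):
    # Mirrors deadletter/<failure location>/<failure reason>/${uuid}
    return f"deadletter/{failure_location}/{failure_reason}/{uuid}"

def store_failed(bucket, flowfile):
    """Stand-in for PutS3: group failures by where and why they failed."""
    key = deadletter_key(flowfile["failure.location"],
                         flowfile["failure.reason"],
                         flowfile["uuid"])
    bucket[key] = flowfile["content"]
    return key

def replay(bucket, prefix, fix):
    """Stand-in for ListS3 --> FetchS3: list one failure grouping,
    apply a repair function, and yield the fixed content."""
    for key in sorted(k for k in bucket if k.startswith(prefix)):
        yield key, fix(bucket[key])

# Usage: store one failed flowfile, then replay its grouping with a
# trivial "fix" (upper-casing the payload).
bucket = {}
store_failed(bucket, {"uuid": "1", "failure.location": "InvokeHttp",
                      "failure.reason": "503", "content": "payload"})
fixed = dict(replay(bucket, "deadletter/InvokeHttp/503/", str.upper))
```

Grouping by failure location and reason is what makes selective reprocessing possible: fixable groups get an automated `fix`, while the rest sit in storage until someone looks at them, which matches the split Oleg describes above.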