Hello Manish,

Thanks for the very helpful answer, but I was thinking that this functional scope (i.e. logging, storing the transformations of the data, data lineage) was built into NiFi and available through the REST API, or through internal calls.

The point is that I am not ready to hook dedicated logging processors onto every processor of my dataflow (DF), or onto DFs developed by others:
- firstly, it is intrusive in the DF;
- secondly, it cannot easily be applied with a template approach, because it depends heavily on the processors chosen in the DF.

Ideally (as a very simple/naïve requirement), I would like to run my DF, taking my example again:

File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2 (out)

and then store everything in a database and ask:

getTrace(Processor1, beforeProcessing) -> returning (attributes, flowfile)
getTrace(Processor2, afterProcessing)
........................

Phil
Best regards

-----Original Message-----
From: Manish Gupta 8 [mailto:mgupt...@sapient.com]
Sent: Tuesday, 27 September 2016 16:46
To: users@nifi.apache.org
Subject: RE: logging all transformed flowfiles

Hi Phil,

We are doing a similar thing, but we do not keep all the content externally after each transformation. Instead, we send only the flow file attributes to external storage (such as a file, Event Hub, database or NoSQL store): after every logical step where we want to log, we convert the attributes with the AttributesToJSON processor, add a couple of additional details (such as the step name, the number of rows in the file, a hash code, etc.), and send the result for logging.

For your scenario, I think you can simply clone the output relationship from each of your processors and send it to one or more logging/sink processors. For keeping the lineage, you have a couple of options:

1. Use a different sink/folder/table for each step (with a corresponding name).
2. Keep the file name consistent to track the lineage.
3. Modify the flow file content to make sure you can track the lineage from the metadata content.

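As a rough illustration of what the external store behind such a getTrace call could look like, here is a minimal Python sketch. It is not NiFi code: the table layout and the helper names (open_trace_store, record_trace, get_trace) are hypothetical, SQLite stands in for whatever database is actually used, and it assumes each logging step hands over the flow file attributes as a dict plus the content as bytes.

```python
import hashlib
import json
import sqlite3


def open_trace_store(path=":memory:"):
    """Open (or create) the trace database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        "filename TEXT, step TEXT, stage TEXT, "
        "attributes TEXT, content_sha256 TEXT)"
    )
    return conn


def record_trace(conn, step, stage, attributes, content):
    """Store the attributes as JSON plus a hash of the content, not the content itself."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT INTO trace VALUES (?, ?, ?, ?, ?)",
        (
            attributes.get("filename"),  # a consistent file name keeps the lineage traceable
            step,
            stage,
            json.dumps(attributes, sort_keys=True),
            digest,
        ),
    )
    conn.commit()
    return digest


def get_trace(conn, step, stage):
    """Return (attributes, content hash) pairs logged for one step and stage, oldest first."""
    rows = conn.execute(
        "SELECT attributes, content_sha256 FROM trace "
        "WHERE step = ? AND stage = ? ORDER BY rowid",
        (step, stage),
    ).fetchall()
    return [(json.loads(attrs), digest) for attrs, digest in rows]
```

A call such as record_trace(conn, "Processor1", "beforeProcessing", {"filename": "File1"}, b"...") would be made once per logged step, and get_trace(conn, "Processor1", "beforeProcessing") reads it back. Storing a hash rather than the content follows the attributes-only approach described above; a content column could be added if the full flow file is needed.
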
Regards,
Manish

-----Original Message-----
From: philippe.gib...@orange.com [mailto:philippe.gib...@orange.com]
Sent: Tuesday, September 27, 2016 7:33 PM
To: users@nifi.apache.org
Subject: logging all transformed flowfiles

Hello,

My SW context: standalone NiFi 1.0.0

My problem: I would like to log all the different transformations applied to an initial file (input) until it exits the DF (output).

If we imagine this simple DF:

File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2 (out)

I would like to store outside of NiFi (in my own external DB) -> File1, flow1, flow2, File2

Is there a simple REST API to help accomplish this? (I looked at Data Provenance and SiteToSiteProvenanceReportingTask, but did not clearly find the right way to implement this.)

Any ideas?

Phil
Best regards

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged information that may be protected by law; they should not be distributed, used or copied without authorisation. If you have received this email in error, please notify the sender and delete this message and its attachments. As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified. Thank you.
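
On the REST API question in the original message: Data Provenance is exposed through the NiFi REST API, so one possible direction is to query provenance events per processor and fetch the content before and after each event. The Python sketch below only builds the requests and sends nothing. The base URL (an unsecured local instance on port 8080) and the helper names are assumptions, and the shape of searchTerms has changed between NiFi releases, so it should be checked against the REST API documentation of the version in use.

```python
import json

# Assumption: unsecured standalone NiFi reachable on its default HTTP port.
NIFI_API = "http://localhost:8080/nifi-api"


def provenance_query(processor_id, max_results=100):
    """Build the URL and JSON body for a POST that submits a provenance query."""
    body = {
        "provenance": {
            "request": {
                "maxResults": max_results,
                # Plain string values, as in the 1.0-era request DTO; newer
                # releases may expect a nested object per search term.
                "searchTerms": {"ProcessorID": processor_id},
            }
        }
    }
    return NIFI_API + "/provenance", json.dumps(body)


def content_url(event_id, direction):
    """URL for a GET of the content before ("input") or after ("output") an event."""
    if direction not in ("input", "output"):
        raise ValueError("direction must be 'input' or 'output'")
    return "{}/provenance-events/{}/content/{}".format(NIFI_API, event_id, direction)
```

The query is asynchronous: the POST returns a query id, and the results are then read from /provenance/{id}. Content can only be fetched while it is still held in NiFi's content repository, so an external database would need to copy it before it ages out.
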