Hello Manish,
Thanks for the very helpful answer, but I was thinking that this functional 
scope (i.e. logging, storing data transformations, data lineage) was 
built into NiFi and available through the REST API, 
or through internal calls.
The point is that I am not ready to hook dedicated logging processors onto 
every processor of my DF, or of DFs developed by others:
- firstly, it is intrusive in the DF;
- secondly, it cannot easily be done with a template approach, because it 
depends heavily on the processors chosen in the DF.

Ideally (as a very simple, naive requirement), I would like to run my DF, 
taking my example again:
(File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2 (out))
and then store everything in a database and be able to say:

getTrace(Processor1, beforeProcessing) -> returning (attributes, flowfile)
getTrace(Processor2, afterProcessing) ...
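
To make the idea concrete, here is a minimal sketch of what such a trace store could look like on the database side. Everything in it is hypothetical: the table layout and the names open_store, store_trace and get_trace are invented for illustration and are not part of NiFi or its REST API. It also assumes a single flowfile per (processor, stage) pair; a real store would need a flowfile identifier in the key.

```python
import json
import sqlite3

# Hypothetical schema and function names; none of this is a NiFi API.


def open_store(path=":memory:"):
    """Open (or create) the trace database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        " processor TEXT NOT NULL,"
        " stage TEXT NOT NULL,"
        " attributes TEXT NOT NULL,"
        " content BLOB NOT NULL,"
        " PRIMARY KEY (processor, stage))"
    )
    return conn


def store_trace(conn, processor, stage, attributes, content):
    """Save the attributes (as JSON) and raw content seen at one point of the flow."""
    conn.execute(
        "INSERT OR REPLACE INTO trace VALUES (?, ?, ?, ?)",
        (processor, stage, json.dumps(attributes, sort_keys=True), content),
    )
    conn.commit()


def get_trace(conn, processor, stage):
    """Return (attributes, content) for the given point, or None if nothing was stored."""
    row = conn.execute(
        "SELECT attributes, content FROM trace WHERE processor = ? AND stage = ?",
        (processor, stage),
    ).fetchone()
    if row is None:
        return None
    return json.loads(row[0]), bytes(row[1])
```

With this in place, get_trace(conn, "Processor1", "beforeProcessing") would return the stored attributes and content, which is the lookup I have in mind. The open question remains how to feed such a store from NiFi without adding processors to the DF.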

phil
best regards


-----Original Message-----
From: Manish Gupta 8 [mailto:mgupt...@sapient.com] 
Sent: mardi 27 septembre 2016 16:46
To: users@nifi.apache.org
Subject: RE: logging all transformed flowfiles

Hi Phil,

We are doing something similar, but we do not keep all the content externally 
after each transformation. Instead, we send only the flow file attributes to 
external storage (e.g. a file, Event Hub, a database or a NoSQL store): we 
convert them with the AttributesToJSON processor and send the result for 
logging after every logical step we want to log, after adding a couple of 
extra details (step name, number of rows in the file, hash code, etc.).

For your scenario, I think you can simply clone the output relationship of 
each of your processors and send it to one or more logging/sink 
processors. To keep the lineage, you have a couple of options:
1. Use a different sink/folder/table for each step (with a corresponding name).
2. Keep the file name consistent to track the lineage.
3. Modify the flow file content to make sure you can track the lineage from 
the metadata content.
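
As a rough illustration of the kind of record described above, the sketch below builds one JSON log line from a set of flow file attributes plus the extra details (step name, row count, hash). It is plain Python, not NiFi code: the function name build_log_record and the field names step_name, row_count and content_sha256 are assumptions made for the example, and SHA-256 is just one possible choice of hash.

```python
import hashlib
import json


def build_log_record(attributes, step_name, content):
    """Return one JSON log line for a logical step (hypothetical field names)."""
    record = dict(attributes)                       # copy the flow file attributes
    record["step_name"] = step_name                 # which logical step produced this
    record["row_count"] = len(content.splitlines())  # number of lines in the content
    record["content_sha256"] = hashlib.sha256(content).hexdigest()
    return json.dumps(record, sort_keys=True)
```

Calling build_log_record({"filename": "File1"}, "Processor1", content) after each step yields one line per step, and a consistent filename attribute (option 2) is then enough to join the lines back into a lineage.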


Regards,
Manish

-----Original Message-----
From: philippe.gib...@orange.com [mailto:philippe.gib...@orange.com]
Sent: Tuesday, September 27, 2016 7:33 PM
To: users@nifi.apache.org
Subject: logging all transformed flowfiles

Hello,
My software context: standalone NiFi 1.0.0.

My problem: I would like to log all the different transformations applied to 
an initial file (input) until it exits the DF (output).
Imagine this simple DF:
File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2 (out)
I would like to store outside of NiFi (in my own external DB) -> File1, 
flow1, flow2, File2.
Is there a simple REST API to help accomplish this? (I looked at Data 
Provenance and SiteToSiteProvenanceReportingTask but did not clearly find the 
right way to implement this.) Any ideas?

Phil
Best regards 



