I do not think provenance data alone will make any sense, even to you, and
certainly not to your users. We followed the advice from Pierre's blog
https://pierrevillard.com/best-of-nifi/ and use a combination of processor
naming and our own custom "AuditLog" processor that logs key events, record
counts, statuses, etc. to an external MySQL table in an append-only fashion.
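Here is a minimal sketch of what an append-only audit table and its writer might look like. The table name, the columns, and the `log_event` helper are hypothetical, because the actual AuditLog processor and schema are not described here. Python's built-in sqlite3 stands in for MySQL so the example runs anywhere. Against MySQL, the DDL (`AUTO_INCREMENT`) and the driver's placeholder style would differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; the real AuditLog table is not described in the thread.
SCHEMA = """
CREATE TABLE IF NOT EXISTS audit_log (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    logged_at    TEXT    NOT NULL,
    process_name TEXT    NOT NULL,
    event        TEXT    NOT NULL,
    status       TEXT    NOT NULL,
    record_count INTEGER
)
"""


def log_event(conn, process_name, event, status, record_count=None):
    """Append one audit row. Rows are only ever inserted, never updated or deleted."""
    conn.execute(
        "INSERT INTO audit_log (logged_at, process_name, event, status, record_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), process_name, event, status, record_count),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
log_event(conn, "customer_ingest", "started", "OK")
log_event(conn, "customer_ingest", "records_loaded", "OK", record_count=1250)
```

Keeping the table insert-only means every status change is preserved as history, and readers never see a row change underneath them.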

Users can poll this table for events or build dashboards that report on
specific ingest processes. We considered Kafka, since we already use it for
other projects, but decided against it because BI/ETL developers using
traditional tools would find it difficult to work with Kafka messages.
MySQL was the easiest option. We might switch to a NoSQL database at some
point, but MySQL handles our volumes without issues.
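Because rows are only ever appended, a consumer can poll with a simple high-water mark on an auto-increment id, and a dashboard can show the newest row per process. The following sketch works under stated assumptions. The table and column names are hypothetical, sqlite3 stands in for MySQL, and ids are assumed to increase in insert order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "process_name TEXT NOT NULL, event TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO audit_log (process_name, event, status) VALUES (?, ?, ?)",
    [
        ("customer_ingest", "started", "OK"),
        ("orders_ingest", "started", "OK"),
        ("customer_ingest", "finished", "OK"),
        ("orders_ingest", "validation", "FAILED"),
    ],
)


def poll_new_events(conn, last_seen_id):
    """Return rows appended after last_seen_id, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, process_name, event, status FROM audit_log "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else last_seen_id
    return rows, new_watermark


def latest_status(conn):
    """Dashboard view: the most recent row for each ingest process."""
    return conn.execute(
        """
        SELECT a.process_name, a.event, a.status
        FROM audit_log a
        JOIN (SELECT process_name, MAX(id) AS max_id
              FROM audit_log GROUP BY process_name) m
          ON a.id = m.max_id
        ORDER BY a.process_name
        """
    ).fetchall()


rows, watermark = poll_new_events(conn, last_seen_id=2)  # rows with id 3 and 4
dashboard = latest_status(conn)
```

The client only has to remember the last id it saw, so plain SQL tools can consume the feed without any message-broker client.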

On Fri, Mar 27, 2020 at 8:31 AM Mike Thomsen <mikerthom...@gmail.com> wrote:

> Thanks. I've done something similar in the past using Elasticsearch as the
> data store. I think we might start with that and hope that we don't get
> more nuanced requirements. I guess we could look at naming the processors
> after the steps and hope that that works to keep users happy.
>
> On Fri, Mar 27, 2020 at 8:22 AM Marc Parisi <phroc...@apache.org> wrote:
>
>> Hey Mike,
>>
>> I recently did something similar for a personal project. I ingested
>> provenance data into a NoSQL store (through a reporting task that also
>> indexed the data), querying primarily on the ProvenanceEventType.
>>
>> I tracked one piece of information (in my case the original file name
>> with an identifier) and queried for event types to get an idea of what
>> occurred; for example, I looked for ROUTE and ATTRIBUTES_MODIFIED events
>> to determine which path my data took.
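This sketch shows one way to reconstruct a file's path from indexed provenance events. It assumes each event has been flattened into a record holding its type, the emitting component's name, a timestamp, and the tracked attribute (here `filename`). `ProvEvent` and `path_for_file` are hypothetical names, the events are made-up sample data, and real provenance records carry many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class ProvEvent:
    """Simplified, hypothetical shape of one indexed provenance event."""
    event_type: str       # e.g. "ROUTE", "ATTRIBUTES_MODIFIED", "DROP"
    component_name: str   # name of the processor that emitted the event
    timestamp_ms: int
    attributes: dict = field(default_factory=dict)


def path_for_file(events, filename):
    """Components that routed or modified the tracked file, in time order."""
    relevant = [
        e for e in events
        if e.attributes.get("filename") == filename
        and e.event_type in ("ROUTE", "ATTRIBUTES_MODIFIED")
    ]
    relevant.sort(key=lambda e: e.timestamp_ms)
    return [e.component_name for e in relevant]


events = [
    ProvEvent("RECEIVE", "ListenHTTP", 100, {"filename": "a.csv"}),
    ProvEvent("ATTRIBUTES_MODIFIED", "UpdateAttribute", 200, {"filename": "a.csv"}),
    ProvEvent("ROUTE", "RouteOnAttribute", 300, {"filename": "a.csv"}),
    ProvEvent("ROUTE", "RouteOnAttribute", 250, {"filename": "b.csv"}),
    ProvEvent("DROP", "PutFile", 400, {"filename": "a.csv"}),
]
path = path_for_file(events, "a.csv")  # ['UpdateAttribute', 'RouteOnAttribute']
```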
>>
>> It was very easy to monitor provenance events of type DROP and to check
>> whether data succeeded or failed. I didn't concern myself with diving into
>> why data failed, because that seemed more complex and would require more
>> thought.
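The following sketch derives success or failure from DROP events. It assumes, hypothetically, that the flow's failure branch ends in components with known names, so the component that dropped the file indicates which branch the file took. A file with no DROP event yet is treated as still in progress. `FAILURE_COMPONENTS` and `file_status` are illustrative names, and the sketch ignores FORK/CLONE lineage, where one input produces several flowfiles.

```python
# Hypothetical names of the components that terminate the failure branch.
FAILURE_COMPONENTS = {"LogFailure", "PutFailedFiles"}


def file_status(events, filename, failure_components=FAILURE_COMPONENTS):
    """Classify a file as IN_PROGRESS, SUCCEEDED or FAILED from its DROP events."""
    drops = [
        e for e in events
        if e["filename"] == filename and e["event_type"] == "DROP"
    ]
    if not drops:
        return "IN_PROGRESS"
    last_drop = max(drops, key=lambda e: e["timestamp_ms"])
    if last_drop["component_name"] in failure_components:
        return "FAILED"
    return "SUCCEEDED"


events = [
    {"filename": "a.csv", "event_type": "RECEIVE", "component_name": "ListenHTTP", "timestamp_ms": 100},
    {"filename": "a.csv", "event_type": "DROP", "component_name": "PutFile", "timestamp_ms": 400},
    {"filename": "b.csv", "event_type": "RECEIVE", "component_name": "ListenHTTP", "timestamp_ms": 150},
    {"filename": "b.csv", "event_type": "DROP", "component_name": "LogFailure", "timestamp_ms": 500},
    {"filename": "c.csv", "event_type": "RECEIVE", "component_name": "ListenHTTP", "timestamp_ms": 600},
]
statuses = {name: file_status(events, name) for name in ("a.csv", "b.csv", "c.csv")}
```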
>>
>> I originally had an ingest processor send this notification but moved
>> to a provenance reporting task as it just worked so well (at least for my
>> purposes).
>>
>> In my case the dashboard was a simple table that showed what file(s) I
>> uploaded and their state, flashing red if data took more than a
>> configurable period of time to complete (fail or success). The table
>> linked to a separate query interface that allowed a deeper dive into
>> the provenance records, so that I could investigate a problem further if
>> a failure or extreme latency occurred. It was super simple...
>>
>> Hope this helps,
>> Marc
>>
>> On Fri, Mar 27, 2020 at 7:51 AM Mike Thomsen <mikerthom...@gmail.com>
>> wrote:
>>
>>> Has anyone ever created good dashboards on top of NiFi flows or
>>> provenance data that report the status of a flowfile back to the user?
>>> Our client would like to give users the ability to feed NiFi data and then
>>> get a basic view of where it is. It can be fairly simplistic, like
>>> "Started..." "Processing..." "Done..." for now, but I was wondering if
>>> anyone has any good patterns for this before I dive into it myself.
>>>
>>> My current thought is to create a new processor bundle with a processor
>>> called "ProgressGateProcessor" that would let users signal a flowfile's
>>> status to an external application or data store in a single step, so
>>> they don't have to mix in process groups.
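The following minimal sketch shows the logic such a gate might encapsulate. It is written in Python for illustration only. A real NiFi processor would be a Java class, and `ProgressGate`, `Stage`, and `on_trigger` here are hypothetical names, not NiFi API. The external store is modeled as a plain dict. In this sketch, a status can only move forward, so a replayed or late flowfile cannot regress "Done" back to "Started".

```python
from enum import IntEnum


class Stage(IntEnum):
    STARTED = 1
    PROCESSING = 2
    DONE = 3


class ProgressGate:
    """Hypothetical gate: records a stage for a flowfile in an external store
    (here a plain dict) and passes the payload through unchanged."""

    def __init__(self, store, stage):
        self.store = store
        self.stage = stage

    def on_trigger(self, flowfile_id, payload):
        current = self.store.get(flowfile_id)
        # Only move forward, so a replayed gate cannot regress the status.
        if current is None or self.stage > current:
            self.store[flowfile_id] = self.stage
        return payload  # pass-through: the gate never alters content


store = {}
started = ProgressGate(store, Stage.STARTED)
done = ProgressGate(store, Stage.DONE)
started.on_trigger("ff-1", b"payload")
done.on_trigger("ff-1", b"payload")
started.on_trigger("ff-1", b"payload")  # replay: ignored, status stays DONE
```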
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>
