Tim

Got ya.  So yeah, keep in mind that with that configuration you'll have
at most 1 GB of provenance data, kept for at most 24 hours.  Also, as
James mentioned, the default provenance search can be too restrictive,
and you have to pay close attention to timestamps relative to the
system doing the query, etc.  In general, though, it should work just
fine.

1) Definitely use the newer provenance repository implementation.  We
need to change the default, as the new one is very fast and very stable.

To do this, change

nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
to
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

2) Change retention period and size values such as

nifi.provenance.repository.max.storage.time=72 hours
nifi.provenance.repository.max.storage.size=50 GB

There are some other tweaks you can make (threads, sharding, etc.)
that help with performance, but the above are good to do now
regardless of performance.
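
If it helps, here is a rough Python sketch for sanity-checking a nifi.properties file against the two changes above. It is not part of NiFi; the function names are made up for illustration, and the checks only flag the old default values quoted in this thread.

```python
# Hypothetical helper, not shipped with NiFi: flags the provenance
# defaults discussed in this thread.
PREFIX = "nifi.provenance.repository."


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def provenance_warnings(props):
    """Return a list of warnings for the old default provenance settings."""
    warnings = []
    impl = props.get(PREFIX + "implementation", "")
    if impl.endswith("PersistentProvenanceRepository"):
        warnings.append("implementation is still PersistentProvenanceRepository")
    if props.get(PREFIX + "max.storage.time") == "24 hours":
        warnings.append("max.storage.time is still the 24 hours default")
    if props.get(PREFIX + "max.storage.size") == "1 GB":
        warnings.append("max.storage.size is still the 1 GB default")
    return warnings


# Sample input copied from the properties quoted in this thread.
sample = """
# Provenance Repository Properties
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
"""

for warning in provenance_warnings(parse_properties(sample)):
    print("WARNING:", warning)
```

To run it against a real install you would read the file yourself, e.g. open("conf/nifi.properties").read(), assuming the usual location under the NiFi home directory.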

Thanks

On Tue, May 22, 2018 at 10:50 AM, Tim Dean <tim.d...@gmail.com> wrote:
> Thanks Joe:
>
> I have not yet made any changes to the configuration. We are just beginning
> the process of running our flow at scale and figuring out how to best
> optimize the configuration, and I plan to make changes as needed once we can
> get the flow functionally correct. Right now I’m having difficulty doing
> that because of the lack of provenance events.
>
> Here are the provenance-related properties I have in my nifi.properties file:
>
> # Provenance Repository Properties
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
> nifi.provenance.repository.debug.frequency=1_000_000
> nifi.provenance.repository.encryption.key.provider.implementation=
> nifi.provenance.repository.encryption.key.provider.location=
> nifi.provenance.repository.encryption.key.id=
> nifi.provenance.repository.encryption.key=
>
> # Persistent Provenance Repository Properties
> nifi.provenance.repository.directory.default=./provenance_repository
> nifi.provenance.repository.max.storage.time=24 hours
> nifi.provenance.repository.max.storage.size=1 GB
> nifi.provenance.repository.rollover.time=30 secs
> nifi.provenance.repository.rollover.size=100 MB
> nifi.provenance.repository.query.threads=2
> nifi.provenance.repository.index.threads=2
> nifi.provenance.repository.compress.on.rollover=true
> nifi.provenance.repository.always.sync=false
> nifi.provenance.repository.journal.count=16
> # Comma-separated list of fields. Fields that are not indexed will not be searchable. Valid fields are:
> # EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
> # FlowFile Attributes that should be indexed and made searchable.  Some examples to consider are filename, uuid, mime.type
> nifi.provenance.repository.indexed.attributes=
> # Large values for the shard size will result in more Java heap usage when searching the Provenance Repository
> # but should provide better performance
> nifi.provenance.repository.index.shard.size=500 MB
> # Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from
> # the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved.
> nifi.provenance.repository.max.attribute.length=65536
> nifi.provenance.repository.concurrent.merge.threads=2
> nifi.provenance.repository.warm.cache.frequency=1 hour
>
> # Volatile Provenance Respository Properties
> nifi.provenance.repository.buffer.size=100000
>
>
> Thanks for any help you can provide on this
>
> -Tim
>
> On May 21, 2018, at 11:23 PM, Joe Witt <joe.w...@gmail.com> wrote:
>
> Tim,
>
> The default configuration for provenance event retention is
> potentially a factor.
>
> Did you make any changes to those?  Can you share relevant segments
> from the nifi.properties file?
>
> Thanks
>
> On Mon, May 21, 2018 at 8:32 PM, Tim Dean <tim.d...@gmail.com> wrote:
>
> Hello,
>
> I am having a hard time troubleshooting a NiFi flow to see where things are
> failing. I am trying to look at the provenance repository for a variety of
> processors, but for some reason nothing more recent seems to be appearing
> there. For example:
>
> 1) At approximately 10:30 this morning I started a flow and observed it for a
>    couple of hours before disabling it to look into a few unexpected results.
> 2) By right-clicking individual processors and selecting “View data provenance”
>    I can see the NiFi Data Provenance view.
> 3) For each processor I investigate I can see anywhere from 10 to 100
>    provenance events that came in during the hours I was running my flow.
> 4) A few hours later I restart the flow. Data once again flows through and
>    after a while I stop my flow again.
> 5) Now I again right-click on the processors and select “View data provenance”.
>    No new provenance events seem to show up in the NiFi Data Provenance view.
>
>
> I have checked my search filter to make sure I am not accidentally filtering
> out events. I have looked at the external systems that this flow touches and
> confirmed that data is/was flowing through these processors. But for some
> reason I can see no provenance records in the UI.
>
> I am using NiFi version 1.5
>
> I have not (yet) changed any of the default settings for NiFi and how its
> provenance repository is configured
>
> Any advice on where my provenance events are going or what I might be doing
> that causes the provenance system to go silent on me?
>
> Thanks
>
> -Tim
>
>
