I was mostly trying to understand Run Duration. I'm fine on the latency side: if it processes a bunch of events at once and my overall throughput stays the same, then it's OK. I increased it to 100 ms. But when I looked at the bulk of my flow, this feature was only available on 1 of the >10 processors the data goes through.
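A rough back-of-envelope (my numbers and assumptions, not NiFi internals) of why batching session commits under a run-duration window helps: at the ~6000 events/sec mentioned later in this thread, per-flowfile commits mean 6000 commits/sec, while a 100 ms run duration caps it at roughly one commit per window.

```python
# Back-of-envelope estimate of commit rate with and without run-duration
# batching. The event rate is the poster's guess; the "one commit per
# window" model is an assumption, not a description of NiFi's framework code.
events_per_sec = 6000
run_duration_ms = 100

commits_unbatched = events_per_sec            # one session commit per flowfile
commits_batched = 1000 // run_duration_ms     # ~one commit per run-duration window

print(commits_batched)                        # 10 commits/sec
print(commits_unbatched / commits_batched)    # ~600x fewer commits
```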
I realize that slowing the rate of commits seems bad, but even the big guys limit commits.

On Wed, Oct 5, 2016 at 12:05 PM, Bryan Bende <bbe...@gmail.com> wrote:
> Brett,
>
> One thing that could possibly improve the performance here, although it's
> hard to say how much, is the concept of "Run Duration" on the processor
> scheduling tab. This is only available on processors marked with the
> @SupportsBatching annotation, so it depends on which processors you are
> using.
>
> By increasing the run duration, you let the framework batch together all
> of the framework operations during that time period. The default setting
> is 0, which means no batching, giving you the lowest latency per flowfile,
> but users can choose to sacrifice some latency for higher throughput.
>
> I don't know enough about how provenance events are specifically
> committed, but I believe they would be tied to the session commits, so
> that if a rollback occurred there wouldn't be unwanted events written.
>
> -Bryan
>
> On Wed, Oct 5, 2016 at 11:38 AM, Brett Tiplitz
> <brett.m.tipl...@systolic-inc.com> wrote:
>
>> James -
>>
>> I believe the complication for me is both the number of objects and
>> the number of processors the data goes through. I talked with a few
>> people, and it sounds like NiFi writes each event out to disk and then
>> executes a commit, which really does have a major impact on performance.
>> I don't have the liberty of resolving the disk performance, though I
>> think I will try moving the journals directory to /dev/shm. I know I'll
>> lose data on reboot, but that's maybe 1-2 times a year, so I think that
>> loss is acceptable. Also, I'm not specifying anything about what data
>> gets indexed, so it's whatever the default is.
>>
>> If I'm producing about 6000 events per second (just a guess, though I
>> think it's pretty large), it would be nice if there were an option not
>> to perform a commit on every one of the 6000 items.
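For reference, the relevant knobs live in nifi.properties. A minimal sketch, assuming the NiFi 1.x defaults from the Administration Guide; the /dev/shm path is the poster's idea (with the stated risk of losing provenance data on reboot), not a recommendation:

```
# Hypothetical nifi.properties fragment; values other than the directory
# are the stock defaults.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
# Poster's experiment: put the repository (and its journal files) on tmpfs.
nifi.provenance.repository.directory.default=/dev/shm/provenance_repository
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
# "Whatever the default is" for indexing:
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
```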
>> In reality, I would say a commit should never occur more than once a
>> second, and even that is likely way too often.
>>
>> Last, is there a way to measure the actual provenance events going
>> through? I'm guessing at what it's actually doing here.
>>
>> brett
>>
>> On Fri, Sep 30, 2016 at 2:16 PM, James Wing <jvw...@gmail.com> wrote:
>>
>>> Brett,
>>>
>>> The default provenance store, PersistentProvenanceRepository, does
>>> require I/O in proportion to flowfile events. Flowfiles with many
>>> attributes, especially large attributes, are a frequent contributor to
>>> provenance overload because attribute state is tracked in provenance
>>> events. But this is different from flowfile content reads and writes,
>>> which use the separate content repository. You might consider moving
>>> the provenance repository to a separate disk for additional I/O
>>> capacity.
>>>
>>> Does this sound relevant? Can you share some details of your flow
>>> volumes and attribute sizes?
>>>
>>> nifi.provenance.repository.buffer.size is only used by the
>>> VolatileProvenanceRepository implementation, an in-memory provenance
>>> store. The property defines the size of the in-memory store. The
>>> volatile store can avoid disk I/O issues, but at the expense of reduced
>>> provenance functionality.
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> On Thu, Sep 29, 2016 at 1:37 PM, Brett Tiplitz
>>> <brett.m.tipl...@systolic-inc.com> wrote:
>>>
>>>> I'm having a throughput problem when processing data with provenance
>>>> recording enabled. I've pretty much disabled it, so I believe that is
>>>> the source of my issue. On occasion, I get a message saying the flow
>>>> is slowing due to provenance recording. I was running the
>>>> out-of-the-box configuration for provenance.
>>>>
>>>> I believe the issue might be related to commit writes, though it's
>>>> just a theory. There is a variable
>>>> nifi.provenance.repository.buffer.size, though I don't see anything
>>>> about what it does.
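The in-memory alternative James describes would be configured along these lines (a sketch, not a recommendation; note that volatile provenance is lost on restart and supports reduced query functionality):

```
# Hypothetical nifi.properties fragment: switch to the in-memory store.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
# buffer.size is only read by this implementation; it caps the number of
# provenance events held in memory (100000 is the stock default).
nifi.provenance.repository.buffer.size=100000
```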
>>>>
>>>> Any suggestions?
>>>>
>>>> thanks,
>>>>
>>>> brett
>>>>
>>>> --
>>>> Brett Tiplitz
>>>> Systolic, Inc
>>
>> --
>> Brett Tiplitz
>> Systolic, Inc

--
Brett Tiplitz
Systolic, Inc