I was mostly trying to understand Run Duration. I'm fine on the latency side: if it processes a bunch of events at once and my overall throughput stays the same, then it's OK. I increased it to 100 ms. But when I looked at the bulk of my flow, this feature was only available on 1 of the >10 processors the data goes through.
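A rough back-of-envelope (my numbers and assumptions, not NiFi internals) of why batching session commits under a run-duration window helps: at the ~6000 events/sec mentioned later in this thread, per-flowfile commits mean 6000 commits/sec, while a 100 ms run duration caps it at roughly one commit per window.

```python
# Back-of-envelope estimate of commit rate with and without run-duration
# batching. The event rate is the poster's guess; the "one commit per
# window" model is an assumption, not a description of NiFi's framework code.
events_per_sec = 6000
run_duration_ms = 100

commits_unbatched = events_per_sec            # one session commit per flowfile
commits_batched = 1000 // run_duration_ms     # ~one commit per run-duration window

print(commits_batched)                        # 10 commits/sec
print(commits_unbatched / commits_batched)    # ~600x fewer commits
```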
I realize that slowing the rate of commits seems bad, but even the big guys limit commits.

On Wed, Oct 5, 2016 at 12:05 PM, Bryan Bende <bbe...@gmail.com> wrote:
> Brett,
>
> One thing that could possibly improve the performance here, although it's
> hard to say how much, is the concept of "Run Duration" on the processor
> scheduling tab. This is only available on processors marked with the
> @SupportsBatching annotation, so it depends on which processors you are
> using.
>
> By increasing the run duration, you let the framework batch together all
> of the framework operations during that time period. The default setting
> is 0, which means no batching, giving you the lowest latency per flowfile,
> but users can choose to sacrifice some latency for higher throughput.
>
> I don't know enough about how provenance events are specifically
> committed, but I believe they would be tied to the session commits, so
> that if a rollback occurred there wouldn't be unwanted events written.
>
> -Bryan
>
> On Wed, Oct 5, 2016 at 11:38 AM, Brett Tiplitz
> <brett.m.tipl...@systolic-inc.com> wrote:
>
>> James -
>>
>> I believe the complication for me is both the number of objects and
>> the number of processors the data goes through. I talked with a few
>> people, and it sounds like NiFi writes each event out to disk and then
>> executes a commit, which really does have a major impact on performance.
>> I don't have the liberty of resolving the disk performance, though I
>> think I will try moving the journals directory to /dev/shm. I know I'll
>> lose data on reboot, but that's maybe 1-2 times a year, so I think that
>> loss is acceptable. Also, I'm not specifying anything about what data
>> gets indexed, so it's whatever the default is.
>>
>> If I'm producing about 6000 events per second (just a guess, though I
>> think it's pretty large), it would be nice if there were an option not
>> to perform a commit on every one of the 6000 items.
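For reference, the relevant knobs live in nifi.properties. A minimal sketch, assuming the NiFi 1.x defaults from the Administration Guide; the /dev/shm path is the poster's idea (with the stated risk of losing provenance data on reboot), not a recommendation:

```
# Hypothetical nifi.properties fragment; values other than the directory
# are the stock defaults.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
# Poster's experiment: put the repository (and its journal files) on tmpfs.
nifi.provenance.repository.directory.default=/dev/shm/provenance_repository
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
# "Whatever the default is" for indexing:
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
```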
>> In reality, I would say a commit should never occur more than once a
>> second, and even that is likely way too often.
>>
>> Last, is there a way to measure the actual provenance events going
>> through? I'm guessing at what it's actually doing here.
>>
>> brett
>>
>> On Fri, Sep 30, 2016 at 2:16 PM, James Wing <jvw...@gmail.com> wrote:
>>
>>> Brett,
>>>
>>> The default provenance store, PersistentProvenanceRepository, does
>>> require I/O in proportion to flowfile events. Flowfiles with many
>>> attributes, especially large attributes, are a frequent contributor to
>>> provenance overload because attribute state is tracked in provenance
>>> events. But this is different from flowfile content reads and writes,
>>> which use the separate content repository. You might consider moving
>>> the provenance repository to a separate disk for additional I/O
>>> capacity.
>>>
>>> Does this sound relevant? Can you share some details of your flow
>>> volumes and attribute sizes?
>>>
>>> nifi.provenance.repository.buffer.size is only used by the
>>> VolatileProvenanceRepository implementation, an in-memory provenance
>>> store. The property defines the size of the in-memory store. The
>>> volatile store can avoid disk I/O issues, but at the expense of reduced
>>> provenance functionality.
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> On Thu, Sep 29, 2016 at 1:37 PM, Brett Tiplitz
>>> <brett.m.tipl...@systolic-inc.com> wrote:
>>>
>>>> I'm having a throughput problem when processing data with provenance
>>>> recording enabled. I've pretty much disabled it, so I believe that is
>>>> the source of my issue. On occasion, I get a message saying the flow
>>>> is slowing due to provenance recording. I was running the
>>>> out-of-the-box configuration for provenance.
>>>>
>>>> I believe the issue might be related to commit writes, though it's
>>>> just a theory. There is a variable
>>>> nifi.provenance.repository.buffer.size, though I don't see anything
>>>> about what it does.
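The in-memory alternative James describes would be configured along these lines (a sketch, not a recommendation; note that volatile provenance is lost on restart and supports reduced query functionality):

```
# Hypothetical nifi.properties fragment: switch to the in-memory store.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
# buffer.size is only read by this implementation; it caps the number of
# provenance events held in memory (100000 is the stock default).
nifi.provenance.repository.buffer.size=100000
```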
>>>>
>>>> Any suggestions?
>>>>
>>>> thanks,
>>>>
>>>> brett
>>>>
>>>> --
>>>> Brett Tiplitz
>>>> Systolic, Inc
>>
>> --
>> Brett Tiplitz
>> Systolic, Inc

--
Brett Tiplitz
Systolic, Inc