I've noticed that the total number of flow files and processors is missing
from the questions.  Since NiFi keeps state on disk, every transaction has to
be committed.  Some newer processors support batch mode, but the worst case
is a flow without it.  Thus the limit might not be in bytes but in the number
of flow files times the number of processors.
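
To make that concrete, here is a rough sketch of the worst-case arithmetic.
The function names, the batch_size parameter, and the commit rate are all my
own assumptions for illustration; none of this is a NiFi API or a measured
number.

```python
import math


def commits_needed(flow_files, processors, batch_size=1):
    """Worst-case commit count, assuming every processor in the path
    commits once per batch of flow files (batch_size=1 means no batching)."""
    return processors * math.ceil(flow_files / batch_size)


def seconds_to_drain(flow_files, processors, commits_per_second, batch_size=1):
    """Time to push all flow files through, if commits are the only limit.
    commits_per_second is a hypothetical figure you would have to measure."""
    return commits_needed(flow_files, processors, batch_size) / commits_per_second


# Example: 1,000 small files through a 5-processor flow.
print(commits_needed(1000, 5))                  # no batching
print(commits_needed(1000, 5, batch_size=100))  # with batching
```

Note that file size never appears in the calculation, which is the point:
with small files the commit count dominates, not the byte count.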

I've been using NiFi with tons of smaller files, and I'm getting crushed by
the limit on the number of transactions long before MB/s becomes a concern,
so I thought I would throw this into the questions.  I believe I'm also
limited by the speed of light: since I'm using network attached storage, the
round trip through the network adds latency to every transaction, even
though the network is not actually saturated.
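
A back-of-the-envelope sketch of why latency can cap things while the link
sits mostly idle. It assumes each commit waits for one full round trip to
storage; the function names and the numbers in the example are hypothetical,
not measurements from my setup.

```python
def max_commits_per_second(round_trip_seconds, concurrent_writers=1):
    """Upper bound on commits/s if each commit blocks for one round trip."""
    return concurrent_writers / round_trip_seconds


def effective_mb_per_second(round_trip_seconds, avg_file_bytes, processors,
                            concurrent_writers=1):
    """Data rate when every flow file costs one commit per processor."""
    commits = max_commits_per_second(round_trip_seconds, concurrent_writers)
    files_per_second = commits / processors
    return files_per_second * avg_file_bytes / 1_000_000


# Example: 1 ms round trip, 10 KB files, 5 processors in the flow.
print(effective_mb_per_second(0.001, 10_000, 5))
```

Under those assumptions the byte rate is tiny next to what the network can
carry, so the link looks idle while the flow is still stuck.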

brett

On Mon, Oct 17, 2016 at 9:59 PM, Joe Witt <[email protected]> wrote:

> Ali,
>
> Without knowing the details of the data streams, nature of each event
> and the operations that will be performed against them, or how the
> processors themselves will work, I cannot give you a solid answer.  Do
> I think it is possible?  Absolutely.  Do I think there will be hurdles
> to overcome to reach and sustain such a rate?  Absolutely.
>
> Thanks
> Joe
>
> On Mon, Oct 17, 2016 at 9:28 PM, Lee Laim <[email protected]> wrote:
> > Ali,
> > I used the PCIe for all repos and the PutFile destination.
> >
> >
> >
> > On Oct 18, 2016, at 8:38 AM, Ali Nazemian <[email protected]> wrote:
> >
> > Hi Lee,
> >
> > I was wondering, did you use PCIe for the flow file repo, the provenance
> > repo, or the content repo? Or all of them?
> >
> > Joe,
> >
> > The ETL is not very complicated, so do you think it is not possible to
> > reach 800 MB/s in production even if I use PCIe for the flow file repo?
> > Is it worth spending money on PCIe for the flow file repo?
> >
> > Best regards
> >
> > On Tue, Oct 18, 2016 at 2:36 AM, Joe Witt <[email protected]> wrote:
> >>
> >> Thanks Lee.  Your response was awesome and really made me want to get
> >> hands on a set of boxes like this so we could do some testing.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Mon, Oct 17, 2016 at 11:32 AM, Lee Laim <[email protected]> wrote:
> >> > Joe,
> >> > Good points regarding throughput on real flows and on a sustained
> >> > basis. My test was only pushing one aspect of the system.
> >> >
> >> > That said, I would be interested in discussing/developing a more
> >> > comprehensive test flow to capture more real-world use cases. I'll
> >> > check to see if that conversation has started.
> >> >
> >> > Thanks,
> >> > Lee
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Lee Laim
> >> > 610-864-1657
> >> >
> >> > On Oct 17, 2016, at 9:55 PM, Ali Nazemian <[email protected]>
> wrote:
> >> >
> >> > Dear Joe,
> >> > Thank you very much.
> >> >
> >> > Best regards
> >> >
> >> >
> >> > On Mon, Oct 17, 2016 at 10:08 PM, Joe Witt <[email protected]>
> wrote:
> >> >>
> >> >> Ali
> >> >>
> >> >> I suspect bottlenecks in the software itself and the flow design will
> >> >> become a factor before you reach 800 MB/s. You'd likely hit CPU
> >> >> efficiency issues before this, caused by the flow processors
> >> >> themselves and by garbage collection.  Probably the most important
> >> >> factor though will be the transaction rate and whether the flow is
> >> >> configured to trade off some latency for higher throughput.  So many
> >> >> variables are at play, but under idealized conditions and with a
> >> >> system like the one you describe it is theoretically feasible to hit
> >> >> that value.
> >> >>
> >> >> Practically speaking I think you'd be looking at a couple hundred MB/s
> >> >> per server like this on real flows on a sustained basis.
> >> >>
> >> >> Thanks
> >> >> Joe
> >> >>
> >> >> On Sun, Oct 16, 2016 at 11:06 PM, Ali Nazemian <
> [email protected]>
> >> >> wrote:
> >> >> > Dear Nifi users/developers,
> >> >> > Hi,
> >> >> >
> >> >> > I was wondering how I can calculate the theoretical throughput of a
> >> >> > NiFi server. Let's suppose we can eliminate different bottlenecks,
> >> >> > such as the flow file repo and provenance repo bottlenecks, by using
> >> >> > a very high-end SSD. Moreover, assume that a very high-end network
> >> >> > infrastructure is available. In this case, is it possible to reach
> >> >> > 800 MB/s of throughput per server? Suppose each server comes with 24
> >> >> > disk slots. 16 disk slots are used for creating 8 x RAID1 (SAS 10k)
> >> >> > mount points and are dedicated to the content repo. Let's say each
> >> >> > content repo can achieve 100 MB/s of throughput. May I say the total
> >> >> > throughput per server can be 8 x 100 = 800 MB/s? Is it possible to
> >> >> > reach this amount of throughput practically?
> >> >> > Thank you very much.
> >> >> >
> >> >> > Best regards,
> >> >> > Ali
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > A.Nazemian
> >
> >
> >
> >
> > --
> > A.Nazemian
>



-- 
Brett Tiplitz
Systolic, Inc
