If you really need to get that low, something else might be more suitable. Given those target times, a custom solution might be necessary. Flink is a generic, powerful framework - hence it does not target latencies that low.
> On 31. Aug 2017, at 14:50, Marchant, Hayden <hayden.march...@citi.com> wrote:
>
> We're about to get started on a 9-person-month PoC using Flink Streaming.
> Before we get started, I am interested to know how low a latency I can
> expect for my end-to-end flow for a single event (from source to sink).
>
> Here is a very high-level description of our Flink design:
> We need at-least-once semantics, and the main flow of our application is
> parsing a message (< 50 microseconds) from Kafka, then doing a keyBy on the
> parsed event (< 1 KB), then updating a very small user state in the
> KeyedStream, and then doing another keyBy and applying another operator on
> that KeyedStream. Each of the operators is a very simple operation - very
> little calculation and no I/O.
>
> ** Our requirement is to get close to 1 ms (99th percentile) or lower for
> end-to-end processing (the timer starts once we get the message from
> Kafka). Is this at all realistic if our flow contains 2 aggregations? If
> so, what optimizations might we need to get there regarding cluster
> configuration (both Flink and hardware)? Our throughput is possibly small
> enough (40,000 events per second) that we could run on one node - which
> might eliminate some network latency.
>
> I did read in
> https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
> in Exactly Once vs At Least Once that a few milliseconds is considered
> super low-latency - wondering if we can get lower.
>
> Any advice or 'war stories' are very welcome.
>
> Thanks,
> Hayden Marchant
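For readers unfamiliar with the pattern in the quoted message, the two-stage flow (parse, keyBy with a small per-key state, second keyBy with another operator) can be pictured as a map from key to state at each stage. The following is a minimal plain-Python model of that dataflow, not Flink code; the event format, function names, and the choice of "running total" and "update count" as the two per-key states are illustrative assumptions only.

```python
from collections import defaultdict

def parse_event(raw):
    # Stand-in for the sub-50-microsecond parse step: "user,amount" -> dict.
    # The "user,amount" format is an assumption for illustration.
    user, amount = raw.split(",")
    return {"user": user, "amount": int(amount)}

def run_pipeline(raw_messages):
    # Stage 1: keyBy user, update a very small per-key state
    # (here modeled as a running total per key).
    totals = defaultdict(int)
    stage1_out = []
    for raw in raw_messages:
        ev = parse_event(raw)
        totals[ev["user"]] += ev["amount"]
        stage1_out.append({"user": ev["user"], "total": totals[ev["user"]]})
    # Stage 2: a second keyBy (same key here) with another small
    # per-key state (modeled as a count of updates seen per key).
    counts = defaultdict(int)
    results = {}
    for ev in stage1_out:
        counts[ev["user"]] += 1
        results[ev["user"]] = (ev["total"], counts[ev["user"]])
    return results
```

For example, run_pipeline(["a,5", "a,3", "b,2"]) yields {"a": (8, 2), "b": (2, 1)}: key "a" accumulates a total of 8 over 2 updates. In real Flink the two stages would be separate keyed operators with state in a state backend, and each keyBy implies a network (or at least serialization) shuffle between operators - one of the main contributors to the end-to-end latency being asked about.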