We're about to start a 9-person-month PoC using Flink Streaming. 
Before we begin, I'd like to know what end-to-end latency I can expect for a 
single event (from source to sink). 

Here is a very high-level description of our Flink design: 
We need at-least-once semantics. Our main application flow parses a message 
from Kafka (< 50 microseconds per message), does a keyBy on the parsed event 
(< 1 KB), updates a very small piece of user state on that KeyedStream, then 
does another keyBy and applies a further operator on the resulting 
KeyedStream. Each operator is very simple - very little calculation and no I/O.
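
Roughly, the flow looks like the sketch below (the class names ParseMessage, 
UpdateUserState and SecondStep, the topic name, and the key fields are 
placeholders rather than our real code, and the exact Kafka consumer class 
and keyBy style depend on the Flink version):

    import java.util.Properties;

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class LowLatencyPoc {

        /** Tiny parsed event (< 1 KB); the fields are placeholders. */
        public static class Event {
            public String userId;
            public String sessionId;
            public long value;
        }

        /** Parse step: the real parsing takes < 50 microseconds per message. */
        public static class ParseMessage implements MapFunction<String, Event> {
            @Override
            public Event map(String msg) {
                Event e = new Event();
                e.userId = msg;          // real parsing logic goes here
                e.sessionId = msg;
                e.value = msg.length();
                return e;
            }
        }

        /** First keyed step: update a very small piece of per-user state. */
        public static class UpdateUserState extends RichMapFunction<Event, Event> {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration conf) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public Event map(Event e) throws Exception {
                Long c = count.value();
                count.update(c == null ? 1L : c + 1);   // cheap update, no I/O
                return e;
            }
        }

        /** Second keyed step: very little calculation, no I/O. */
        public static class SecondStep implements MapFunction<Event, Event> {
            @Override
            public Event map(Event e) {
                e.value += 1;
                return e;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "latency-poc");

            env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
               .map(new ParseMessage())
               .keyBy("userId")             // first keyBy
               .map(new UpdateUserState())  // small keyed state update
               .keyBy("sessionId")          // second keyBy
               .map(new SecondStep())       // second simple operator
               .print();                    // stand-in for the real sink

            env.execute("low-latency PoC");
        }
    }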


** Our requirement is to get close to 1ms (99th percentile) or lower for 
end-to-end processing (the timer starts once we get the message from Kafka). 
Is this at all realistic if our flow contains 2 aggregations? If so, what 
optimizations might we need to get there in terms of cluster configuration 
(both Flink and hardware)? Our throughput is possibly small enough 
(40,000 events per second) that we could run on one node, which might 
eliminate some network latency. 
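
I assume the kind of knobs we'd be looking at (continuing with the env from 
the sketch above) are things like the network buffer timeout, parallelism, 
and object reuse - please correct me if these are the wrong levers:

    // Flush network buffers immediately instead of waiting up to the
    // default 100 ms; a timeout of 0 trades some throughput for latency.
    env.setBufferTimeout(0);

    // If 40k events/s fits on one machine, low parallelism keeps the
    // keyBy shuffles on local channels instead of crossing the network.
    env.setParallelism(1);

    // Avoid defensive copies between chained operators (only safe if our
    // functions don't keep references to input objects after returning).
    env.getConfig().enableObjectReuse();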

I did read in the "Exactly Once vs. At Least Once" section of 
https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
that a few milliseconds is considered super low-latency - I'm wondering 
whether we can get lower.
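
Since we only need at-least-once, I assume we'd configure checkpointing like 
the snippet below so we don't pay for barrier alignment (the interval is just 
an example value; requires importing 
org.apache.flink.streaming.api.CheckpointingMode):

    // At-least-once checkpointing skips checkpoint barrier alignment, which
    // is the part of exactly-once checkpointing that can add latency.
    env.enableCheckpointing(10_000, CheckpointingMode.AT_LEAST_ONCE);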

Any advice or 'war stories' are very welcome.

Thanks,
Hayden Marchant

