One thing to check is how much you're serializing to the network. If you're using Avro GenericRecords without special handling, you can wind up serializing the schema with every record, which greatly increases the amount of data you're sending across the wire.
On 11/9/20, 8:14 AM, "ashwinkonale" <ashwin.kon...@gmail.com> wrote:

Hi,

Thanks a lot for the reply. I added some more metrics to the pipeline to understand the bottleneck. It seems that Avro deserialization introduces some delay. Using a histogram, I found that processing a single message takes ~300 us (p99) and ~180 us (p50), which means a single slot can output at most about 3,000 messages per second. So to support a QPS of 3 million/s, I would need a parallelism of about 1,000. Is my understanding correct? Can I do anything else apart from having so many slots in my job cluster?

Also, do you have any guides or pointers on how to do such setups, e.g. a large number of TaskManagers with smaller slots, or bigger TMs with many slots and bigger JVMs, larger network buffers, etc.?

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
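
For reference, the slot estimate in the quoted message can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes each slot processes records strictly one at a time and that the measured per-message latency is the only cost. The function names are made up for illustration and are not part of any Flink API. Note that p99 is a pessimistic basis for sustained throughput, so the p99 figure is closer to an upper bound on the slots needed than a precise requirement.

```python
import math

US_PER_SECOND = 1_000_000


def max_throughput_per_slot(latency_us: float) -> float:
    """Upper bound on messages/second for one slot handling records serially."""
    return US_PER_SECOND / latency_us


def required_parallelism(target_qps: int, latency_us: int) -> int:
    """Slots needed to sustain target_qps at the given per-message latency."""
    # Multiply before dividing so the result is not skewed by rounding
    # in an intermediate per-slot throughput value.
    return math.ceil(target_qps * latency_us / US_PER_SECOND)


if __name__ == "__main__":
    target_qps = 3_000_000  # the 3 million/s figure from the message
    for label, latency_us in [("p50", 180), ("p99", 300)]:
        per_slot = max_throughput_per_slot(latency_us)
        slots = required_parallelism(target_qps, latency_us)
        print(f"{label}: ~{per_slot:.0f} msg/s per slot, {slots} slots needed")
```

Under these assumptions, the result scales linearly with the per-message cost, so any reduction in deserialization time (for example, by not shipping the schema with every record) lowers the required parallelism by the same factor.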