One thing to check is how much you're serializing to the network. If you're
using Avro Generic records without special handling, you can wind up
serializing the schema with every record, which greatly increases the amount
of data you're sending across the wire.
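
To get a feel for how much that can cost, here is a rough sketch in plain Python. It is not Avro's real wire format and does not use Flink: the `Event` schema, the two fields, and the helper names are all made up for illustration. It only compares a record that carries its schema against one that carries just the field values. In Flink, as far as I know, the usual "special handling" is to give the stream explicit Avro type information (for example `GenericRecordAvroTypeInfo` from flink-avro) so that the schema is not shipped with each record.

```python
import json
import struct

# Hypothetical schema, chosen only for illustration.
SCHEMA_JSON = json.dumps(
    {
        "type": "record",
        "name": "Event",
        "fields": [
            {"name": "user_id", "type": "long"},
            {"name": "amount", "type": "double"},
        ],
    },
    separators=(",", ":"),
).encode("utf-8")


def encode_fields(user_id: int, amount: float) -> bytes:
    """Pack the field values in a fixed-width layout (a stand-in, not Avro's encoding)."""
    return struct.pack("<qd", user_id, amount)


def encode_with_schema(user_id: int, amount: float) -> bytes:
    """Schema travels with every record: length prefix + schema + field bytes."""
    header = struct.pack("<I", len(SCHEMA_JSON))
    return header + SCHEMA_JSON + encode_fields(user_id, amount)


def encode_schema_once(user_id: int, amount: float) -> bytes:
    """Schema is known to both sides up front, so only the field bytes are sent."""
    return encode_fields(user_id, amount)


with_schema = len(encode_with_schema(42, 9.99))
without_schema = len(encode_schema_once(42, 9.99))
print(f"with schema:    {with_schema} bytes per record")
print(f"without schema: {without_schema} bytes per record")
print(f"overhead:       {with_schema / without_schema:.1f}x")
```

Even with this tiny two-field schema the per-record payload grows several times over; real schemas are usually much larger, so the ratio gets worse.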

On 11/9/20, 8:14 AM, "ashwinkonale" <ashwin.kon...@gmail.com> wrote:

    Hi,
    Thanks a lot for the reply. I added some more metrics to the pipeline to
    understand the bottleneck. It seems that Avro deserialization introduces
    some delay. Using a histogram, I found that processing a single message
    takes ~300us (p99) and ~180us (p50), which means a single slot can output
    at most about 3000 messages per second. So to support a QPS of 3 million/s
    I would need a parallelism of 1000. Is my understanding correct? Can I do
    anything else apart from having so many slots in my job cluster? Also, do
    you have any guides or pointers on how to do such setups, e.g. a large
    number of TaskManagers with smaller slots, or bigger TMs with many slots
    and bigger JVMs, larger network buffers, etc.?
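
The slot estimate in the message above can be checked with a small back-of-the-envelope calculation. This is only a sketch: it assumes each slot processes messages strictly one at a time at the quoted latency, and it ignores batching, network time, and skew. The function names are illustrative.

```python
def per_slot_throughput(latency_us: int) -> int:
    """Messages per second one slot can handle at a given per-message latency."""
    return 1_000_000 // latency_us


def slots_needed(target_per_sec: int, latency_us: int) -> int:
    """Slots required for the target rate, rounded up (integer math avoids float error)."""
    total_us_per_sec = target_per_sec * latency_us
    return (total_us_per_sec + 999_999) // 1_000_000


TARGET = 3_000_000  # messages per second, from the message above

for label, latency_us in (("p99", 300), ("p50", 180)):
    print(
        f"{label}: {per_slot_throughput(latency_us)} msg/s per slot, "
        f"{slots_needed(TARGET, latency_us)} slots for {TARGET} msg/s"
    )
```

Using the p99 latency for every message is the pessimistic case and gives about 900 slots; the figure of 1000 comes from rounding 3333 msg/s down to 3000. With the p50 latency the same calculation gives 540 slots, so the real requirement likely sits somewhere between the two.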



