Hello Dimitry,

Could you please elaborate on your tuning on -> environment.addDefaultKryoSerializer(..) .

I'm interested on knowing what have you done there for a boost of about 50% .

Some small or simple example would be very nice.

Thank you very much in advance.

Kind Regards,

Daniel Santos


On 02/17/2017 12:43 PM, Dmitry Golubets wrote:
Hi,

My streaming job cannot benefit much from parallelization unfortunately.
So I'm looking for things I can tune in Flink, to make it process sequential stream faster.

So far in our current engine based on Akka Streams (non distributed ofc) we have 20k msg/sec.
Ported to Flink I'm getting 14k so far.

My observations are following:

  * if I chain operations together they execute all in sequence, so I
    basically sum up the time required to process one data item across
    all my stream operators, not good
  * if I split chains, they execute asynchronously to each other, but
    there is serialization and network overhead

Second approach gives me better results, considering that I have a server with more than enough memory and cores to do all side work for serialization. But I want to reduce this serialization\data transfer overhead to a minimum.

So what I have now:

environment.getConfig.enableObjectReuse() // cos it's Scala we don't need unnecessary serialization environment.getConfig.disableAutoTypeRegistration() // it works faster with it, I'm not sure why environment.addDefaultKryoSerializer(..) // custom Message Pack serialization for all message types, gives about 50% boost

But that's it, I don't know what else to do.
I didn't find any interesting network\buffer settings in docs.

Best regards,
Dmitry

Reply via email to