Re: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Till Rohrmann
s one left? And if null, I instantiate a new instance > > in my code? With billions of small events ingested per day, I can > > imagine this to be another small performance improvement especially in > > terms of garbage collection… > > > > Best regads > > > > T

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Timo Walther
especially in terms of garbage collection… Best regads Theo *From:*Till Rohrmann *Sent:* Mittwoch, 19. Februar 2020 07:34 *To:* Jin Yi *Cc:* user *Subject:* Re: Parallelize Kafka Deserialization of a single partition? Then my statement must be wrong. Let me double check this. Yesterday when

RE: Parallelize Kafka Deserialization of a single partition?

2020-02-19 Thread Theo Diefenthal
… Best regads Theo From: Till Rohrmann Sent: Mittwoch, 19. Februar 2020 07:34 To: Jin Yi Cc: user Subject: Re: Parallelize Kafka Deserialization of a single partition? Then my statement must be wrong. Let me double check this. Yesterday when checking the usage of the objectReuse field, I

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-18 Thread Till Rohrmann
Then my statement must be wrong. Let me double check this. Yesterday when checking the usage of the objectReuse field, I could only see it in the batch operators. I'll get back to you. Cheers, Till On Wed, Feb 19, 2020, 07:05 Jin Yi wrote: > Hi Till, > I just read your comment: > Currently,

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-18 Thread Jin Yi
Hi Till, I just read your comment: Currently, enabling object reuse via ExecutionConfig.enableObjectReuse() only affects the DataSet API. DataStream programs will always do defensive copies. There is a FLIP to improve this behaviour [1]. I have an application that is written in apache beam, but

Re: Parallelize Kafka Deserialization of a single partition?

2020-02-18 Thread Till Rohrmann
Hi Theo, the KafkaDeserializationSchema does not allow to return asynchronous results. Hence, Flink will always wait until KafkaDeserializationSchema.deserialize returns the parsed value. Consequently, the only way I can think of to offload the complex parsing logic would be to do it in a

Parallelize Kafka Deserialization of a single partition?

2020-02-17 Thread Theo Diefenthal
Hi, As for most pipelines, our flink pipeline start with parsing source kafka events into POJOs. We perform this step within a KafkaDeserizationSchema so that we properly extract the event itme timestamp for the downstream Timestamp-Assigner. Now it turned out that parsing is currently the