Each separate job would have its own consumer group, so each job reads
independently from the same topic, and on checkpointing each commits its
own offsets.
So if any job fails, it will not affect the progress of the other jobs
reading from Kafka.
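As an illustration, a minimal sketch of wiring up a per-job consumer group
with the Flink Kafka connector (the broker address, topic, and group id are
placeholders; with checkpointing enabled, KafkaSource commits the
checkpointed offsets to this group by default):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.kafka.clients.consumer.OffsetResetStrategy;

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("broker1:9092")   // placeholder broker
    .setTopics("shared-topic")             // same topic for every job
    .setGroupId("job-a")                   // unique group id per job
    // resume from this group's committed offsets, else start earliest
    .setStartingOffsets(OffsetsInitializer.committedOffsets(OffsetResetStrategy.EARLIEST))
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();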

I am not sure of the impact on network load when multiple consumer groups
request data from the same topic.

Multiple small jobs also ensure that each job can be scaled and monitored
in isolation.

Having an efficient serde helps a lot with the data we store in state, the
data forwarded to downstream operators, and overall state management.
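For example, a minimal sketch of a state value class (the class and field
names are hypothetical) that satisfies Flink's POJO rules, so it is handled
by the fast built-in PojoSerializer instead of falling back to Kryo:

// Public class, public no-arg constructor, and public (or getter/setter)
// fields: this is what lets Flink pick its PojoSerializer.
public class EventStats {
    public String userId;
    public long count;
    public double total;

    public EventStats() {}  // required no-arg constructor
}

Calling env.getConfig().disableGenericTypes() during development makes the
job fail fast whenever a type silently falls back to the generic Kryo
serializer.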

Another thing you can look into: if a job step is keyed, make sure the keys
are Strings or other Java primitive types, since Object keys are much
slower when reading from and writing to a state store, as in the sketch
below.
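A minimal sketch, reusing the hypothetical EventStats class from above,
keying by a String field:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

KeyedStream<EventStats, String> keyByUser(DataStream<EventStats> events) {
    // The key is hashed and serialized on each state read and write, so
    // a String key is much cheaper than a composite Object key.
    return events.keyBy(stats -> stats.userId);
}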

Thanks
Sachin


On Wed, May 15, 2024 at 7:58 AM longfeng Xu <xulongfeng2...@gmail.com>
wrote:

> Thank you. We will try.
>
> I'm still confused about multiple jobs on a cluster (Flink session mode on
> YARN) reading the same topic from a Kafka cluster. My understanding is
> that in this mode the number of times the topic is read does not decrease;
> the jobs just share the TCP channels of the task managers, reducing the
> network load. Is my understanding correct?
>
> Or are there any other advantages to it? Please advise. Thank you.
>
> Sachin Mittal <sjmit...@gmail.com> wrote on Wednesday, May 15, 2024 at 09:24:
>
>> We have the same scenario.
>> We considered having one big job with multiple branches, but this creates
>> a single point of failure: an issue in any branch would fail the whole
>> job, and all the other branches would stop processing.
>>
>> Hence running multiple jobs on a cluster, say on YARN, is better.
>>
>> Now, to overcome the serde issue, try some of the more efficient
>> serialization schemes recommended by Flink. We are using POJOs, and they
>> have yielded good results for us.
>>
>>
>> On Wed, 15 May 2024 at 5:59 AM, longfeng Xu <xulongfeng2...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>   In this scenario many Flink jobs read the same Kafka topic, so CPU is
>>> wasted on serialization/deserialization and the network load is too
>>> heavy. Can you recommend a solution to avoid this? For example, would it
>>> be more efficient to use one large streaming job with multiple branches?
>>>
>>>  Best regards,
>>>
>>>
