Hello Niklas,

If you can monitor your repartition topic's consumer lag, and it is
increasing consistently, it means your downstream processor simply cannot
keep up with the throughput of the upstream processor. Usually this means
your downstream operators are heavier (e.g. aggregations and joins, which
are stateful) than your upstream ones (e.g. simply shuffling the data to
repartition topics), and since task assignment only considers a task as the
smallest unit of work and does not differentiate "heavy" from "light"
tasks, such an imbalance in task assignment may happen. At the moment, to
resolve this you should add more resources to make sure the heavy tasks get
enough computational resources assigned (more threads, e.g.).
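As a minimal sketch of the "more threads" suggestion above: the thread
count is controlled by the "num.stream.threads" config (the
StreamsConfig.NUM_STREAM_THREADS_CONFIG constant). The string-literal key
and the value of 4 below are just illustrative, so the snippet runs without
Kafka on the classpath:

```java
import java.util.Properties;

public class ThreadConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        // "num.stream.threads" is StreamsConfig.NUM_STREAM_THREADS_CONFIG.
        // Raising it from the default of 1 gives heavy stateful tasks
        // more processing threads within a single instance.
        props.put("num.stream.threads", "4");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("num.stream.threads"));
    }
}
```

Each stream thread gets its own share of the assigned tasks, so with more
threads than heavy tasks, no single thread ends up with two heavy tasks.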

If your observed consumer lag plateaus after increasing to some point, it
means your consumer can actually keep up, with some constant lag; if you
hit your open-file limit before seeing this, it means you either need to
increase that limit, OR you can simply increase the segment size to reduce
the number of files, by using "StreamsConfig.TOPIC_PREFIX" to set the value
of TopicConfig.SEGMENT_BYTES_CONFIG.
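A sketch of that override: "topic." is the value of
StreamsConfig.TOPIC_PREFIX and "segment.bytes" is
TopicConfig.SEGMENT_BYTES_CONFIG, so the prefixed key is applied to the
internal topics Streams creates. String literals and the 512 MB value are
used here only so the sketch runs without Kafka dependencies; pick a size
that suits your retention and file-handle budget:

```java
import java.util.Properties;

public class SegmentSizeConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        // "topic." + "segment.bytes" mirrors
        // StreamsConfig.TOPIC_PREFIX + TopicConfig.SEGMENT_BYTES_CONFIG.
        // Larger segments mean fewer log files (and fewer memory-mapped
        // index files) per partition for the same amount of data.
        props.put("topic." + "segment.bytes", "536870912"); // 512 MB
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("topic.segment.bytes"));
    }
}
```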


Guozhang


On Tue, Jan 22, 2019 at 4:38 AM Niklas Lönn <niklas.l...@gmail.com> wrote:

> Hi Kafka Devs & Users,
>
> We recently had an issue where we processed a lot of old data and we
> crashed our brokers due to too many memory mapped files.
> It seems to me that the nature of Kafka / Kafka Streams is a bit
> suboptimal in terms of resource management. (Keeping all files open all the
> time, maybe there should be something managing this more on-demand?)
>
> In the issue I described, the repartition topic was produced very fast,
> but not consumed, causing a lot of segments and files to be open at the
> same time.
>
> I have worked around the issue by making sure I have more threads than
> partitions to force tasks to subscribe to internal topics only, but it
> seems a bit hacky, and maybe there should be some guidance in the
> documentation if this is considered part of the design..
>
> After quite some testing and code reversing it seems that the nature of
> this imbalance lies within how the broker multiplexes the consumed
> topic-partitions.
>
> I have attached a slide that I will present to my team to explain the
> issue in a bit more detail, it might be good to check it out to understand
> the context.
>
> Any thoughts about my findings and concerns?
>
> Kind regards
> Niklas
>


-- 
-- Guozhang
