Hi,

We have a topology that consumes data generated by multiple applications
via Kafka. The data for each application is aggregated in a single bolt
task using fields grouping. The applications push data at different rates,
so some executors of the bolt are busier (or overloaded) than others and
the capacity distribution is non-uniform.
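To make the skew concrete, here is a minimal, self-contained sketch of how fields grouping pins each application's data to one task (the application names, rates, and hash function are made up for illustration; Storm's actual partitioner differs in detail):

```java
import java.util.HashMap;
import java.util.Map;

// Rough model of fields grouping: each tuple is routed to a bolt task by
// hashing the grouping field, so all of one application's data lands on
// the same task regardless of how fast that application is producing.
public class FieldsGroupingSkew {

    static int taskFor(String appId, int numTasks) {
        // floorMod keeps the task index non-negative for any hashCode
        return Math.floorMod(appId.hashCode(), numTasks);
    }

    // Sum each task's load given hypothetical per-application tuple rates.
    static int[] computeLoad(Map<String, Integer> ratesPerApp, int numTasks) {
        int[] load = new int[numTasks];
        for (Map.Entry<String, Integer> e : ratesPerApp.entrySet()) {
            load[taskFor(e.getKey(), numTasks)] += e.getValue();
        }
        return load;
    }

    public static void main(String[] args) {
        // Made-up per-application rates (tuples/sec); "app-B" is spiking.
        Map<String, Integer> rates = new HashMap<>();
        rates.put("app-A", 100);
        rates.put("app-B", 5000);
        rates.put("app-C", 200);
        rates.put("app-D", 150);

        int[] load = computeLoad(rates, 4);
        for (int t = 0; t < load.length; t++) {
            System.out.println("task " + t + " load: " + load[t] + " tuples/sec");
        }
    }
}
```

Whichever task "app-B" hashes to carries its full 5000 tuples/sec alone, which is the per-executor hot spot described above.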

The problem we're facing now is that when there's a spike in the data
produced by one (or more) of the applications, capacity goes up for the
corresponding executor, we see frequent GC pauses, and eventually that
JVM crashes, causing worker restarts.

Ideally, we would like to slow down only the application(s) causing the
spike. We cannot use the built-in backpressure here because it acts at
the spout level and slows down the entire pipeline.

What are your thoughts on this? How can we fix this?

Thanks
