Hi,

We have a Storm topology that consumes data produced by multiple applications via Kafka. The data for each application is aggregated in a single bolt task using fieldsGrouping. The applications push data at different rates, so some executors of the bolt are busier/overloaded than others, and the capacity distribution across executors is non-uniform.
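The skew is a direct consequence of how fieldsGrouping routes tuples: each key is hashed to a fixed task, so every tuple for a given application always lands on the same executor. A minimal sketch of that routing, assuming hash-mod-task-count partitioning (the task count and application names here are made up for illustration; Storm's actual hashing may differ in detail):

```java
import java.util.HashMap;
import java.util.Map;

public class FieldsGroupingSkew {
    // Conceptually, fieldsGrouping assigns task = hash(key) mod numTasks,
    // so all tuples for one key (application) go to one bolt task.
    static int taskFor(String appId, int numTasks) {
        return Math.floorMod(appId.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        // Hypothetical per-application tuple rates: "app-hot" is spiking.
        Map<String, Integer> ratePerApp = new HashMap<>();
        ratePerApp.put("app-hot", 10000);
        ratePerApp.put("app-a", 100);
        ratePerApp.put("app-b", 120);

        // Sum the load each task would receive under key-hash routing.
        int[] load = new int[numTasks];
        for (Map.Entry<String, Integer> e : ratePerApp.entrySet()) {
            load[taskFor(e.getKey(), numTasks)] += e.getValue();
        }
        for (int t = 0; t < numTasks; t++) {
            System.out.println("task " + t + " load=" + load[t]);
        }
    }
}
```

The task that `app-hot` hashes to carries almost all of the load, while the others stay nearly idle, which matches the uneven capacity we observe.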
The problem we're facing now is that when there's a spike in data produced by one (or more) applications, capacity goes up for that executor, we see frequent GC pauses, and eventually the corresponding JVM crashes, causing worker restarts. Ideally, we want to slow down only the application(s) causing the spike. We cannot use the built-in backpressure here because it acts at the spout level and slows down the entire pipeline. What are your thoughts on this? How can we fix this? Thanks