Hi all,

I am running a performance benchmark on our topology, which evaluates a
set of rules. The topology is as follows:

Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)

Bolt#1 loads the rules from the DB and dispatches them to bolt#2 for
evaluation. One bolt#2 task evaluates one rule, so the number of emits
from bolt#1 depends on how many rules we have.
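
To make the emit pattern of this first design concrete, here is a minimal
sketch in plain Java. It does not use the Storm API, and the names
(FanOutSketch, Message, dispatch) are made up for illustration; they are
not our real bolt code. It only shows that one input tuple turns into one
outgoing message per rule, all emitted in the same call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FanOutSketch {
    // Hypothetical stand-in for one message sent from bolt#1 to bolt#2.
    static final class Message {
        final String tupleId;
        final String rule;

        Message(String tupleId, String rule) {
            this.tupleId = tupleId;
            this.rule = rule;
        }
    }

    // Bolt#1 in the first design: one outgoing message per rule,
    // all produced while handling a single input tuple.
    static List<Message> dispatch(String tupleId, List<String> rules) {
        List<Message> out = new ArrayList<>();
        for (String rule : rules) {
            out.add(new Message(tupleId, rule));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList("Rule#1", "Rule#2", "Rule#3");
        List<Message> emitted = dispatch("tuple-1", rules);
        System.out.println(emitted.size() + " messages emitted for one input tuple");
    }
}
```

The sketch ignores Storm's queues and transfer buffers entirely, so it
says nothing about memory; it is only meant to pin down what "how many
emits" means above.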

With 1 rule, there is no problem with 2G of memory.

But when we increase to 2 rules, memory is consumed very quickly and the
workers eventually go down, even when we set memory to 3G. The dump shows
that there are too many TaskMessage instances in a hashmap.

We then tried many fixes and ended up changing the topology to:

Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluate Rule1)
      -> Bolt#2(Evaluate Rule2) -> ... -> Bolt#2(Evaluate RuleN)

With this topology, when we have N rules, bolt#1 emits only one message
(Rule#1, Rule#2, ... Rule#N) to bolt#2. Bolt#2 then evaluates Rule#1 and
emits the message (Rule#2, ... Rule#N) to bolt#2 again. So the depth of
the bolt#2 chain depends on the number of rules.
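
As a rough illustration of the chained design, the hop-by-hop behaviour
can be sketched in plain Java. Again this does not use the Storm API, and
the names (ChainSketch, evaluateAndForward, run) are hypothetical. Each
hop evaluates the first pending rule and forwards the rest, so N rules
produce N messages in total, one per hop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChainSketch {
    // Bolt#2 in the chained design: evaluate the first pending rule and
    // return the remaining rules, which would be emitted to bolt#2 again.
    static List<String> evaluateAndForward(List<String> pending, List<String> evaluated) {
        evaluated.add(pending.get(0)); // placeholder for the real rule evaluation
        return new ArrayList<>(pending.subList(1, pending.size()));
    }

    // Walks the whole chain for one input tuple and returns how many
    // messages were sent (the one from bolt#1 plus each re-emit).
    static int run(List<String> rules, List<String> evaluated) {
        int messages = 0;
        List<String> pending = new ArrayList<>(rules);
        while (!pending.isEmpty()) {
            messages++; // one message carries the current pending list
            pending = evaluateAndForward(pending, evaluated);
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> evaluated = new ArrayList<>();
        int messages = run(Arrays.asList("Rule#1", "Rule#2", "Rule#3"), evaluated);
        System.out.println(messages + " messages, evaluated in order " + evaluated);
    }
}
```

This is only a model of the emit pattern under the assumptions above. It
does not include Storm's internal queues, so it cannot show where the
TaskMessage instances are held.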

With this change, the memory issue disappears, even with 200 rules.

So the question is: why?
As far as we can tell, the total number of TaskMessage instances is the
same in both topologies.

Thanks a lot
BR/Wind
