Thanks for the reply, Nathan.

But in our case the rule evaluation depends on fields grouping, and each
rule might have a different grouping condition.

/wind
On Sep 12, 2015 at 11:01 PM, "Nathan Leung" <ncle...@gmail.com> wrote:

> I would try creating many Bolt#2 instances in parallel and having each Bolt#2
> process the full list of rules.
> On Sep 12, 2015 10:37 AM, "Hong Wind" <hzyw...@gmail.com> wrote:
>
>> Hi all again,
>>
>> Since there has been no reply, I guess the question might not have been
>> described clearly enough, so let me try to elaborate...
>>
>> The performance requirement for our system is 100K TPS, with latency under
>> 500 ms.
>> The system needs to evaluate a set of rules for each incoming message, but we
>> don't know how many rules there will be, since users can add them dynamically
>> while the system is running.
>>
>> So at the beginning we designed the topology below, which dispatches messages
>> to rule evaluation bolts based on the number of rules.
>>
>> *Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)*
>>
>> For example, if there are 10 rules in the DB, then Bolt#1 emits 10 messages to
>> Bolt#2 tasks for evaluation. We thought that would take advantage of concurrent
>> processing when we have enough task instances for Bolt#2.
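>>
>> Roughly, the dispatcher in this first design looks like the sketch below. The
>> class and field names are simplified for illustration (the hard-coded rule list
>> stands in for our real DB lookup), and packages assume a pre-1.0 backtype.storm
>> release:
>>
>> import java.util.Arrays;
>> import java.util.List;
>> import java.util.Map;
>> import backtype.storm.task.OutputCollector;
>> import backtype.storm.task.TopologyContext;
>> import backtype.storm.topology.OutputFieldsDeclarer;
>> import backtype.storm.topology.base.BaseRichBolt;
>> import backtype.storm.tuple.Fields;
>> import backtype.storm.tuple.Tuple;
>> import backtype.storm.tuple.Values;
>>
>> public class DispatchBolt extends BaseRichBolt {
>>     private OutputCollector collector;
>>     private List<String> ruleIds;
>>
>>     @Override
>>     public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
>>         this.collector = collector;
>>         // Illustrative stand-in for loading the rule IDs from the DB.
>>         this.ruleIds = Arrays.asList("rule-1", "rule-2");
>>     }
>>
>>     @Override
>>     public void execute(Tuple input) {
>>         String message = input.getStringByField("message");
>>         // One emit per rule: the tuple fan-out grows linearly with the rule count.
>>         for (String ruleId : ruleIds) {
>>             collector.emit(input, new Values(ruleId, message));
>>         }
>>         collector.ack(input);
>>     }
>>
>>     @Override
>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>         declarer.declare(new Fields("ruleId", "message"));
>>     }
>> }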
>>
>> We tried the 1-rule case and it easily reached 125K TPS with 80 ms latency
>> on 4 nodes.
>> But it failed to handle 2 rules: memory was consumed very fast, a lot of
>> full GC happened, and finally the workers went down.
>> We tried setting the memory to 4G. That was better, but it still could not
>> handle 3 rules. We also took a memory dump, which shows the memory being
>> eaten up by TaskMessage instances.
>>
>> We guess it is a topology design problem, because the dispatcher bolt emits 2
>> times as many messages for 2 rules, 3 times as many for 3 rules, and so on.
>> It will definitely not work in production, where we have no idea how many
>> rules will be created.
>>
>> We tried several changes and arrived at the updated version below:
>>
>> *Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation) --> Bolt#2 --> Bolt#2
>> --> Bolt#2 --> Bolt#2 --> ...*
>>
>> The difference is that in the new topology Bolt#1 (the dispatcher bolt) emits
>> only one message, containing the list of rule IDs, to Bolt#2 (the evaluation
>> bolt). As soon as a Bolt#2 task receives the message, it picks the first rule
>> ID from the list and immediately emits the rest in a message to another Bolt#2
>> task (possibly itself, since it is based on fields grouping) before it starts
>> to evaluate the first rule. Then that Bolt#2 task does the same, until the
>> rule list is empty.
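>>
>> A simplified sketch of that evaluation bolt (field and method names are
>> illustrative; evaluate() stands in for the real rule logic, and "groupKey" is a
>> placeholder for whatever field the grouping is actually based on):
>>
>> import java.util.ArrayList;
>> import java.util.List;
>> import java.util.Map;
>> import backtype.storm.task.OutputCollector;
>> import backtype.storm.task.TopologyContext;
>> import backtype.storm.topology.OutputFieldsDeclarer;
>> import backtype.storm.topology.base.BaseRichBolt;
>> import backtype.storm.tuple.Fields;
>> import backtype.storm.tuple.Tuple;
>> import backtype.storm.tuple.Values;
>>
>> public class EvaluationBolt extends BaseRichBolt {
>>     private OutputCollector collector;
>>
>>     @Override
>>     public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
>>         this.collector = collector;
>>     }
>>
>>     @Override
>>     public void execute(Tuple input) {
>>         List<String> ruleIds =
>>                 new ArrayList<String>((List<String>) input.getValueByField("ruleIds"));
>>         String message = input.getStringByField("message");
>>
>>         // Take the first rule and forward the remainder before evaluating,
>>         // so the rest of the chain is not blocked by this evaluation.
>>         String currentRule = ruleIds.remove(0);
>>         if (!ruleIds.isEmpty()) {
>>             // "groupKey" drives the fields grouping, so the next hop may land
>>             // on another EvaluationBolt task or on this task again.
>>             collector.emit(input, new Values(ruleIds.get(0), ruleIds, message));
>>         }
>>
>>         evaluate(currentRule, message); // placeholder for the real evaluation
>>         collector.ack(input);
>>     }
>>
>>     private void evaluate(String ruleId, String message) {
>>         // rule-specific logic goes here
>>     }
>>
>>     @Override
>>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>>         declarer.declare(new Fields("groupKey", "ruleIds", "message"));
>>     }
>> }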
>>
>> This topology solves the memory issue very well: it can support 200 rules
>> with 3G of memory. The throughput and latency are not good yet, but we
>> believe that can be resolved by adding more nodes to the cluster.
>>
>> It seems to solve the issue, but we are not 100% sure why. We want to figure
>> that out to make sure what we did is right.
>>
>> We guess the first topology causes a message explosion at the dispatcher bolt
>> when there is more than one rule in the system; the messages then get stuck
>> in the queues (or somewhere else), cannot be consumed immediately, and finally
>> eat up the memory.
>>
>> But in the 2nd topology, the load of dispatching rules is distributed across
>> the rule evaluation bolt instances (in different workers and nodes), so it
>> does not cause the explosion. We trade latency for memory.
>>
>> Is this thinking correct? Is it the right way to solve the problem?
>>
>>
>> Any ideas are welcome and appreciated.
>>
>> Thanks
>> BR/Wind
>>
>>
>>
>>
>> On Wed, Sep 9, 2015 at 6:24 PM, Hong Wind <hzyw...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I am doing a performance benchmark for our topology, which evaluates a set
>>> of rules. The topology is as below:
>>>
>>> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)
>>>
>>> Bolt#1 loads the rules from the DB and dispatches them to Bolt#2 for
>>> evaluation. One Bolt#2 task evaluates one rule, so the number of emits from
>>> Bolt#1 depends on how many rules we have.
>>>
>>> With 1 rule there is no problem with 2G of memory.
>>>
>>> But when we increase to 2 rules, memory is consumed very fast and finally
>>> the workers go down, even when we set the memory to 3G. The dump shows that
>>> we have too many TaskMessage instances in a hashmap.
>>>
>>> Then we tried many fixes and ended up changing the topology to:
>>>
>>> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluate Rule1) -> Bolt#2(Evaluate
>>> Rule2) -> ... -> Bolt#2(Evaluate RuleN)
>>>
>>> With this topology, when we have N rules, Bolt#1 emits only one message
>>> (Rule#1, Rule#2, ... Rule#N) to Bolt#2. Then Bolt#2 evaluates Rule#1 and
>>> emits a message (Rule#2, ... Rule#N) to Bolt#2 again. So how deep the
>>> Bolt#2 chain goes depends on the number of rules.
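>>>
>>> For reference, the wiring is roughly like the sketch below. The names are
>>> illustrative (MessageSpout and ChainDispatchBolt are placeholders for our real
>>> spout and for a dispatcher that emits a single (groupKey, ruleIds, message)
>>> tuple; EvaluationBolt is the chained evaluation bolt). The key point is that
>>> Bolt#2 also subscribes to its own stream with a fields grouping:
>>>
>>> import backtype.storm.Config;
>>> import backtype.storm.LocalCluster;
>>> import backtype.storm.topology.TopologyBuilder;
>>> import backtype.storm.tuple.Fields;
>>>
>>> public class RuleTopology {
>>>     public static void main(String[] args) {
>>>         TopologyBuilder builder = new TopologyBuilder();
>>>         builder.setSpout("spout", new MessageSpout(), 4);
>>>         // ChainDispatchBolt emits one (groupKey, ruleIds, message) tuple
>>>         // per incoming message, instead of one tuple per rule.
>>>         builder.setBolt("dispatch", new ChainDispatchBolt(), 4)
>>>                .shuffleGrouping("spout");
>>>         builder.setBolt("evaluate", new EvaluationBolt(), 32)
>>>                .fieldsGrouping("dispatch", new Fields("groupKey"))
>>>                // Bolt#2 consumes its own output, forming the evaluation chain.
>>>                .fieldsGrouping("evaluate", new Fields("groupKey"));
>>>         new LocalCluster().submitTopology("rules", new Config(),
>>>                 builder.createTopology());
>>>     }
>>> }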
>>>
>>> Then the memory issue disappears even when we have 200 rules.
>>>
>>> So the question is: why? The total number of TaskMessage instances should
>>> be the same either way.
>>>
>>> Thanks a lot
>>> BR/Wind
>>>
>>>
>>>
>>>
>>>
>>>
>>
