First, I must address several questions regarding your motivation for this RIP:

1. DLedger does support snapshots. It was originally used only for the 
Commitlog and lacked this feature, but it has since been added, and RIP-59 
has started integrating it. In addition, by writing an empty log entry 
through Raft, DLedger can provide linearizable (linear consistency) 
semantics, and this has been tested with Jepsen. You can find the details in 
the latest code and release notes: 
https://github.com/openmessaging/dledger/releases.

2. Regarding the concern that DLedger Controller lacks linear consistency and 
therefore suffers from ABA-like issues: this problem will not occur. DLedger 
Controller executes requests serially. All requests enter the EventQueue and 
are processed one at a time by a single thread, so no request starts 
executing before the previous one has completed. Furthermore, because DLedger 
Controller currently manages only broker-level metadata (not queue-level), 
its performance is sufficient.




3. The issue you mentioned in https://github.com/apache/rocketmq/pull/4442 
mainly involves writing an empty log to ensure that all previous logs have been 
applied before the controller becomes the leader. This is not directly related 
to replacing the DLedger implementation.
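
To make point 2 concrete, here is a minimal sketch of the single-threaded 
event queue pattern I am describing. It is not the actual DLedger Controller 
code: the class name SingleThreadEventQueue and its methods are hypothetical 
and only illustrate why requests cannot interleave when one thread drains one 
queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical illustration only; not the real DLedger Controller classes.
final class SingleThreadEventQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    SingleThreadEventQueue() {
        worker = new Thread(() -> {
            // One thread drains the queue, so handlers never run concurrently.
            while (running || !queue.isEmpty()) {
                try {
                    Runnable event = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (event != null) {
                        event.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "event-queue-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Any thread may enqueue a request; the result is delivered through the future.
    <T> CompletableFuture<T> submit(Supplier<T> handler) {
        CompletableFuture<T> future = new CompletableFuture<>();
        queue.add(() -> {
            try {
                future.complete(handler.get());
            } catch (Throwable t) {
                future.completeExceptionally(t);
            }
        });
        return future;
    }

    // Stops accepting the "running" state and waits for queued events to drain.
    void shutdown() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because every handler runs on the same worker thread in FIFO order, a handler 
that reads and then updates in-memory metadata sees the effects of all 
earlier requests and none of the later ones. Note that this sketch does not 
reject events submitted after shutdown() is called.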


From these perspectives, I believe your motivation is not well-founded.

On the other hand, perhaps you think JRaft is more widely adopted and has more 
practical experience behind it, making it appear more trustworthy (I can 
understand that, although in my opinion DLedger is stable and maintainable 
enough). However, replacing the implementation is a significant task. We have 
already conducted extensive testing to validate the stability of DLedger 
Controller. If the controller is to be implemented using JRaft, I believe it 
requires the same level of verification. In your test report, the following 
points can be improved:

1. The report does not specify which components were affected by each failure 
injection, making it impossible to determine the expected availability of these 
components. Besides data loss, we should also focus on the recovery of 
availability during failures.

2. The scenarios and samples provided are too few to sufficiently prove 
stability.

Due to the importance of this module, I believe it needs more testing and 
review. If possible, I am willing to be the shepherd of this RIP and work 
together to further improve the high availability of RocketMQ.


At 2023-09-14 11:50:37, "fuyou" <fuyou...@gmail.com> wrote:
>Thank you for your suggestion to propose a RIP.
>
>This is a significant change that requires more careful evaluation. The two
>issues described in RIP-67 do carry some risk, but not enough to affect
>overall stability. We need to look more closely at whether RIP-67 is
>necessary.
>
>王海涛-浙江大学 <wanghaitao0...@qq.com.invalid> wrote on Thu, Sep 14, 2023 at 10:25:
>
>> Hi RocketMQ Community:
>>
>>
>> We reimplemented the Controller using JRaft, fixed the issue of linear
>> inconsistency, and relied on JRaft's snapshot function to achieve log
>> truncation, avoiding infinite growth of Raft logs. In the scenario of
>> millions of topics, frequent persistence generates the large memory object
>> jsonString of the topicConfigTable. When memory is tight, the large
>> jsonString object is allocated directly in the old generation, resulting
>> in frequent Full GC.
>>
>>
>> We have already done part of the work. Our proposals are provided at the
>> links below:
>>
>>
>>
>> https://docs.google.com/document/d/1mpzTv1vnWxQwPGsHj6Ng2fK9aL9f6MZFw7ZgvW5284o/edit?usp=sharing
>>
>>
>>
>> You are welcome to reply to this email or comment on the proposal if you
>> have any questions or suggestions.
>>
>>
>>
>>
>> Thanks,
>> HaitaoWang
>
>
>
>-- 
>   =============================================
>
>  fuyou001
>Best Regards
