First, I must address several questions regarding your motivation for this RIP:
1. DLedger does have snapshot capability. It was originally used only for the CommitLog and lacked this feature, but it has since been added, and RIP-59 has started integrating it. In addition, by writing an empty log entry through Raft, DLedger can provide linear consistency (linearizable) semantics, and this has been tested with Jepsen. You can find the details in the latest code and release notes: https://github.com/openmessaging/dledger/releases

2. Regarding the concern that DLedger Controller lacks linear consistency and therefore suffers from ABA-like issues: this problem does not occur. DLedger Controller executes requests serially. All requests enter the EventQueue and are processed one by one by a single thread, so no request starts executing before the previous one has completed. Furthermore, because DLedger Controller currently manages only broker-level metadata (not queue-level), its performance is sufficient.

3. The issue you mentioned in https://github.com/apache/rocketmq/pull/4442 mainly involves writing an empty log to ensure that all previous logs have been applied before the controller becomes the leader. This is not directly related to replacing the DLedger implementation.

From these perspectives, I believe your motivation is not well-founded.

On the other hand, perhaps you think JRaft is more widely adopted and has more practical experience behind it, which makes it appear more trustworthy. I can understand that, although in my opinion DLedger is stable and maintainable enough. However, replacing the implementation is a significant task. We have already conducted extensive testing to validate the stability of DLedger Controller. If we implement the controller using JRaft, I believe it will require the same level of verification.

In your test report, the following can be improved:

1. The report does not specify which components were affected by each failure injection, so it is impossible to determine the expected availability of those components. Besides data loss, we should also look at how availability recovers during failures.

2. The scenarios and samples provided are too few to sufficiently prove stability.

Given the importance of this module, I believe it needs more testing and review. If possible, I am willing to be the shepherd of this RIP and work together to further improve the high availability of RocketMQ.

At 2023-09-14 11:50:37, "fuyou" <fuyou...@gmail.com> wrote:
>Thank you for your suggestion to propose a RIP.
>
>This denotes a significant transformation that mandates a more meticulous
>appraisal. The pair of concerns expounded within RIP-67 do harbor a measure
>of jeopardy, albeit insufficient to impede the overarching steadiness. It
>is imperative for us to plunge further into the indispensability of RIP-67.
>
>On Thu, Sep 14, 2023 at 10:25, 王海涛-浙江大学 <wanghaitao0...@qq.com.invalid> wrote:
>
>> Hi RocketMQ Community:
>>
>> We re implemented the Controller using JRaft, fixed the issue of linear
>> inconsistency, and relied on JRaft's snapshot function to achieve log
>> truncation, avoiding infinite growth of Raft logs. In the
>> scenario of millions of topics, frequent persistence generates the large
>> memory object jsonString of the topicConfigTable. When the memory is tight,
>> the large memory object jsonString will be directly allocated to the old
>> generation, resulting in frequent Full GC.
>>
>> We have already done part of the work. Our proposals are provided at the
>> links below:
>>
>> https://docs.google.com/document/d/1mpzTv1vnWxQwPGsHj6Ng2fK9aL9f6MZFw7ZgvW5284o/edit?usp=sharing
>>
>> Please welcome to reply to this email or comment on the proposal if you
>> have any questions or suggestions.
>>
>> Thanks,
>> HaitaoWang
>
>--
> =============================================
>
> fuyou001
>Best Regards