devkanro opened a new issue #4282: Collector base on async stack
URL: https://github.com/apache/skywalking/issues/4282

## Theme
This issue discusses async segment collectors.

## Problem
I found that the OAP server has poor throughput when collecting traces.

I have tried to use SkyWalking in my production environment, but there are too many segments to collect. Even after scaling the OAP server to 8c16g x 8 instances, there are still many collection errors (gRPC client canceled). CPU usage looks fine on both the OAP server and the ES server.

Maybe separating the collectors from the OAP server is a good idea? Instead of calling the OAP server directly to report spans, agents would write to a message queue or a log collector, and collectors would consume those messages or logs. One collector might need only 1c2g or 2c4g, so we could run many collectors at low cost.

## Opinion
I can enumerate many advantages of this approach.

01. Improve the performance of agents
Agents do not need the result from collectors; they just keep writing.

02. Elastic scaling
The OAP server is heavy and expensive; it needs at least 4c8g. But one collector needs only 1c2g or less, because it handles just one segment at a time.

03. Delay for stability and safety
When collectors can't keep up with the agents, the message queue keeps the data, so there is no data loss, only some delay.

04. Throughput based on the number of collectors
If you have more collectors, you will have more message throughput.
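
The flow proposed above (agents write segments to a queue and move on, while several small collectors drain it) can be illustrated with a minimal in-process sketch. This is not SkyWalking code: the in-memory `queue.Queue` only stands in for an external message queue, and `agent_report`, `collector_loop`, `run_demo`, and the segment fields are hypothetical names chosen for illustration.

```python
import queue
import threading

# Stand-in for an external message queue (assumption: a real deployment
# would use a broker, not an in-process queue).
segment_queue = queue.Queue()


def agent_report(segment):
    """Agent side: enqueue the segment and return without waiting for a result."""
    segment_queue.put(segment)


def collector_loop(name, store, lock):
    """Collector side: handle one segment at a time until a stop marker arrives."""
    while True:
        segment = segment_queue.get()
        if segment is None:  # stop marker, one per collector
            break
        with lock:
            # A real collector would analyze and persist the segment here.
            store.append((name, segment["trace_id"]))


def run_demo(num_collectors=3, num_segments=100):
    """Run several collectors against one queue and return what they stored."""
    store = []
    lock = threading.Lock()
    collectors = [
        threading.Thread(target=collector_loop, args=(f"collector-{i}", store, lock))
        for i in range(num_collectors)
    ]
    for t in collectors:
        t.start()
    # Agents only write; they never block on a collector's response.
    for i in range(num_segments):
        agent_report({"trace_id": i, "spans": []})
    # Enqueue one stop marker per collector after all segments.
    for _ in collectors:
        segment_queue.put(None)
    for t in collectors:
        t.join()
    return store
```

Calling `run_demo(num_collectors=3, num_segments=100)` returns one stored entry per segment regardless of how many collectors run. If the collectors are slower than the agents, segments simply wait in the queue, which is the "delay instead of data loss" behavior described in point 03.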