Hello,

I will be working on SMP support in the coming months. I caught up on this and the previous SMP-related squid-dev thread, and it looks like the approach I currently favor has been discussed (again) and did not provoke any violent objections, although some of the ideas raised were much more ambitious. I am not sure how we can reach consensus on this topic, but I will throw in some specifics in the hope of better identifying the competing approaches.

My short-term focus would be on the following three areas:

A) Identifying a few "large", "rarely-interacting" threads that would work reasonably well on an 8-core 2-CPU machine with 8 http_ports. This should take the lessons learned from existing SMP designs into account, with Squid specifics in mind. Henrik, Amos, and Adrian have started discussing this already.

B) Making commonly used primitives thread-safe (mostly not by locking their shared state, but by not using static/shared data that needs locking in the first place). There are many posts on this subject, starting with Robert's advice to desynchronize.

C) Posting performance benchmarking results for single- and multi-instance Squids on multi-core systems as a baseline.

My mid-term focus will probably be on sharing http_port, memory cache, disk cache, and possibly logging/stats among a "few large threads".

My overall goal is to at least approach the performance of a multi-instance caching Squid on 8-core hardware.

I am not excited by the "one thread per message", "one thread per AsyncJob", or similar "many tiny threads" designs because, IMO, they would require too much rewriting to be implemented properly. This may need to be re-evaluated as the world moves towards 1000-core systems, but many of the improvements necessary for the "few large threads" design will not be wasted anyway.

I hope that by focusing on a "few large threads" design and fixing primitives, we can gain "enough" SMP benefits in a few months of active development. If you think there is a better way to get SMP benefits in the foreseeable future, please post.

Thank you,

Alex.