Sun 2009-11-22 at 00:12 +1300, Amos Jeffries wrote:
> I think we can open the doors earlier than after that. I'm happy with an
> approach that would see the smaller units of Squid growing in
> parallelism to encompass two full cores.
And I have a more cautious opinion. Introducing threads into the current Squid core processing is very non-trivial, due to the relatively high amount of shared data with no access protection. We already have sufficient nightmares from data access synchronization issues in the current non-threaded design, and synchronizing access among threaded operations is orders of magnitude more complex. The day the code base is cleaned up to the level where one can actually assess what data is accessed where, threads may become a viable discussion; but as things stand today it's almost impossible to judge what data will be directly or indirectly accessed by any larger operation.

Using threads for micro operations will not help us. The overhead involved in scheduling an operation to a thread is large compared to most of the operations we perform, and if you add to this the synchronization needed to shield the data accessed by that operation, the overhead will in nearly all cases far outweigh the actual processing time of the micro operation, resulting in a net loss of performance. There are some isolated cases I can think of, such as SSL handshake negotiation, where the actual processing may be significant, but at the general level I don't see many operations which would be candidates for micro threading.

Using threads for isolated things like disk I/O is another matter. The code running in those threads is very, very isolated and limited in what it's allowed to do (it may only access the data given to it, and may NOT allocate new data or look up any other global data), but it is still heavily penalized by synchronization overhead. Further, the only reason we have the threaded I/O model at all is that POSIX AIO does not provide a rich enough interface: it is missing open/close operations, both of which may block for significant amounts of time. So we had to implement our own alternative that includes open/close operations.
If you look closely at the threaded I/O code you will see that it goes to quite great lengths to isolate the threads from the main code, with obvious performance drawbacks. The initial code went even further in isolation, but core changes have over time provided a somewhat more suitable environment for some of those operations.

For the same reasons I don't see OpenMP as fitting the problem scope we have. The strength of OpenMP is parallelizing CPU-intensive regions of code that are well defined in what data they access, not dealing with a large number of concurrent operations with access to unknown amounts of shared data.

Trying to thread the Squid core engine is in many ways similar to the problems kernel developers have had to fight in making OS kernels multithreaded, except that we don't even have threads of execution to start from (the OS developers at least had processes). If we tried to do the same with the Squid code we would need an approach like the following:

1. Create a big Squid main lock, always held except in audited regions known to use more fine-grained locking.

2. Set up N threads of execution, all initially fighting for that big main lock in each operation.

3. Gradually work over the code, identifying areas where the big lock does not need to be held and transitioning them to more fine-grained locking, starting at the main loops and working down from there.

This is not a path I favor for the Squid code. It's a transition larger than the Squid-3 transition, with even bigger negative impacts on performance until most of the work has been completed.

Another alternative is to start on Squid-4: rewriting the code base completely from scratch around a parallel design, and then plugging in any pieces that can be rescued from earlier Squid generations, if any. But for obvious staffing reasons this is an approach I do not recommend in this project.
Such a rewrite is effectively starting another project, with very little shared with the Squid we have today.

For these reasons I am more in favor of multi-process approaches. The amount of work needed to make Squid multi-process capable is fairly limited and mainly revolves around the cache index and a couple of other areas that need to be shared for proper operation. We can fully parallelize Squid at the process level today if we disable the persistent shared cache + digest auth, and this is done by many users already. Squid-2 can even do it on the same http_port, letting the OS schedule connections across the available Squid processes.

Regards
Henrik
