On 16/03/2017 3:43 a.m., Adam Majer wrote:
> On 03/15/2017 03:17 PM, Amos Jeffries wrote:
>> Theoretically range-for loops should allow multi-threaded CPU to run
>> those loops a bit faster. If that can be demonstrated using a tool like
>> polygraph you have a good argument for a patch containing that change to
>> go in as a pure performance change.
>
> No. That would break many many things. There are special directives that
> allow this to happen with things like OpenMP compilers, but that's not
> what we are talking about here.
>
> And theoretically, if you blindly allow compilers to optimize loops like
> that, you are just as likely to introduce hardware stalls that will
> result in slower execution of the overall loop. The only way to look at
> these,
>
>   for (TYPE _i : _c )
>
> is syntactic sugar.
>
>
> Best regards,
> - Adam
>
> PS. And if you are talking about vectorization of these loops, that
> already happens with regular loops. See,
>
>   https://gcc.gnu.org/projects/tree-ssa/vectorization.html

I mean tricks like a compiler with CPU-specific knowledge being able to
emit assembly that pre-fetches the address pointers for all objects in
the container, and/or, if it can prove the objects are read-only, having
hyper-threads pre-load the container contents into L1/L2 cache in time
for the main thread to run the business logic without much loading delay.

AFAIK range-for does provide certain code flow guarantees (like
full-length container iteration) that are known without any analysis. So
it is not completely syntactic sugar. Yes, the compiler could do the same
with traditional loops, but only after extra analysis, which might be
turned off.

I'm not sure if this would have any visible effect at all. We might be
unlucky in that these things are not supportable, or the data sizes Squid
handles blow the benefits away. Thus the request for proof.

Amos
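
For reference, here is a minimal sketch of the loop forms discussed above:
a range-for, the iterator loop the language defines it in terms of, and a
traditional loop with an explicit prefetch hint. The Entry type, the
function names and the lookahead distance of 8 are hypothetical and are
not Squid code. __builtin_prefetch is a GCC/Clang builtin, so this assumes
one of those compilers. The sketch only shows that the three forms compute
the same result; whether any of them is measurably faster is the open
question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical payload type, for illustration only (not a Squid class).
struct Entry {
    int size;
};

// Range-for form: the loop style under discussion.
long sumRangeFor(const std::vector<const Entry *> &entries)
{
    long total = 0;
    for (const Entry *e : entries)
        total += e->size;
    return total;
}

// Roughly what the language defines the range-for above to mean:
// an ordinary iterator loop running from begin() to end().
long sumIterators(const std::vector<const Entry *> &entries)
{
    long total = 0;
    auto &&range = entries;
    auto it = range.begin();
    auto last = range.end();
    for (; it != last; ++it) {
        const Entry *e = *it;
        total += e->size;
    }
    return total;
}

// Traditional indexed loop with a manual software prefetch hint.
// Assumption: GCC or Clang (__builtin_prefetch is a compiler builtin).
// The lookahead of 8 elements is an arbitrary guess, not a tuned value.
long sumWithPrefetch(const std::vector<const Entry *> &entries)
{
    const std::size_t lookahead = 8;
    long total = 0;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (i + lookahead < entries.size())
            __builtin_prefetch(entries[i + lookahead]); // hint only
        total += entries[i]->size;
    }
    return total;
}
```

Timing the three functions on realistic container sizes (for example under
a polygraph workload, as suggested earlier in the thread) would be one way
to produce the requested proof.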