29.08.2017, 14:44, "Alicia Boya García" <ab...@igalia.com>:
> On 08/29/2017 06:20 AM, Daniel Bates wrote:
>
>> Do we know the cause(s) of the slow clean builds? I am assuming
>> that much of the speed-up from the "unified source" comes from clang
>> being able to use an in-memory #include cache to avoid disk I/O and
>> re-parsing of a seen header. Have we exhausted all efforts (or have we
>> given up?) at removing extraneous #includes? Do we think pursuing this
>> effort would be more time-consuming or have results that pale in
>> comparison to the "unified source" approach?
>
> Whilst having an in-process-memory #include cache is not a bad thing,
> it's not the greatest gain, as the operating system should already cache
> file reads just fine.
From my experience, Windows is particularly bad at this, even when given
plenty of memory for caching.

> The greatest gain comes from reducing the number of times C++ headers
> are parsed. If you are building a certain .cpp file and include a .h
> file, the compiler has to parse it, which can take quite a while because
> C++ is a really complex monster, especially when templates are used.
> Doing this more often than necessary raises build times really quickly.
>
> Header files are almost always include-guarded (either with #pragma once
> or traditional #ifndef guards), so including the same header twice
> within the same .cpp file (or any of its included files) has no cost. On
> the other hand, if you then start building a different .cpp file that
> also includes the same header, you have to parse it again because, as
> far as C++ is concerned, every inclusion could add different symbols to
> the AST the compiler is building, so the output can't be reused*. In
> turn we end up parsing most headers many more times than actually needed
> (i.e. for every .cpp file that includes Node.h the compiler has to parse
> Node.h and all its dependencies from source; that's a lot of wasted
> effort!).
>
> *Note that including the same .h twice within the same .cpp file is
> fast not because the output is cached, but because the entire .h file
> is skipped, adding no additional nodes to the AST.
>
> The goal of C++ modules is to fix this problem at its root: instead of
> literally including textual header files, .cpp files declare
> dependencies on module files that can be compiled, stored, loaded and
> referenced from .cpp files in any order, so you would parse the Node
> module source code only once for the entire project, while the compiler
> could load the AST directly from a cached module object file every
> other time.
> Note the great advantage of modules comes from the fact that they can
> be imported in different contexts and their content is still
> semantically equivalent, whereas with plain C/C++ includes every header
> file may act differently depending on the preprocessor variables
> defined by the includer and by previous inclusions. In the worst case,
> when they are not include-guarded (luckily this is not too common, but
> it still happens), even including the same file twice in the same .cpp
> could add different symbols each time!
>
> Unfortunately, C++ modules are a work in progress... There are two
> competing proposals with implementations, one from Clang and another
> from Microsoft, and the C++ technical specification is at a very early
> stage too:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4681.pdf
>
> We know for sure modules are very important for the future of C++, but
> maybe it's still a bit too early to bet a big project like WebKit on
> them.
>
> So how else can we avoid parsing the same header files so many times
> and speed up our builds? Enter unified builds.
>
> A requirement for unified builds to work correctly is that header files
> are written in such a way that they work as independent units, much
> like C++ modules, i.e. including headers should work no matter in what
> order you place them, and in each case they must define the same
> symbols. On July 31 I wrote about some issues we currently have because
> of not doing exactly this in WebKit (in particular, our #include
> "config.h" lines are ambiguous). They can be worked around so they will
> not become blockers for unified builds, but I still think we should fix
> them at some point.
>
> Once you have a set of .cpp files whose includes (1) are all guarded
> (e.g. by #pragma once) and (2) are independent units according to the
> above rule, you can take advantage of unified builds:
>
> Instead of invoking the compiler once for each .cpp file, you create a
> new artificial "unified" or "bundle" .cpp file that concatenates (or
> #include's) a number of different .cpp files. This way, headers
> included within the bundle are parsed only once, even if they are used
> by different individual .cpp files, as long as they are within the same
> bundle. This can often result in a massive build speed gain.
>
> Unfortunately, there are some pitfalls, as there is a dissonance
> between what the programmer thinks the compilation unit is (the
> individual .cpp file) and the actual compilation unit used by the
> compiler (the bundle .cpp file):
>
> * `static` variables and functions are intended to be scoped to the
> individual .cpp file, but the compiler has no way to know this, so they
> are instead scoped to the bundle. This can lead to the kind of
> non-intuitive name clashes we are trying to avoid (e.g. with
> `namespace FILENAME`).
>
> * Header files that don't work as independent units as they should may
> still work somehow, or may fail in hard-to-diagnose ways.
>
> * Editing a .cpp file that is part of a bundle will trigger
> recompilation of the entire bundle. This makes changes to small,
> independent files slower the more files are grouped per bundle.
>
> * Similarly, editing a .h file depended on by a .cpp file will trigger
> recompilation of the entire bundle, not just the individual file.
>
> It's desirable to bundle .cpp files in a way that minimizes the impact
> of the last two issues (e.g. by bundling by feature: changing a header
> used by all files implementing that feature may then trigger the
> recompilation of that single feature bundle rather than of many
> scattered bundles, each containing a few .cpp files using that
> feature).
> Even with these issues, editing files depended on by many others will
> usually become much faster than before, because although more
> individual .cpp files will be rebuilt, the number of actual compilation
> units (bundles) will be much lower, and so will be the number of times
> header files are re-parsed.
>
> Compared to modules, unified builds are really a dirty hack. Modules
> don't have any of these issues: they are potentially faster and more
> reliable. If only they existed now as a standard rather than as some
> experimental implementations with uncertain tooling, we would
> definitely use them.
>
> In the absence of modules, unified builds still provide really good
> speed-ups for our dime.
>
> -- Alicia
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev

--
Regards,
Konstantin