On Sat, 1 May 2021, Yakov wrote:
> On this sample using macros speeds the program up 400%
Be that as it may, it's not representative of most applications. For instance, CPython's performance increases by only 10-15% with the inliner turned on.
(And actually that's misleading, because inlining enables many other optimizations. The impact for tcc if it _only_ added inlining would probably be much less. Unfortunately gcc doesn't seem to be willing to inline at -O0.)
> I have recently read a paper about a Linear Scan Register Allocator[1], they claim it gives you 95% of the performance of a Graph Coloring Register Allocator in basically no time, and requires no SSA.
Yes, linear scan is quite nice. It's not really compatible with tcc's compilation model--nor are most other optimizations, including inlining--but I mentioned it because it's probably the most worthwhile optimization a compiler can perform and it's not too difficult.
In the context of a compiler like gcc or llvm, linear scan takes almost no time at all. However, it depends on a certain model of code that tcc does not currently provide. Gcc already produces such a model, even without optimizations, and linear scan takes advantage of the information which is already there; a big part of the reason why tcc is so fast is that it produces no such model. For gcc this is a sunk cost; for tcc, not.
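To make the dependency concrete, here is a minimal sketch of the usual linear scan loop. It is not tcc code, and everything named in it (Interval, NUM_REGS, SPILLED, linear_scan) is made up for illustration. Note that it takes the live intervals as input; computing those is exactly the "model" that tcc does not build today. The scan itself is one pass over the sorted intervals with a small active list.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_REGS 2      /* hypothetical, deliberately tiny register file */
#define SPILLED  (-1)   /* marker: value lives on the stack instead */

typedef struct {
    int start, end;     /* first and last instruction index where the value is live */
    int reg;            /* output: register number, or SPILLED */
} Interval;

static int by_start(const void *a, const void *b)
{
    const Interval *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Sorts iv by start point, fills in iv[i].reg, returns the number of spills. */
int linear_scan(Interval *iv, int n)
{
    Interval *active[NUM_REGS];     /* intervals holding a register, sorted by end */
    int free_regs[NUM_REGS];        /* stack of unused register numbers */
    int nactive = 0, nfree = NUM_REGS, spills = 0;

    assert(n >= 0 && (iv != NULL || n == 0));
    for (int r = 0; r < NUM_REGS; r++)
        free_regs[r] = NUM_REGS - 1 - r;    /* so register 0 is handed out first */
    qsort(iv, (size_t)n, sizeof *iv, by_start);

    for (int i = 0; i < n; i++) {
        Interval *cur = &iv[i];

        /* Expire intervals that ended before cur starts; reclaim their registers. */
        int k = 0;
        while (k < nactive && active[k]->end < cur->start)
            free_regs[nfree++] = active[k++]->reg;
        for (int j = k; j < nactive; j++)
            active[j - k] = active[j];
        nactive -= k;

        if (nactive == NUM_REGS) {
            /* No register free: spill whichever interval ends last. */
            Interval *last = active[nactive - 1];
            spills++;
            if (last->end <= cur->end) {
                cur->reg = SPILLED;
                continue;
            }
            cur->reg = last->reg;   /* steal the register from the longer interval */
            last->reg = SPILLED;
            nactive--;
        } else {
            cur->reg = free_regs[--nfree];
        }

        /* Insert cur into active, keeping it sorted by increasing end point. */
        int pos = nactive++;
        while (pos > 0 && active[pos - 1]->end > cur->end) {
            active[pos] = active[pos - 1];
            pos--;
        }
        active[pos] = cur;
    }
    return spills;
}
```

With two registers and the intervals [0,10], [1,3], [2,8], [4,6], the long-lived first value is the one that gets spilled and the other three share the two registers. A real allocator would also have to emit the loads and stores for the spilled value, which this sketch leaves out.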
-E