On 12/11/2021 09:36, Julien Grall wrote:
On 11/11/2021 17:57, Andrew Cooper wrote:
Retpolines are expensive, and all these do are select between the
sync and
nosync helpers. Pass a boolean instead, and use direct calls
everywhere.
To be honest, I much prefer to read the old code. I am totally not
against the change but I can see how I would be ready to introduce new
function pointers use in the future.
Really? The only reason there are function points to begin with was
because of a far more naive (and far pre-spectre) me fixing a reference
counting mess in 2014 by consolidating the logic. My mistake was not
spotting that the function pointers weren't actually necessary in the
first place.
So I think we need some guidelines on when to use function pointers in
Xen.
It's easy. If you are in any way unsure, they're probably the wrong
answer. (Ok - I'm being a little facetious)
There are concrete security improvements from not using function
pointers, demonstrated by fact that JOP/COP attacks are so pervasive
that all major hardware and software vendors are working on techniques
(both hardware and software) to prevent forward-edge control flow
integrity violations. (The mandate from the NSA to make this happen
certainly helped spur things on, too.)
There are also concrete performance improvements too. All competitive
processors in the market today can cope with direct branches more
efficiently than indirect branches, and a key principle behind
profile-guided-optimsiation is to rearrange your `ptr()` function
pointer call into `if ( ptr == a ) a(); else if ( ptr == b ) b(); else
ptr()` based on the frequency of destinations, because this really does
make orders of magnitude improvements in some cases.
We have some shockingly inappropriate uses of function pointers in Xen
right now (patches 4 and 5 in particular, and "x86/hvm: Remove callback
from paging->flush_tlb() hook" posted today). While this specific
example doesn't fall into shockingly inappropriate in my books, it is
firmly in the "not appropriate" category.
The more...
Pause/unpause operations on behalf of dom0 are not fastpaths, so avoid
exposing the __domain_pause_by_systemcontroller() internal.
This actually compiles smaller than before:
... the code doesn't really compile smaller on Arm:
42sh> ../scripts/bloat-o-meter xen-syms-old xen-syms
add/remove: 4/2 grow/shrink: 0/6 up/down: 272/-252 (20)
Function old new delta
_domain_pause - 136 +136
_domain_pause_by_systemcontroller - 120 +120
domain_pause_by_systemcontroller_nosync - 8 +8
domain_pause_by_systemcontroller - 8 +8
domain_resume 136 132 -4
domain_pause_nosync 12 8 -4
domain_pause 12 8 -4
domain_pause_except_self 188 180 -8
do_domctl 5480 5472 -8
domain_kill 372 356 -16
do_domain_pause 88 - -88
__domain_pause_by_systemcontroller 120 - -120
Total: Before=966919, After=966939, chg +0.00%
ARM, like x86, compiles for speed, not size. "it got a bit larger" is
generally not as interesting as "it got smaller, despite everything the
compiler would normally do in the opposite direction". Obviously, if
the build of Xen got 10x larger then we'd have a problem, but it hasn't.
Conversely, if we were compiling for size not speed, then "it got
smaller" is the uninteresting direction.
The truth is that Xen (both x86 and ARM) is a couple of megabytes in RAM
and this literally doesn't matter these days where RAM is measured in GB
and TB. We care to an extent for efficient use of the cache/etc, but
noone would bat an eyelid at several MB more for a want-to-have feature.
Here, you're saying that for an added 5 instructions, totalling 0.00%
delta in the size of the hypervisor, you've got some reasonably frequent
codepaths which will execute faster, and cannot be hijacked with a
JOP/COP attack.
That's a clear all-around improvement on ARM, just like it is on x86.
~Andrew