Hi Stefano,
On 06/09/2016 19:51, Stefano Stabellini wrote:
> On Tue, 6 Sep 2016, Julien Grall wrote:
>> I was asking myself the same question.
>> It is not trivial. On ARMv7, there is no way to invalidate by IPA, so we
>> still need to do a full flush.
>> In the case of ARMv8, it is possible to do a flush by IPA with the
>> following sequence (see D4-1739 in ARM DDI 0487A.i):
>>     tlbi ipas2e1, xt
>>     dsb
>>     tlbi vmalle1
>> So I was wondering if we could leave that for a future optimization.
> We can leave it for now, but I have a hunch that it is going to have a
> pretty significant impact.
In theory, the current approach will have an impact on platforms where
the TLBs cache stage-1 and stage-2 translations separately.
This will not be the case if the TLBs cache stage-1 and stage-2 in a
single entry.
However, on ARMv7 it will not be possible to minimize the impact.
But to be honest, most of the time the p2m will be modified via
guest_physmap_add_entry and guest_physmap_remove_page. Both are often
called with order = 0 or order = 6 (for domains using 64KB page granularity).
Setting aside domains using 64KB page granularity, the number of TLB
flushes will be either 1 or 2, depending on whether you need to shatter
a superpage. This is the same as today.
In the case of a 64KB domain, the number of TLB flushes will be higher if
the domain is replacing existing mappings (1 TLB flush per 4KB entry).
But this is already a programming issue, given that in this case the
underlying memory (if RAM) will not be freed until the domain is
destroyed. In general a domain should remove a mapping before creating a
new one at the same address.
I have done some testing and noticed that a domU's p2m will not often be
modified. In the case of dom0, this will mostly happen when a domain is
created (you have to map the new domain's memory). Even then, the number
of flushes should be the same, given that dom0 will use ballooned pages
(i.e. the stage-2 mapping does not exist).
I have some ideas on how to optimize the code a bit more, but they are
heavy-handed and will be hard to maintain. I would prefer to defer them
until users come with a use case where the performance hit is too high.
>>>> +        if ( !(mask & ((1UL << FIRST_ORDER) - 1)) )
>>>> +            order = FIRST_ORDER;
>>>> +        else if ( !(mask & ((1UL << SECOND_ORDER) - 1)) )
>>>> +            order = SECOND_ORDER;
>>>> +        else
>>>> +            order = THIRD_ORDER;
>>>> +
>>>> +        rc = __p2m_set_entry(p2m, sgfn, order, smfn, t, a);
>>>> +        if ( rc )
>>>> +            break;
>>> Shouldn't we be checking that "(1 << order)" doesn't exceed "todo" before
>>> calling __p2m_set_entry? Otherwise we risk creating a mapping bigger
>>> than requested.
>> The mask is defined as: gfn_x(sgfn) | mfn_x(smfn) | todo
>> So we will never have the order exceeding "todo".
> Ah, I see. But this way we are not able to do superpage mappings for
> regions larger than 2MB but not a multiple of 2MB (3MB for example). But I
> don't think we were able to do that before either, so it is not a
> requirement for the patch.
From my understanding of is_mapping_aligned, the case you mentioned is
handled.
The function p2m_set_entry takes a number of pages because some callers
(such as the MMIO ones) pass a number of pages. However, the main
callers are guest_physmap_remove_page and guest_physmap_add_entry, which
take an order as a parameter.
So it would be a nice feature to have, but I don't think it is strictly
necessary.
Regards,
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel