>>>>> "JM" == Jean-Yves Migeon <jeanyves.mig...@free.fr> writes:

    JM> On 21.08.2011 12:26, Jean-Yves Migeon wrote:
    >> - second, the lock is not placed at the correct abstraction level
    >> IMHO, it is way too high in the caller/callee hierarchy. It
    >> should remain hidden from most consumers of the xpq_queue API,
    >> and should only be used to protect the xpq_queue array together
    >> with its counters (and everything that isn't safe for all memory
    >> operations happening in xpq).
    >> 

I agree - part of the reason I did this was to make sure I didn't mess
with the current queueing scheme before having a working
implementation/PoC.

    >> Reason behind this is that your lock protects calls to hypervisor
    >> MMU operations, which are hypercalls (hence, a "slow" operation
    >> with regard to kernel). You are serializing lots of memory
    >> operations, something that should not happen from a performance
    >> point of view (some may take a fair amount of cycles to
    >> complete, like TLB flushes). I'd expect all Xen MMU hypercalls to
    >> be reentrant.

I agree - it's not meant to be an efficient implementation - just a
correct one.

    JM> An alternative would be to have per-CPU xpq_queue[] also. This
    JM> is not completely stupid, xpq_queue is meant as a way to put
    JM> multiple MMU operations in a queue asynchronously before issuing
    JM> only one hypercall to handle them all.

This is slightly more complicated than it appears. Some of the "ops" in
a per-cpu queue may have ordering dependencies with ops in other cpus'
queues, and I think these would not be trivial to express. (An example
would be a pte update on one queue and a read of the same pte on another
queue. These cases are quite analogous (although completely unrelated)
to classic RAW and other ordering dependencies in out-of-order execution
scenarios due to pipelining, etc.)
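
To make the ordering problem concrete, here is a minimal single-threaded
sketch of the hazard. Everything in it is hypothetical: struct cpu_queue,
queue_pte_update(), queue_flush() and pte_read() are illustration names,
not the real xpq_queue code, and the "hypercall" is simulated by a plain
store. It only shows that a pte update sitting in one cpu's queue is
invisible to a reader on another cpu until that queue is flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU   2
#define QDEPTH 8

typedef uint64_t pte_t;

/* One pending "write this value to that PTE" operation. */
struct mmu_op {
    pte_t *ptep;
    pte_t  val;
};

/* Hypothetical per-CPU queue; not the real xpq_queue layout. */
struct cpu_queue {
    struct mmu_op ops[QDEPTH];
    size_t        count;
};

static struct cpu_queue queues[NCPU];

/* Stand-in for the batched hypercall: apply every queued op in order. */
static void
queue_flush(int cpu)
{
    struct cpu_queue *q = &queues[cpu];

    for (size_t i = 0; i < q->count; i++)
        *q->ops[i].ptep = q->ops[i].val;
    q->count = 0;
}

/* Queue a PTE update on one CPU, flushing first if the queue is full. */
static void
queue_pte_update(int cpu, pte_t *ptep, pte_t val)
{
    struct cpu_queue *q = &queues[cpu];

    if (q->count == QDEPTH)
        queue_flush(cpu);
    q->ops[q->count].ptep = ptep;
    q->ops[q->count].val = val;
    q->count++;
}

/* A read goes straight to memory and cannot see pending queued ops. */
static pte_t
pte_read(const pte_t *ptep)
{
    return *ptep;
}

/* Walk through the read-after-write hazard between two CPUs. */
static void
raw_hazard_demo(void)
{
    pte_t pte = 0x1000;

    queue_pte_update(0, &pte, 0x2000);  /* CPU 0 queues a write      */
    assert(pte_read(&pte) == 0x1000);   /* CPU 1 reads the old value */
    queue_flush(0);                     /* CPU 0 issues the flush    */
    assert(pte_read(&pte) == 0x2000);   /* update is visible now     */
}
```

Getting the read on the second cpu to see the new value would mean either
flushing the first cpu's queue before the read or tracking the dependency
across queues, which is the part that I think is hard to express.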

I'm thinking that it might be easier and more justifiable to nuke the
current queue scheme and implement shadow page tables, which would fit
more naturally and efficiently with CAS pte updates, etc.
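
As a rough idea of what I mean by CAS pte updates, here is a sketch using
C11 atomics. It assumes a shadow pte is a single 64-bit word that the
kernel may write directly; shadow_pte_t, shadow_pte_cas() and
shadow_pte_set_bits() are made-up names, and how the shadow copy would be
propagated to Xen is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shadow PTE: one atomically updated 64-bit word. */
typedef _Atomic(uint64_t) shadow_pte_t;

/*
 * Replace `expected` with `desired`. Returns false, leaving the entry
 * untouched, if another CPU changed it first.
 */
static bool
shadow_pte_cas(shadow_pte_t *ptep, uint64_t expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(ptep, &expected, desired);
}

/* Set flag bits with a CAS retry loop; returns the previous value. */
static uint64_t
shadow_pte_set_bits(shadow_pte_t *ptep, uint64_t bits)
{
    uint64_t old = atomic_load(ptep);

    /* A failed CAS reloads `old` with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(ptep, &old, old | bits))
        ;
    return old;
}

/* Small self-check of the two helpers above. */
static void
shadow_pte_demo(void)
{
    shadow_pte_t pte = 0x1000;

    assert(shadow_pte_cas(&pte, 0x1000, 0x2000));   /* wins the race   */
    assert(!shadow_pte_cas(&pte, 0x1000, 0x3000));  /* stale, rejected */
    assert(shadow_pte_set_bits(&pte, 0x1) == 0x2000);
    assert(atomic_load(&pte) == 0x2001);
}
```

With this style a racing updater notices the conflict on the entry itself
and retries, so no global lock around the update is needed.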

Otherwise, we'll have to re-invent scoreboarding or other dynamic
scheduling tricks - I think that's a bit overkill :-)

Cheers,
-- 
Cherry
