On Wednesday 28 September 2005 21:25, Young Koh wrote:
> hi,
>
> On 9/28/05, Jeff Dike <[EMAIL PROTECTED]> wrote:
> > On Wed, Sep 28, 2005 at 09:47:41AM -0400, Young Koh wrote:

> > No, just that page is bad.
Again, that page is not bad. There is simply no host page backing this address
yet, and the host cannot allocate one right now.
> > Another page could have been dirtied and thus 
> > allocated on the host, and it would be usable.  So, it's not a fatal
> > problem.
Ok, this makes some sense; even if IMHO it doesn't work, I now see your point
(but I still stand by what I said above).

However, even a dirtied page could be "bad" if it has been swapped out. If
we're getting a SIGBUS, it means the host didn't succeed in freeing any memory.

And, frankly, unless the UML ram file is kept on ramfs (which is RAM-only), it
can be swapped (this holds both for disk-based filesystems and for tmpfs).
So, I don't think what you suggest could work.
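
To make this concrete, here is a minimal standalone sketch (plain userspace
code, not UML itself; the "physmem" file name is made up) of how an access to
a mapped page the host cannot back turns into SIGBUS. Here the backing store
vanishes via ftruncate(); my understanding is that a full disk or tmpfs fails
the same way at fault time:

/* Standalone sketch: a write to a mapped file page that the host
 * cannot back raises SIGBUS.  Here the backing store disappears
 * because we truncate the file; on a full tmpfs/disk the effect at
 * fault time is similar. */
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

static void bus_handler(int sig)
{
        write(2, "SIGBUS: host could not back the page\n", 37);
        _exit(1);
}

int main(void)
{
        long psz = sysconf(_SC_PAGESIZE);
        int fd = open("physmem", O_RDWR | O_CREAT | O_TRUNC, 0600);
        char *mem;

        signal(SIGBUS, bus_handler);
        ftruncate(fd, 4 * psz);         /* sparse "physical memory" file */
        mem = mmap(NULL, 4 * psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        mem[0] = 1;                     /* dirties page 0: host allocates it */
        ftruncate(fd, 0);               /* backing store goes away... */
        mem[psz] = 1;                   /* ...so this access gets SIGBUS */
        return 0;
}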
> then, if just that page is bad, shouldn't the UML kernel wait until
> another page is usable (or force another page to be swapped out)
We're talking about the host here, and if the host is out of memory, you can't
help it. The best you can do is reclaim cache memory - but that must be done
when the host starts swapping, not at SIGBUS time.

> and allocate the free page? and proceed normally? i may still be confused.

Jeff's point is that once we have dirtied a page and the host hasn't yet
swapped it out, it is still in memory, and accessing it works. "Dirtied" here
means we allocated it with (say) kmalloc and then wrote to it.

> Ok, my thought/idea/suggestion is that what if UML uses a TLB-like
> table before it does the address translation? i mean, once there is a
> valid mapping, UML inserts the address mapping (or page mapping) into a
> software TLB. after that, for that userspace address, UML can search
> the software TLB table and use the mapping without calling sigsetjmp()
> and walking through page tables. it seems that sigsetjmp() has
> relatively large overhead; we could reduce some overhead by not
> calling it.

How do you measure it? I'm curious myself - I know one could use gprof, but
I've never used it.

Surely the "sig" part is heavy (it involves syscalls such as sigprocmask() for
blocking/unblocking signals). Or rather, it's the only heavy part - the rest
consists only of saving a few registers (6, IIRC) to memory, and there's no
point in optimizing that away (I assume).
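
For a rough number without gprof, one could simply time the block/unblock pair
in a loop. A throwaway sketch (SIGVTALRM picked arbitrarily; the numbers are
only indicative):

/* Rough micro-benchmark: the cost of the sigprocmask() block/unblock
 * pair that the current signal-saving path implies. */
#include <stdio.h>
#include <signal.h>
#include <sys/time.h>

int main(void)
{
        struct timeval start, end;
        sigset_t set, old;
        long i, iters = 1000000;
        double usecs;

        sigemptyset(&set);
        sigaddset(&set, SIGVTALRM);

        gettimeofday(&start, NULL);
        for (i = 0; i < iters; i++) {
                sigprocmask(SIG_BLOCK, &set, &old);
                sigprocmask(SIG_SETMASK, &old, NULL);
        }
        gettimeofday(&end, NULL);

        usecs = (end.tv_sec - start.tv_sec) * 1e6 +
                (end.tv_usec - start.tv_usec);
        printf("%.3f usec per block/unblock pair\n", usecs / iters);
        return 0;
}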

But that part will go away - there's a "softints" patch for this (i.e. moving
the blocking/unblocking to userspace: the signal handler notices that signals
are soft-blocked and queues the handling via userspace mechanisms). It's at:

http://user-mode-linux.sourceforge.net/patches.html
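
As far as I understand the mechanism, it boils down to something like the
following standalone sketch (much simplified - a real implementation must
handle the re-entrancy races this ignores, e.g. a signal arriving between
clearing the flag and checking for pending work):

/* "softints" idea, sketched: block/unblock become flag writes in
 * userspace; the real handler just records the signal when we are
 * soft-blocked, and the pending work runs at soft-unblock time. */
#include <stdio.h>
#include <signal.h>

static volatile sig_atomic_t soft_blocked;
static volatile sig_atomic_t pending;

static void handler(int sig)
{
        if (soft_blocked) {
                pending = 1;            /* defer: just queue it */
                return;
        }
        /* ... real interrupt handling would go here ... */
}

static void soft_block(void)  { soft_blocked = 1; }

static void soft_unblock(void)
{
        soft_blocked = 0;
        if (pending) {
                pending = 0;
                handler(0);             /* run the deferred handling */
        }
}

int main(void)
{
        signal(SIGVTALRM, handler);
        soft_block();
        raise(SIGVTALRM);               /* arrives while soft-blocked */
        soft_unblock();                 /* deferred handling runs here */
        printf("no sigprocmask() calls involved\n");
        return 0;
}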

> but surely the problem is that the mapping can become stale.
> for that, UML may invalidate a TLB entry if the corresponding page is
> swapped out or any change is made. what do you think?

In short, it's a really interesting idea.

The kernel's arch-independent infrastructure for this already exists, for
managing the real TLBs: you already need to invalidate TLB entries when you
swap a page out and so on. Reading Documentation/cachetlb.txt is definitely
worth the time.

And in fact, that infrastructure is currently what UML uses to update the host
mappings.

Using a software TLB to save the page table walk is interesting, especially
since it would avoid taking a spinlock on SMP (the current implementation of
maybe_map() doesn't take one, but it should), and more importantly because the
TLB would likely be hotter than the page tables, so it would probably fit in
the L2 cache (while walking the page tables is likely to cause an L2 miss -
they're too big). "Hotter" means "more likely to be accessed, and thus more
worth keeping in cache". The cache usage discussion in the Reiser4 whitepaper
(www.namesys.com) is really enlightening on this point.

I don't know if we can optimize the locking on the TLBs, though. We could
maybe use atomic ops, or have per-processor TLBs (which is how hardware does
it - though you then get IPIs on flush). On i386, atomic_read() and
atomic_set() have no additional cost over their non-atomic counterparts, so
maybe shared TLBs are ok - they'd need to be tagged with the address space,
however.

I've not yet thought much about an efficient data structure. With a plain
array, invalidation has to check each entry, and I'd like to avoid that; the
other extreme is emptying the whole TLB when flushing a single entry. A
direct-mapped table avoids both - see the sketch below.
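
For concreteness, here is a minimal standalone sketch of the direct-mapped
variant; all names and constants (soft_tlb_*, the TASK_SIZE value) are made up
for illustration, this is not existing UML code. Deriving the slot index from
the address makes single-entry invalidation O(1), and tagging entries with the
mm is what a table shared between processors would need:

/* Direct-mapped software TLB, sketched.  On a miss, the caller falls
 * back to the page table walk + sigsetjmp() path. */
#include <stdio.h>

#define PAGE_SHIFT      12
#define PAGE_MASK_      (~((1UL << PAGE_SHIFT) - 1))
#define SOFT_TLB_BITS   6
#define SOFT_TLB_SIZE   (1 << SOFT_TLB_BITS)
#define TASK_SIZE       0xc0000000UL    /* illustrative i386-like split */

struct soft_tlb_entry {
        unsigned long vaddr;            /* page-aligned virtual address */
        unsigned long host_addr;        /* cached translation */
        void *mm;                       /* tag: which address space */
        int valid;
};

static struct soft_tlb_entry soft_tlb[SOFT_TLB_SIZE];

static unsigned int soft_tlb_index(unsigned long vaddr)
{
        return (vaddr >> PAGE_SHIFT) & (SOFT_TLB_SIZE - 1);
}

/* Returns the cached translation, or 0 on a miss. */
static unsigned long soft_tlb_lookup(void *mm, unsigned long vaddr)
{
        struct soft_tlb_entry *e;

        if (vaddr < TASK_SIZE)          /* not a kernelspace address */
                return 0;
        e = &soft_tlb[soft_tlb_index(vaddr)];
        if (e->valid && e->mm == mm && e->vaddr == (vaddr & PAGE_MASK_))
                return e->host_addr + (vaddr & ~PAGE_MASK_);
        return 0;
}

static void soft_tlb_insert(void *mm, unsigned long vaddr, unsigned long host)
{
        struct soft_tlb_entry *e = &soft_tlb[soft_tlb_index(vaddr)];

        e->vaddr = vaddr & PAGE_MASK_;
        e->host_addr = host;
        e->mm = mm;
        e->valid = 1;
}

/* O(1): indexing by address means we touch one slot, not the array. */
static void soft_tlb_flush_one(unsigned long vaddr)
{
        soft_tlb[soft_tlb_index(vaddr)].valid = 0;
}

int main(void)
{
        int mm;                         /* any unique pointer works as a tag */

        soft_tlb_insert(&mm, 0xc0100000UL, 0x08000000UL);
        printf("hit:  %#lx\n", soft_tlb_lookup(&mm, 0xc0100042UL));
        soft_tlb_flush_one(0xc0100000UL);
        printf("miss: %#lx\n", soft_tlb_lookup(&mm, 0xc0100042UL));
        return 0;
}

Replacement is trivial (a new insert just overwrites the slot), which also
bounds a full flush to SOFT_TLB_SIZE stores.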

The only remaining problem is a fault on a kernelspace address. However, if
sigsetjmp() is still costly, we may implement some checking: only addresses
above TASK_SIZE are valid kernelspace addresses (or something of this sort),
as in the address check in the sketch above.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade