[trimmed cc list a bit]

On Sun Jul 11 2010 at 20:52:32 +0200, Jean-Yves Migeon wrote:
> On 11.07.2010 17:46, Antti Kantee wrote:
> > "perform"?  Are you using that term for execution speed, or was it
> > accidentally bundled with the rest of the paragraph?
>
> execution speed (could be incorrect wording, I am no native speaker)
Why are you worried about execution speed?  My hypothesis is that it's
not going to be slower, and without benchmarks to show otherwise I'd
not worry about it.  The main difference is that instead of switching
to dom0, you switch to a part of dom0.  *If* there is a need for
interaction between the different partitions of dom0, it should be done
by replication instead of a ping-pong mesh of requests.  Yes,
everything should be implemented as distributed by default, as I've
talked about e.g. in my AsiaBSDCon and EuroBSDCon 2008 presentations.

> >> I think he was referring to using a rump kernel as a "syscall proxy
> >> server" rather than having in-kernel virtualization like jails/zones.
> >>
> >> That would make sense, you already have a proxy-like feature with rump.
> >
> > I'm not so sure.  That would require a lot of "kernel help" to make
> > everything work correctly.
>
> What kernel? rump or "host"?

host

> Per se, most of the syscalls would go to the proxy; only "privileged"
> operations like memory allocation/device multiplexing would need
> special handling by the host kernel.

It's not that simple if you want to run largely unmodified kernel code
against arbitrary processes.  Consider e.g. code that wants to call
some pmap() operation.  You're so deep into execution that the call to
pmap itself does not convey any semantic information, and you can't
remedy the situation by doing a host kernel request from the syscall
server.

Luckily, the server doesn't have to exist at the syscall layer but at
the subsystem layer.  We already have examples like puffs and pud.
They "work around" the above problem by using a better-defined semantic
layer.  Even so, there are some little things that don't quite work
correctly.  For example, rump_nfs does not properly do the "modified"
thing that the nfs client does by pmap_protecting pages and marking
them as modified in the fault handler.
Yeah, fixing that has been on my todo list for quite a while, but
nobody has complained and there are more pressing matters ...  (I had
support for something like that in puffs in 2006, but since there were
no users back then it kind of bitrotted away.)

Anyway, the solution, as usual, is to work the problem from both ends
(improve the server methods and the kernel drivers) and perform a
meet-in-the-middle attack at the sweet spot where nothing is lost and
everything is gained.  The cool thing about working on NetBSD is that
we can actually do these things properly instead of bolting some hacks
on top of a black-magic box we're not allowed to touch.

> > The first example is mmap: you run into it pretty fast when you
> > start work on a syscall server ;)
> >
> > That's not to say there is no synergy.  For example, a jail
> > networking stack virtualized this way would avoid having to go over
> > all the code, and "reboot" would be as simple as kill $serverpid.
> > Plus, more obviously, it would not require every jail to share the
> > same code, i.e. you can have text optimized in various ways for
> > various applications.
>
> You also gain the advantage of resource control, as the proxy kernel
> is, by itself, a process.  Buggy kernel code would only crash the
> server, without putting too much of the host kernel at risk.
>
> However, this design is very close to the one I envisioned with Xen
> and "multiple small dom0's": with Xen, you may consider the "proxy
> server" as the domU kernel, and the application running within the
> domain is the jailed one.  The difference being that the containers
> are handled just like any other process, whereas for Xen, they are
> domains.

Although I'm not familiar with the Xen hypercall interface, I assume it
is infinitely better defined than unix process<->kernel interaction,
with no funny bits like fiddling about here and there just because the
kernel can get away with it.
> The jails/containers approach is more lightweight, you just have one
> instance of the kernel; IMHO, they could be compared to chroot, with
> many, many improvements.  Each solution has its
> advantages/inconveniences.

Is it now?  In both cases you can have 1 copy of the kernel text (ok,
O(1) copies) and n copies of the *relevant* data (ok, and the process
overhead).  For non-academic measurements, where you're interested in
application scenarios instead of pathological microbenchmarks, I'd
expect there to be ~0 difference in "lightweightedness".

Anyway, they're completely different, and I don't see the point of
comparing them.  I was just trying to point out one possible strategy
for *implementing* the necessary support bits for jails/zones.
