Date: Sun, 30 Jul 2017 16:04:38 -0000 (UTC)
From: mlel...@serpens.de (Michael van Elst)
Message-ID: <oll02b$ikd$1...@serpens.de>
  | There are slower emulated systems that don't have these issues. (*)

Yes, that it is not qemu's execution speed which is the problem was
(really, always) becoming obvious.

  | If the host misses interrupts, time in the guest just passes slower
  | than real-time. But inside the guest it is consistent.

If we could achieve that (which changing the timecounter in qemu
apparently achieves) it would at least make the world become rational.
Of course, keeping the timing running faster would be better - if we
were able to get to a state where the client/guest were actually able
to talk to the outside world (that part is easy) and run NTP, and act
as a time server that others could trust, that would be ideal.

  | This is not to be confused with the kernel idea of wall-clock time
  | (i.e. what date reports). wall-clock time is usually maintained
  | by hardware separated from the interrupt timers. The
  | 'date; sleep 5; date' sequence therefore can show that 10 seconds
  | passed.

But that is totally broken. While there is no guarantee that a sleep
will wake up after exactly the time requested, it should be as close
as is reasonably possible - and on an unloaded system, where there is
sufficient RAM, and nothing swapped out, and nothing competing for cpu
cycles, that sequence should (always) show that something between 5
and 5-and-a-bit seconds has passed.

If the cpu is busy, or things are getting swapped/paged out, then we
can expect things to be slower (not only for processes waiting upon
timer signals, but for everything), and that's acceptable. But
otherwise, inconsistent timing is not acceptable. All kinds of
applications (including network protocols) require time to be kept in
a way that is at least close to what others observe, even if not
identical.

One easy (poor) fix is simply to do as used to be done, and have
kernel wall clock time maintained by the tick interrupt - that makes
things consistent, but without any real expectation of accuracy.
The alternative is to make the tick counts depend upon the external
wall clock time source, so they keep in sync - much the same as the
power companies do with frequency: over any short period, the nominal
50/60 Hz frequency can drift around a lot, but when measured over any
reasonable period, those things are highly accurate (which is why old
AC-frequency-based tick systems used to have very good long term time
stability, provided they never lost clock interrupts.)

  | The problem with qemu is that it's running on a NetBSD host and
  | therefore cannot issue interrupts based on host time unless the
  | host has a larger HZ value.

In the system of most interest, the host and the guest are the exact
same system (the exact same binary kernel) - unless we alter the
config of one of them explicitly to avoid this issue, they cannot help
but have the same HZ value.

As long as the emulated qemu client has access to a reasonably
accurate ToD value (which it obviously does, as the host's time is
available to qemu, and can be, and is it seems, made available to the
guest) there's no reason at all the guest cannot produce the correct
number of ticks. And doing so (since it is just a generic NetBSD)
would solve the similar, but less blatant, issue for any other system
using ticks, where the occasional clock interrupt might get lost, and
where there is some other ToD source available.

  | With host and guest running at HZ=100, it's obvious that interrupts
  | mostly come just too late and require two ticks on the host, thus
  | slowing down guest time by a factor of two.

Yes, that is a very good explanation for the observed behaviour, and I
cannot help but be grateful that simply beginning to discuss this
issue has provided so many insights into what is happening, and what
we can do to fix things.
When there is no alternative to tick interrupts, we can, and do, use
those to measure time, and everything works - except that if the ticks
are not received at the expected rate, time keeping drifts away from
real time (though invisibly when considered only within the system.)
When there is some better measure of real time we can use, we can use
it to keep all time keeping better synchronised, regardless of whether
the system is "tickless" or still tick based - it isn't required that
every single tick be 1/HZ apart (they never are, precisely, anyway)
just that over the long term (which in computing is a half second or
so) the correct number of ticks have occurred.

I think it should be possible to make that happen, and that is what I
am going to see if I can do. Then we can see if we can find a (good
enough) way to make nanosleep() less ticky - whether by giving up on
ticks altogether (which is probably not the best solution - even if we
don't use ticks for timing, we'd end up emulating them for other
things, if only to avoid needing to rewrite too much of the kernel in
one step) or by implementing some other mechanism (interrupts from a
short term timer not used for time calculations at all perhaps, for
very short delays only, I have no idea - yet).

kre

ps: I simply do not care if there is (or could be) a much better fix
for these issues than the type I am considering - if someone
implements that before I am done, that's great, and we have made even
more progress than expected. If not, they're still free to implement
it after, and in the interim, we will have a system that is better
than we have now, and then perhaps later, even better still. What this
means is that I will totally ignore "it should be done a better way
than that" arguments... If there is a defect in what I do (SMP related
problems, or something) that I will listen to, and attempt to fix, but
"you should have done it this other way instead" will go nowhere.