On 07:17 Tue 28 Jun , Willy Tarreau wrote:
> On Mon, Jun 27, 2011 at 07:25:31PM -0700, john stultz wrote:
> > That said, I didn't see from any of the backtraces in this thread why
> > the system actually crashed. The softlockup message on its own
> > shouldn't do that, so I suspect there's still a related issue
> > somewhere else here.
>
> One of the traces clearly showed that the kernel's uptime had wrapped
> or jumped, because the uptime suddenly jumped forwards to something
> like 2^32/HZ seconds IIRC.
>
> Thus it is possible that we have two bugs, one on the clock making it
> jump forwards and one somewhere else causing an overflow when the clock
> jumps too far.

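For a sense of scale, the "2^32/HZ seconds" figure quoted above can be
worked out for a few tick rates. This is only a rough sketch: the thread
does not say which HZ the affected kernels were built with, so the values
in CANDIDATE_HZ are assumptions, and the helper names are made up for
illustration.

```python
# Hypothetical tick rates; the thread does not state the HZ actually in use.
CANDIDATE_HZ = (100, 250, 300, 1000)

SECONDS_PER_DAY = 86_400


def wrap_seconds(hz: int, counter_bits: int = 32) -> float:
    """Seconds until a counter_bits-wide counter ticking hz times/s wraps."""
    return (2 ** counter_bits) / hz


def wrap_days(hz: int, counter_bits: int = 32) -> float:
    """Same interval as wrap_seconds(), expressed in days."""
    return wrap_seconds(hz, counter_bits) / SECONDS_PER_DAY


if __name__ == "__main__":
    for hz in CANDIDATE_HZ:
        print(f"HZ={hz:4d}: {wrap_seconds(hz):14.1f} s  (~{wrap_days(hz):.1f} days)")
```

With HZ=1000 this comes to roughly 49.7 days, and with HZ=100 to roughly
497 days, so an uptime jump of that size would be very visible in a trace.
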
Our last machine with wrapped time crashed 1 month ago, almost 1 month
after the time wrap. One thing I noticed was that although the machine
seemed healthy apart from the time wrap, there were random scheduling
glitches, mostly visible as high ping times to the KVM guests running
on the machine. Unfortunately I don't have any exact numbers, so the
best I can do is describe what we saw.

All scheduler statistics under /proc/sched_debug on the host seemed
normal. However, pinging a VM from outside would give random spikes on
the order of hundreds of ms among the usual 1-2 ms times. Moving the VM
to another host would restore sane ping times, and any other VM moved to
this host would exhibit the same behaviour. Ping times to the host
itself from outside were stable.

This was also accompanied by bad I/O performance in the KVM guests
themselves and the strange effect that the total CPU time on the VM's
munin graphs would add up to less than 100% * #CPUs. Neither the host
nor the guests were experiencing heavy load.

As a side note, this was similar to the behaviour we had experienced
once when some of multipathd's path checkers (which are RT tasks IIRC)
had crashed, although this time restarting multipathd didn't help.

Regards,
Apollon
_______________________________________________
stable mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/stable
