On Wed, 2016-06-08 at 04:42 -0600, Jan Beulich wrote:
> >>> On 07.06.16 at 17:54, <dario.faggi...@citrix.com> wrote:
> > So, it looks to me that the TSC is actually ok, and it could be the
> > local_tsc_stamp and scale_delta() magic done in get_s_time_fixed()
> > (which, AFAIUI, is there to convert cycles to time) that does not
> > guarantee the result to be monotonic, even if the input is...
> > Thoughts?
>
> Indeed this smells like an issue in the scaling. However, the scaling
> values vary only when !CONSTANT_TSC. Can you check that this
> flag is actually set on that system?
>
Checked. I do have it. I instrumented the code to print something if the
flag is found, and it does print.
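To make the suspicion concrete, here is a simplified, self-contained sketch
of the kind of per-CPU cycles-to-time conversion that get_s_time_fixed()
and scale_delta() perform (field and function names are approximated for
illustration; this is not the actual Xen code). The point is that each CPU
converts the TSC through its own anchor pair (TSC stamp, system-time stamp)
plus a scale factor, so even a perfectly monotonic, synchronised TSC can
produce values that are not monotonic when compared across CPUs, if the
per-CPU anchors are not perfectly consistent with each other:

/*
 * Simplified sketch, not the real Xen implementation: per-CPU anchor
 * plus scaled TSC delta, which is roughly what NOW() boils down to
 * on each CPU.
 */
#include <stdint.h>

struct time_scale {
    int shift;              /* power-of-two pre-scaling               */
    uint32_t mul_frac;      /* fractional multiplier (freq as x/2^32) */
};

struct cpu_time {
    uint64_t local_tsc_stamp;    /* TSC sampled at last calibration   */
    int64_t  stime_local_stamp;  /* system time at that same instant  */
    struct time_scale tsc_scale;
};

/* delta_cycles * mul_frac / 2^32, with an optional shift applied first. */
static uint64_t scale_delta(uint64_t delta, const struct time_scale *s)
{
    if ( s->shift < 0 )
        delta >>= -s->shift;
    else
        delta <<= s->shift;
    return (uint64_t)(((unsigned __int128)delta * s->mul_frac) >> 32);
}

/* NOW() on one CPU: anchor time + scaled distance from the anchor TSC. */
static int64_t cpu_now(const struct cpu_time *t, uint64_t tsc)
{
    return t->stime_local_stamp +
           (int64_t)scale_delta(tsc - t->local_tsc_stamp, &t->tsc_scale);
}

If the (stamp, stime) pairs of two CPUs disagree by even a few cycles,
cpu_now() on one of them can come out a handful of nanoseconds behind what
the other just returned for a later TSC reading, which is exactly the kind
of tiny skew the __update_runq_load() warnings below show.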
Also:

root@Zhaman:~# xl debug-keys s
(XEN) [  406.719464] TSC marked as reliable, warp = 0 (count=3)
(XEN) [  406.719467] dom1(hvm): mode=0,ofs=0xffffd9279716c276,khz=2394069,inc=4,vtsc count: 195367 kernel, 0 user

> (I hope you're not running a
> strange Dom0 setup with FREQCTL_dom0_kernel in effect.)
>
I've no idea what this is. I've been running a 4.1.0 kernel I built myself,
as well as the stock Debian unstable 4.5.0 one, and I'm seeing the issue in
both cases. Looking FREQCTL_dom0_kernel up, I guess you mean what happens
when one passes cpufreq="dom0-kernel" to Xen on the boot command line. In
that case, no, I'm not doing that.

> And
> at the same time that it's time_calibration_tsc_rendezvous() that
> is in use?
>
The code you're referring to should be this:

    /* If we have constant-rate TSCs then scale factor can be shared. */
    if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
    {
        /* If TSCs are not marked as 'reliable', re-sync during rendezvous. */
        if ( !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
            time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
    }

And I have both X86_FEATURE_CONSTANT_TSC and X86_FEATURE_TSC_RELIABLE.
I've again instrumented the code in order to check whether it is
time_calibration_tsc_rendezvous() or time_calibration_std_rendezvous()
that is being used, and it's the latter:

(XEN) [    1.795916] TSC reliable. Yay!! Using ffff82d080198362 for rendevousez

[dario@Solace xen.git] $ objdump -D xen/xen-syms | grep ffff82d080198362
ffff82d080198362 <time_calibration_std_rendezvous>:

which looks correct to me.

> Yet when the scaling values get set only once at boot, monotonic
> (cross-CPU) TSC means monotonic (cross-CPU) returns from NOW().
>
Yep. And at this point, this is what needs to be verified, I guess...

> In any event - could you try to exclude C- and P-state effects? Of
> course that makes sense only if you can reasonably repro the
> problem situation (and hence can tell from its absence over a certain
> period of time that whatever change was done did have some
> positive effect).
>
It's actually quite hard *NOT* to reproduce the problem... It happens all
the time, and if the load is high enough, I see the "Time went backwards?"
printk several times per second!

So, trying to do what you suggest in an online fashion, i.e. issuing:

 # xenpm set-max-cstate 0
 # xenpm set-scaling-governor all performance

did not change the situation (I still see the printks). I've then tried
passing both cpufreq="none" and max_cstate=0 to Xen at boot, but they made
no difference at all either.

Most of the time, we're speaking of very small skews, e.g.:

(XEN) [   59.999959] WARNING: __update_runq_load: Time went backwards? now 59999946079 llu 59999946085
(XEN) [  117.595508] WARNING: __update_runq_load: Time went backwards? now 117595495295 llu 117595495310

i.e., 6 or 15 nanoseconds! Then there are instances where the difference
is bigger (on the microsecond scale, like in the first email of the
thread).

> How big of a system are we talking about? I'm asking to assess the
> overhead of adding some cross-CPU checks to get_s_time_fixed()
> (in a debugging patch), logging relevant values if non-monotonic
> output gets detected. (On too big a system, the overhead here
> might end up masking the problem.)
>
Yeah, I sort of tried doing something like that already, but I was logging
the wrong thing (I was not yet suspecting a problem with the scaling). I
can try putting something together again; a rough idea of what I have in
mind is sketched below. In any case, the system I use most for testing is
a 16-CPU (2 NUMA nodes) Xeon.
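What follows is a rough, untested sketch of such a cross-CPU check, not an
actual patch: it keeps a system-wide high-water mark of the values returned
by NOW() and logs when a later call, possibly on another CPU, comes out
lower. The "last_stime" global is made up, the exact hook point (somewhere
at the end of get_s_time_fixed() in xen/arch/x86/time.c) is an assumption,
and racing readers can trigger spurious reports for very small skews, but
it should be enough to catch the bigger, microsecond-scale jumps:

/*
 * Debugging-only sketch: track the largest value NOW() has returned so
 * far, across all CPUs, and warn when time appears to go backwards.
 * "last_stime" and the hook point are assumptions, not existing Xen code;
 * the idea is to call this just before get_s_time_fixed() returns.
 */
static s_time_t last_stime;

static void check_stime_monotonic(s_time_t now)
{
    s_time_t prev = read_atomic(&last_stime);

    /* Try to advance the high-water mark; retry if another CPU beat us. */
    while ( now > prev )
    {
        s_time_t old = cmpxchg(&last_stime, prev, now);
        if ( old == prev )
            return;
        prev = old;
    }

    /*
     * A concurrent reader racing past us can make this fire for a few ns
     * of apparent skew; sequenced readings going backwards by whole
     * microseconds are the interesting case.
     */
    if ( now < prev )
        printk(XENLOG_WARNING
               "CPU%u: NOW() went backwards: now=%"PRId64" prev=%"PRId64"\n",
               smp_processor_id(), now, prev);
}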
Note that I see the issue even within the same socket, though, so I can
easily reduce the test to a subset of the available processors.

Thanks for your time. :-)
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)