On Wed, 2016-06-08 at 04:42 -0600, Jan Beulich wrote:
> > > > On 07.06.16 at 17:54, <dario.faggi...@citrix.com> wrote:
> > So, it looks to me that the TSC is actually ok, and it could be the
> > local_tsc_stamp and scale_delta() magic done in get_s_time_fixed()
> > (which, AFAIUI, is there to convert cycles to time) that does not
> > guarantee the result to be monotonic, even if the input is...
> > Thoughts?
> Indeed this smells like an issue in the scaling. However, the scaling
> values vary only when !CONSTANT_TSC. Can you check that this
> flag is actually set on that system? 
>
Checked: I do have it. I instrumented the code to print something when it
finds the flag, and it does print.

Also:
root@Zhaman:~# xl debug-keys s
(XEN) [  406.719464] TSC marked as reliable, warp = 0 (count=3)
(XEN) [  406.719467] dom1(hvm): mode=0,ofs=0xffffd9279716c276,khz=2394069,inc=4,vtsc count: 195367 kernel, 0 user

> (I hope you're not running a
> strange Dom0 setup with FREQCTL_dom0_kernel in effect.) 
>
I've no idea what this is. I've been running a 4.1.0 dom0 kernel built
myself, as well as the stock Debian unstable 4.5.0 one, and I'm seeing the
issue in both cases.

Looking up FREQCTL_dom0_kernel, I guess you mean what happens when one
passes cpufreq="dom0-kernel" to Xen on the boot command line. In that
case, no, I'm not doing that.

> And
> at the same time that it's time_calibration_tsc_rendezvous() that
> is in use?
> 
The code you're referring to should be this:

    /* If we have constant-rate TSCs then scale factor can be shared. */
    if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
    {
        /* If TSCs are not marked as 'reliable', re-sync during rendezvous. */
        if ( !boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
            time_calibration_rendezvous_fn = time_calibration_tsc_rendezvous;
    }

And I have both X86_FEATURE_CONSTANT_TSC and X86_FEATURE_TSC_RELIABLE.

I've again instrumented the code in order to check whether it is
time_calibration_tsc_rendezvous() or time_calibration_std_rendezvous()
that is being used, and it's the latter:

(XEN) [    1.795916] TSC reliable. Yay!! Using ffff82d080198362 for rendevousez

[dario@Solace xen.git] $ objdump -D xen/xen-syms |grep ffff82d080198362
ffff82d080198362 <time_calibration_std_rendezvous>:

which looks correct to me.
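
The instrumentation itself is nothing fancy, by the way: roughly the printk
below, added where time_calibration_rendezvous_fn gets chosen (this is from
memory, not the exact hack):

    printk("TSC reliable. Yay!! Using %p for rendevousez\n",
           time_calibration_rendezvous_fn);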

> Yet when the scaling values get set only once at boot, monotonic
> (cross-CPU) TSC means monotonic (cross-CPU) returns from NOW().
> 
Yep. And at this point, this is what needs to be verified, I guess...
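
To make the suspicion concrete, the sketch below is how I picture the
per-CPU conversion path. It is my own simplification, not the literal Xen
code: local_tsc_stamp, stime_local_stamp and the mul_frac/shift scale
factor mimic the real fields, everything else is just illustrative:

    #include <stdint.h>

    /* Per-CPU calibration state (illustrative only). */
    struct cpu_time_sketch {
        uint64_t local_tsc_stamp;   /* TSC read at the last calibration */
        uint64_t stime_local_stamp; /* system time at the last calibration */
        uint32_t mul_frac;          /* cycles -> ns multiplier (used as * mul_frac >> 32) */
        int      shift;             /* pre-shift applied to the delta */
    };

    /* Cycles -> ns, roughly what scale_delta() does. */
    static uint64_t scale_delta_sketch(uint64_t delta, uint32_t mul_frac,
                                       int shift)
    {
        if ( shift < 0 )
            delta >>= -shift;
        else
            delta <<= shift;
        return ((__uint128_t)delta * mul_frac) >> 32; /* GCC 128-bit ext. */
    }

    /* Roughly what get_s_time_fixed() computes for NOW(). */
    static uint64_t now_sketch(const struct cpu_time_sketch *t, uint64_t tsc)
    {
        return t->stime_local_stamp +
               scale_delta_sketch(tsc - t->local_tsc_stamp,
                                  t->mul_frac, t->shift);
    }

Even with a shared scale factor, if two CPUs sample their (local_tsc_stamp,
stime_local_stamp) pairs at slightly different moments during calibration,
now_sketch() on one CPU can come out a handful of ns below what another CPU
just returned, which would match the skews I'm seeing.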

> In any event - could you try to exclude C- and P-state effects? Of
> course that makes sense only if you can reasonably repro the
> problem situation (and hence can tell from its absence over a certain
> period of time that whatever change was done did have some
> positive effect).
> 
It's actually quite hard *NOT* to reproduce the problem... it happens
all the time, and if the load is high enough, I see the "Time went
backwards?" printk several times per second!

So, trying to do what you suggest in an online fashion, i.e., issuing:

 # xenpm set-max-cstate 0
 # xenpm set-scaling-governor all performance

did not change the situation (I still see the printks).

I then tried passing both cpufreq="none" and max_cstate=0 to Xen at boot,
but that made no difference either.
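
Concretely, that means a Xen command line along these lines (GRUB2 syntax,
with the rest of my options trimmed):

 multiboot /boot/xen.gz ... cpufreq=none max_cstate=0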

Most of the time, we're speaking of very small skews, e.g.:

(XEN) [   59.999959] WARNING: __update_runq_load: Time went backwards? now 59999946079 llu 59999946085
(XEN) [  117.595508] WARNING: __update_runq_load: Time went backwards? now 117595495295 llu 117595495310

i.e., 6 nanoseconds or 15 nanoseconds!

Then there are instances where the difference is bigger (microseconds
time scale, like in the first email of the thread).

> How big of a system are we talking about? I'm asking to assess the
> overhead of adding some cross-CPU checks to get_s_time_fixed()
> (in a debugging patch), logging relevant values if non-monotonic
> output gets detected. (On too big a system, the overhead here
> might end up masking the problem.)
> 
Yeah, I sort of tried doing something like that already, but was
logging the wrong thing (I was not yet suspecting a problem with
scaling).

I can try putting something together again.
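
Something along these lines is what I have in mind, for instance. Just a
sketch: max_stime and the helper are mine, and the idea is to call it from
the tail of get_s_time_fixed(), passing the value about to be returned
together with the TSC reading and the per-CPU stamps:

    /* Global high-water mark of what NOW() has returned so far. */
    static uint64_t max_stime;

    static void check_now_monotonic(uint64_t now, uint64_t tsc,
                                    uint64_t tsc_stamp, uint64_t stime_stamp)
    {
        uint64_t old;

        /* Ratchet max_stime up to 'now'; log the details if we went back. */
        do {
            old = read_atomic(&max_stime);
            if ( now < old )
            {
                printk("CPU%u: NOW() went back: %"PRIu64" < %"PRIu64
                       " (tsc=%"PRIu64", stamps=%"PRIu64"/%"PRIu64")\n",
                       smp_processor_id(), now, old,
                       tsc, tsc_stamp, stime_stamp);
                return;
            }
        } while ( cmpxchg(&max_stime, old, now) != old );
    }

The cmpxchg() loop keeps it lock-free, so even on the 16-CPU box below the
overhead should hopefully be small enough not to mask the problem.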

In any case, the system I use most for testing is a 16-CPU Xeon (2 NUMA
nodes). But I see the issue even within the same socket, so I can easily
restrict things to a subset of the available processors.

Thanks for your time. :-)
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
