On Mon, 8 Aug 2016 18:21:28 +0200,
Vincent Berenz <[email protected]> wrote:

> Hi,
> 
> I set tsc=reliable, and "skipped synchronization checks as TSC is 
> reliable" showed up in the syslog.
> 
> The machine boots correctly on both the patched and the non-patched
> kernel, and in both cases everything seems to run fine. On the
> Xenomai-patched kernel the issues related to the keyboard and glxgears
> are gone. The latency is still low (between 4 and 20) and our software
> seems to work well. So, seemingly all good.
> 
> Anything else I should check or be careful about?

Let's say I did not suggest that parameter as a solution. Linux does
not do those checks for fun, and it does not fail them because it is
broken. A comment in the Linux source suggests that your BIOS programmed
the TSC offsets incorrectly, because on your machine the test failed for
socket siblings.
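
To double-check what the running kernel actually ended up with, something
like the following might help. It is only a sketch: has_param is a helper
name made up for this mail, and it assumes the usual /proc/cmdline and
clocksource sysfs files are present on your system.

```shell
#!/bin/sh
# Sketch: report whether tsc=reliable is on the kernel command line and
# which clocksource the kernel is using right now.

# has_param CMDLINE PARAM (hypothetical helper): succeeds if PARAM
# appears as a whole space-separated word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

CS=/sys/devices/system/clocksource/clocksource0

if [ -r /proc/cmdline ]; then
    if has_param "$(cat /proc/cmdline)" "tsc=reliable"; then
        echo "tsc=reliable is set"
    else
        echo "tsc=reliable is not set"
    fi
fi

# With the sync check skipped, "tsc" would be expected to stay selected.
if [ -r "$CS/current_clocksource" ]; then
    echo "current clocksource:    $(cat "$CS/current_clocksource")"
fi
if [ -r "$CS/available_clocksource" ]; then
    echo "available clocksources: $(cat "$CS/available_clocksource")"
fi
```

This only tells you that the check was skipped, not that the TSCs are
actually in sync.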

If the tests fail at every boot and the values are of the same order of
magnitude, I guess the TSCs are indeed off. You should be able to see
that with the Xenomai clocktest.

Could you please run /usr/lib/xenomai/testsuite/clocktest?
I am guessing you will see "warps" and "max delta [us]" values
different from 0.

The max delta is how far a TSC-based clock reading could jump if the
process migrated between cores with that offset. In that case,
processes measuring time could get negative or very large outliers.
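
To get a feeling for the size of such a jump, the warp from your boot log
can be converted into time. This is a rough sketch: cycles_to_us is a
helper name made up here, and it assumes the "cpu MHz" value from your
/proc/cpuinfo matches the TSC rate.

```shell
#!/bin/sh
# cycles_to_us CYCLES MHZ (hypothetical helper): convert a TSC cycle
# count to microseconds, rounded to the nearest integer.
# MHZ is the TSC frequency in MHz, i.e. cycles per microsecond.
cycles_to_us() {
    awk -v c="$1" -v f="$2" 'BEGIN { printf "%.0f\n", c / f }'
}

# Values from earlier in this thread: the warp reported at boot and the
# "cpu MHz" line (assumed to be close to the TSC frequency).
cycles_to_us 25802382 2594.037
```

That works out to roughly 10 ms, which is huge compared to the latencies
you are measuring.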

Henning

> 
> 
> On 08.08.2016 11:34, Henning Schild wrote:
> > On Fri, 5 Aug 2016 19:13:13 +0200,
> > Vincent Berenz <[email protected]> wrote:
> >  
> >> I checked the syslog when booting on the non realtime kernel, and
> >> indeed the same messages related to TSC showed up. Yet, I do not
> >> experience any of the issues observed on the patched kernel (e.g
> >> glxgears or keyboard)
> >>
> >> I ran lstopo and lshw and there seem to be 2 sockets with 12 cores
> >> on each.
> >>  
> > I have seen this several times across sockets, but in your case the
> > two CPUs are on the same socket. I also have a 32-core Xeon that
> > fails the TSC test between 0 and 1 on the same socket.
> >  
> >> lstopo
> >>
> >> ---
> >> Machine (126GB)
> >>    Socket L#0 + L3 L#0 (30MB)
> >>      L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
> >>      L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
> >>      L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
> >>      L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
> >>      L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
> >>      L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
> >>      L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
> >>      L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
> >>      L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
> >>      L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
> >>      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
> >>      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
> >>    Socket L#1 + L3 L#1 (30MB)
> >>      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
> >>      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
> >>      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
> >>      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
> >>      L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
> >>      L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
> >>      L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
> >>      L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
> >>      L2 L#20 (256KB) + L1d L#20 (32KB) + L1i L#20 (32KB) + Core L#20 + PU L#20 (P#20)
> >>      L2 L#21 (256KB) + L1d L#21 (32KB) + L1i L#21 (32KB) + Core L#21 + PU L#21 (P#21)
> >>      L2 L#22 (256KB) + L1d L#22 (32KB) + L1i L#22 (32KB) + Core L#22 + PU L#22 (P#22)
> >>      L2 L#23 (256KB) + L1d L#23 (32KB) + L1i L#23 (32KB) + Core L#23 + PU L#23 (P#23)
> >> ---
> >>
> >>
> >> lshw -class processor
> >>
> >> ---
> >>    *-cpu:0
> >>         description: CPU
> >>         product: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> >>         vendor: Intel Corp.
> >>         physical id: 106
> >>         bus info: cpu@0
> >>         version: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> >>         slot: SOCKET 1
> >>         size: 2600MHz
> >>         capacity: 4GHz
> >>         width: 64 bits
> >>         clock: 100MHz
> >>         capabilities: x86-64 fpu fpu_exception wp vme de pse tsc
> >> msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts
> >> acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> >> constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> >> nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> >> est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> >> movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
> >> abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
> >> flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
> >> invpcid
> >>         configuration: cores=12 enabledcores=12 threads=24
> >>    *-cpu:1
> >>         description: CPU
> >>         product: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> >>         vendor: Intel Corp.
> >>         physical id: 11a
> >>         bus info: cpu@1
> >>         version: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> >>         slot: SOCKET 2
> >>         size: 2600MHz
> >>         capacity: 4GHz
> >>         width: 64 bits
> >>         clock: 100MHz
> >>         capabilities: x86-64 fpu fpu_exception wp vme de pse tsc
> >> msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts
> >> acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
> >> constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
> >> nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx
> >> est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> >> movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
> >> abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
> >> flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms
> >> invpcid
> >>         configuration: cores=12 enabledcores=12 threads=24
> >> ---
> >>
> >> To add the kernel parameter I updated /etc/default/grub to :
> >>
> >> ---
> >> GRUB_CMDLINE_LINUX_DEFAULT="quiet splash xeno_nucleus.xenomai_gid=1001 xeno_hal.supported_cpus=0xfffffffffffffffd"
> >> ---
> >>
> >> Is that the correct way to do this ?
> >> Is there a way to check this was effective ? (I attached the
> >> syslogs, just in case).
> >>
> >> Stressing the kernel resulted in :
> >>
> >> ---
> >> [  515.420275] Broke affinity for irq 98
> >> [  515.421329] kvm: disabling virtualization on CPU1
> >> [  515.424184] smpboot: CPU 1 is now offline
> >> [  530.021118] x86: Booting SMP configuration:
> >> [  530.021121] smpboot: Booting Node 0 Processor 1 APIC 0x2
> >> [  530.037201] kvm: enabling virtualization on CPU1
> >> ---  
> > Sorry, I should have explained that in more detail. The systems I
> > have seen the problem on do not always fail the TSC sync test. So
> > the idea is to hotplug a CPU so you do not have to reboot all the
> > time. If any CPU pair fails the test during boot, CPU hotplugging
> > will not tell you anything, because the TSC will already be marked
> > unstable.
> >
> > I guess in your case the TSC test fails every time on 0 -> 1, so
> > you do not need the hotplugging to try and reproduce it.
> >
> > There is a switch that tells Linux to skip the test and assume the
> > TSC is stable: "tsc=reliable".
> > What is the behaviour if you use that, both in regular Linux and in
> > the patched kernel? The problem with this switch is that it skips a
> > test that is very relevant to Xenomai operation later on.
> >  
> >> In case this hardware is not best for xenomai:
> >> We selected this configuration for the only reason it has lots of
> >> pci-express slots. We would be happy to switch to any other
> >> preferred solution. Just in case : would you have by chance some
> >> recommendation ?  
> > I do not have a recommendation, but you could try different BIOS
> > versions for that machine. (up- or downgrade)
> >     
> >> Have a nice week end !
> >>
> >> Vincent
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, 4 Aug 2016 16:11:55 +0200
> >>   Henning Schild <[email protected]> wrote:  
> >>> On Thu, 4 Aug 2016 15:23:34 +0200,
> >>> Vincent Berenz <[email protected]> wrote:
> >>>      
> >>>> Hi,
> >>>>
> >>>> Many thanks for the answer.
> >>>>
> >>>> We use new hardware. I am working on a recent dell precision
> >>>> T7910. I did not try to update our older hardware (still in use).
> >>>>
> >>>> Info on the CPU of the new machine:
> >>>>
> >>>> -----
> >>>> processor        : 23
> >>>> vendor_id        : GenuineIntel
> >>>> cpu family       : 6
> >>>> model            : 63
> >>>> model name       : Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
> >>>> stepping : 2
> >>>> microcode        : 0x36
> >>>> cpu MHz          : 2594.037
> >>>> cache size       : 30720 KB
> >>>> physical id      : 1
> >>>> siblings : 12
> >>>> core id          : 13
> >>>> cpu cores        : 12
> >>>> apicid           : 58
> >>>> initial apicid   : 58
> >>>> fpu              : yes
> >>>> fpu_exception    : yes
> >>>> cpuid level      : 15
> >>>> wp               : yes
> >>>> flags            : fpu vme de pse tsc msr pae mce cx8 apic
> >>>> sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
> >>>> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
> >>>> arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
> >>>> aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2
> >>>> ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe
> >>>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
> >>>> ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
> >>>> flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2
> >>>> erms invpcid
> >>>> bogomips         : 5189.70
> >>>> clflush size     : 64
> >>>> cache_alignment  : 64
> >>>> address sizes    : 46 bits physical, 48 bits virtual
> >>>> power management:
> >>>> -----
> >>>>
> >>>> There are 24 processors and I had to update the config file:  
> >>> That is a big machine. Are cpu0 and cpu1 on different sockets?
> >>> (lstopo) Linux detects that the TSCs of the two cores are not
> >>> in sync. That should be unrelated to Xenomai and should also
> >>> happen on your distro kernel.
> >>>
> >>> You can stress the Linux-Kernel code that generated that message
> >>> with offlining/onlining the CPU.
> >>>
> >>> For your case "TSC synchronization [CPU#0 -> CPU#1]" you want to
> >>> offline CPU1 and online it from CPU0.
> >>>
> >>> # make sure online comes from CPU0
> >>> taskset 0x1 bash
> >>> # offline CPU1
> >>> echo 0 >  /sys/devices/system/cpu/cpu1/online
> >>> # online CPU1
> >>> echo 1 >  /sys/devices/system/cpu/cpu1/online
> >>>
> >>> Doing that on a xenomai enabled kernel you will have to exclude
> >>> the CPU in question from xenomai. In your case add the following
> >>> kernel parameter "xeno_hal.supported_cpus=0xfffffffffffffffd".
> >>>
> >>> I am guessing you will be able to reproduce this
> >>>      
> >>>>>> [    0.109150] TSC synchronization [CPU#0 -> CPU#1]:
> >>>>>> [    0.109157] Measured 25802382 cycles TSC warp between CPUs, turning off TSC clock.
> >>>>>> [    0.109161] tsc: Marking TSC unstable
> >>> on a xenomai kernel and a regular kernel. I would be interested in
> >>> the results.
> >>> In the worst case the TSC of your machine can indeed not be
> >>> trusted. 
> >>>> ---
> >>>> CONFIG_XENO_OPT_PIPE_NRDEV=32
> >>>> CONFIG_XENO_OPT_REGISTRY_NRSLOTS=1024
> >>>> CONFIG_XENO_OPT_SYS_HEAPSZ=32768
> >>>> CONFIG_XENO_OPT_SYS_STACKPOOLSZ=4096
> >>>> ---
> >>>>
> >>>> Best
> >>>>
> >>>> Vincent
> >>>>
> >>>> On Thu, 4 Aug 2016 14:17:44 +0200
> >>>>   Henning Schild <[email protected]> wrote:  
> >>>>> On Wed, 3 Aug 2016 12:12:51 +0200,
> >>>>> Vincent Berenz <[email protected]> wrote:
> >>>>>        
> >>>>>> Hi,
> >>>>>>
> >>>>>> After using for years xenomai 2.5.6 on ubuntu 12.04, we
> >>>>>> decided to upgrade to ubuntu 14.04 and a newer machine. I
> >>>>>> installed xenomai 2.6.4 and kernel 3.14.39. The installation
> >>>>>> boots correctly, the latency is low and our software seems to
> >>>>>> work ok.
> >>>>>>
> >>>>>> But the system has "frequency surge" (I could not find better
> >>>>>> wording). For example:
> >>>>>>
> >>>>>> - sometime when typing on the keyboard, the pressed key is
> >>>>>> printed many times ('aaaaaaaa' instead of 'a')
> >>>>>>
> >>>>>> - 'glxgears' has change in frame rates, the gears can be seen
> >>>>>> as sometime changing speed. For example:
> >>>>>>
> >>>>>> ---
> >>>>>> 1141 frames in 5.0 seconds = 228.186 FPS
> >>>>>> 1024 frames in 5.0 seconds = 204.787 FPS
> >>>>>> 506 frames in 5.0 seconds = 101.194 FPS
> >>>>>> 482 frames in 5.0 seconds = 96.317 FPS
> >>>>>> 1416 frames in 5.0 seconds = 283.182 FPS
> >>>>>> 2614 frames in 5.0 seconds = 521.100 FPS
> >>>>>> 2618 frames in 5.0 seconds = 522.314 FPS
> >>>>>> 3073 frames in 5.0 seconds = 614.562 FPS
> >>>>>> ---
> >>>>>>
> >>>>>> All the tests run fine (as far as I could tell) with the
> >>>>>> notable exception of tsc which sometimes (not always)
> >>>>>> terminates with something like:
> >>>>>>
> >>>>>> ---
> >>>>>> tsc not monotonic after 7430687798 ticks, jumped back 49567650 tick
> >>>>>> ---
> >>>>>>
> >>>>>> I could find this in the syslog:
> >>>>>>
> >>>>>> -------
> >>>>>> [    0.092932] TSC deadline timer enabled
> >>>>>> [    0.092941] Performance Events: PEBS fmt2+, 16-deep LBR, Haswell events, full-width counters, Intel PMU driver.
> >>>>>> [    0.092961] ... version:                3
> >>>>>> [    0.092962] ... bit width:              48
> >>>>>> [    0.092963] ... generic registers:      4
> >>>>>> [    0.092964] ... value mask:             0000ffffffffffff
> >>>>>> [    0.092965] ... max period:             0000ffffffffffff
> >>>>>> [    0.092965] ... fixed-purpose events:   3
> >>>>>> [    0.092966] ... event mask:             000000070000000f
> >>>>>> [    0.094914] x86: Booting SMP configuration:
> >>>>>> [    0.094916] .... node  #0, CPUs:        #1
> >>>>>> [    0.109150] TSC synchronization [CPU#0 -> CPU#1]:
> >>>>>> [    0.109157] Measured 25802382 cycles TSC warp between CPUs, turning off TSC clock.
> >>>>>> [    0.109161] tsc: Marking TSC unstable due to check_tsc_sync_source failed
> >>>>>> ---------
> >>>>> I have seen this message before, but with smaller numbers.
> >>>>>
> >>>>> I assume you have not changed the hardware. Which versions of
> >>>>> Xenomai and the kernel did you use before? I am trying to find out
> >>>>> whether these checks did not trigger before because they did not
> >>>>> exist or were different in your old setup.
> >>>>>        
> >>>>>> Best
> >>>>>>
> >>>>>> Vincent
> >>>>>        
> >>>>      
> >>>      
> 


_______________________________________________
Xenomai mailing list
[email protected]
https://xenomai.org/mailman/listinfo/xenomai