On 02/05/2018 04:50 PM, Yann le Chevoir wrote:
> Hello,
>
> I am an engineering student and I am trying to prove that a 4000Hz hard
> real-time application can run on an ARM board rather than on a more
> powerful machine.
>
> I work with an IMX6 dual-core and Xenomai Cobalt 3.0.4. I use the POSIX
> skin. By the way, I first installed Xenomai Cobalt 3.0.5, but initial
> experiments revealed that the Alchemy API did not work properly (for
> example, the altency test did not work).

Any specifics regarding what went wrong would be helpful. Otherwise, nobody may bother looking into it and a potential bug would remain unfixed.

> I would have needed to investigate further, but when I tried the previous
> version, it worked. I did not test v3.0.6.


You should not have downgraded but rather pulled the latest code from the stable-3.0.x branch at git://git.xenomai.org/xenomai-3.git. As a general note, please disregard the release tarballs: our release cycle is way too slow to make them a sane option, as truckloads of bug fixes can pile up before a new tarball is issued. Tracking the stable tree would get you the latest validated fixes.
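For instance, something along these lines (a generic git workflow, adapt to your local setup) keeps you on the maintained branch:

git clone git://git.xenomai.org/xenomai-3.git
cd xenomai-3
git checkout stable-3.0.x

then a periodic "git pull" picks up the validated fixes as they land.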

> For now, my point is that I observe some unexpected behaviors when
> isolating CPU1, and perhaps you can explain some of them to me.


> TID 881 is the main thread.
> I am not sure why there is a TID 890 thread. Is it a Xenomai one (main)?

That is libcobalt's internal printer loop thread, which carries out deferred printf() calls. Ancillary stuff.
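To illustrate (a hypothetical minimal example, not your code): when the application is built with the Cobalt POSIX wrappers, a printf() issued from a real-time thread is buffered and flushed by that relay thread instead of writing to the console directly, which is why an extra TID shows up next to your own threads:

#include <stdio.h>
#include <sched.h>
#include <pthread.h>

/* printf() from the rt thread below does not hit the console directly:
 * with the Cobalt wrappers in effect, the output is queued and written
 * out by libcobalt's internal printer loop thread. */
static void *rt_job(void *arg)
{
        printf("hello from a real-time thread\n"); /* deferred output */
        return NULL;
}

int main(void)
{
        struct sched_param prm = { .sched_priority = 50 };
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &prm);

        pthread_create(&tid, &attr, rt_job, NULL);
        pthread_join(tid, NULL);

        return 0;
}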

> Min execution time is 32us.
> Max execution time is 82us.
> I am a bit disappointed by such execution-time variation.
> How can we explain that?


A dual kernel system exhibits a permanent conflict between two kernels competing for the same hw resources. Considering CPU caches for instance, the cachelines a sleeping rt thread was previously using can be evicted by a non-rt thread resuming on the same CPU then treading on a large amount of physical memory. When the rt thread wakes up eventually, it may have to go through a series of cache misses to get the I/D caches hot again.
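As a purely illustrative back-of-the-envelope figure (no numbers measured on your board): a Cortex-A9 L1 data cache of 32 KiB in 32-byte lines means on the order of a thousand lines to refill; even if only part of the misses go all the way to DDR at something like 50-100ns apiece, a cold wakeup can easily pay tens of microseconds, i.e. the same order of magnitude as the 32us-to-82us spread you report.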

Generally speaking, we have a GPOS running side by side with a RTOS on the same hardware, and the former does not care one bit about the requirements of the latter. Mitigating the adverse effects of such a situation in order to keep latency low and bounded is the basic task defining the Xenomai project.

This issue may be aggravated by hw specifics: your imx6d is likely fitted with a PL3xx outer L2 cache, for which the write-allocate policy is enabled by the kernel. That policy proved to be responsible for ugly latency figures with this cache controller. Can we disable such a policy? Maybe, it depends; we used to have some success doing just that with early imx6 hw, then keeping it enabled became a requirement later with more recent SoCs (e.g. imx6qp), as we noticed that this policy was involved in cache coherence in multi-core configs. So YMMV.

If you want to give WA disabling a try, just pass l2x0_write_allocate=0 to the kernel cmdline. If your SoC ends up not booting with that switch, or dies in mysterious and random ways at runtime, it is likely the sign that a cache coherence issue is biting and you can't hack your way around that one.
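For context, here is a minimal sketch of the kind of 4kHz measurement loop presumably involved (the names, priority and clock_gettime-based instrumentation are assumptions on my side, not your actual code); a cache-cold wakeup right after the non-rt side has been thrashing memory is what typically shows up as the max figure:

#include <stdio.h>
#include <time.h>
#include <sched.h>
#include <pthread.h>

#define PERIOD_NS    250000L        /* 4000Hz -> 250us period */
#define NSEC_PER_SEC 1000000000L

static long ts_diff_ns(const struct timespec *a, const struct timespec *b)
{
        return (a->tv_sec - b->tv_sec) * NSEC_PER_SEC + (a->tv_nsec - b->tv_nsec);
}

int main(void)
{
        struct sched_param prm = { .sched_priority = 80 };
        struct timespec next, t0, t1;
        long exec_ns, min = -1, max = 0;
        unsigned long cycles = 0;

        /* run the measurement loop as a SCHED_FIFO (rt) thread */
        pthread_setschedparam(pthread_self(), SCHED_FIFO, &prm);

        clock_gettime(CLOCK_MONOTONIC, &next);

        for (;;) {
                /* absolute wakeups keep the 4kHz period drift-free */
                next.tv_nsec += PERIOD_NS;
                if (next.tv_nsec >= NSEC_PER_SEC) {
                        next.tv_nsec -= NSEC_PER_SEC;
                        next.tv_sec++;
                }
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

                clock_gettime(CLOCK_MONOTONIC, &t0);
                /* ... the actual 4kHz workload would run here ... */
                clock_gettime(CLOCK_MONOTONIC, &t1);

                exec_ns = ts_diff_ns(&t1, &t0);
                if (min < 0 || exec_ns < min)
                        min = exec_ns;
                if (exec_ns > max)
                        max = exec_ns;

                if (++cycles % 4000 == 0)   /* report once per second */
                        printf("exec time: min=%ldns max=%ldns\n", min, max);
        }

        return 0;
}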


> Then, trying permutations to understand these variations, I decided to put
> thread1 on CPU0. Linux, main, thread0 and dohell continue doing their stuff.
> Note that there is again the isolcpus=1 argument, so nothing is on CPU1.
> I am surprised to get better execution-time statistics. Is it a known
> situation, and how can we explain it? See "Core0.png".

> Reminder of the configuration when plotting "Core0.png":
> Core0: Linux stressed + main + thread0 + thread1
> Core1: -
>
> Min execution time is 32us.
> Max execution time is 65us.
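As a side note on placement: besides isolcpus, the thread itself can be pinned from the application. A hypothetical sketch (my naming, and assuming the standard glibc affinity attribute is honoured in your setup):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: create a SCHED_FIFO thread pinned to a given CPU.
 * The affinity is requested through the pthread attribute, so the thread
 * starts on the intended core without relying on isolcpus alone. */
int create_pinned_rt_thread(pthread_t *tid, int cpu, int prio,
                            void *(*fn)(void *), void *arg)
{
        struct sched_param prm = { .sched_priority = prio };
        pthread_attr_t attr;
        cpu_set_t cpus;

        CPU_ZERO(&cpus);
        CPU_SET(cpu, &cpus);

        pthread_attr_init(&attr);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &prm);
        pthread_attr_setaffinity_np(&attr, sizeof(cpus), &cpus);

        return pthread_create(tid, &attr, fn, arg);
}

A call like create_pinned_rt_thread(&tid, 0, 80, thread1_body, NULL) would then place thread1 on CPU0 (thread1_body being whatever your thread actually runs).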


> Then, given these results, as I had the feeling that a mono-core processor
> performs better than a dual-core one, I tried to remove the isolcpus=1
> argument to prove the contrary.

> Here is the configuration when plotting "NoIsolation.png":
> Core0: Linux stressed + main + thread0
> Core1: Linux stressed + thread1
>
> As you can see, the graph looks like the first one, but the execution time
> is even worse, reaching 94us.

> Is there something I am doing wrong?

You may also need to tell Xenomai that only CPU1 should process rt workloads (i.e. xenomai.supported_cpus=2). I suspect that serialization on a core Xenomai lock from all CPUs where the local TWDs tick introduces some jitter. Restricting the set of rt CPUs to CPU1 would prevent Xenomai from handling rt timer events on any other CPU, lifting any contention of that lock in the same move.
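Concretely, that would mean something like this on the kernel command line (supported_cpus is a CPU mask, so 2 selects CPU1 only):

isolcpus=1 xenomai.supported_cpus=2

With that in place, the non-rt load stays on CPU0 while Cobalt only handles rt timer events on CPU1, which is what lifts the lock contention mentioned above.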

--
Philippe.

