On 06/07/2018 11:04, Philippe Gerum wrote:
On 07/04/2018 07:06 PM, Federico Sbalchiero wrote:
Hi,
first, I want to say thanks to everyone involved in Xenomai for their work.
I'm testing Xenomai 3.0.7 and ipipe-arm/4.14 on a Freescale/NXP i.MX6q
SABRE SD board using Yocto. The system boots fine and is stable, but latency
under load (xeno-test) is higher than on my reference system (Xenomai
2.6.5 on Freescale kernel 3.10.17 + ipipe 3.10.18).
This is after disabling power management, frequency scaling, CMA,
graphics, tracing, and debug options.
I have found that a simple non-realtime user space process writing a
buffer in memory (memwrite) is able to trigger such high latencies.
Latency worsens a lot when running a copy of the process on each core.
There is a correlation between buffer size and cache size suggesting
an L2 cache issue, like the L2 write-allocate behavior discussed on the
mailing list, but I can confirm L2 WA is disabled (see log).
I'm looking for comments or suggestions.
Thanks,
Federico
"memwrite" test case:
#include <stdlib.h>
#include <stdio.h>

unsigned char *buffer;

int main(int argc, char **argv)
{
    int i;
    int count = 0;
    int n;
    int size = 10 * 1024 * 1024;
    volatile unsigned *pt;

    printf("load system by writing in memory\n");

    buffer = malloc(size);
    if (buffer == NULL) {
        printf("buffer allocation failed\n");
        exit(1);
    }

    n = size / sizeof(unsigned);
    while (1) {
        /* write some data to the memory buffer */
        pt = (volatile unsigned *) buffer;
        for (i = 0; i < n; i++)
            *pt++ = i;
        count++;    /* pass counter (not reported) */
    }

    return 0;
}
xeno-test on Xenomai 3.0.7 and ipipe-arm/4.14:
RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 18.000| 26.504| 42.667| 0| 0| 18.000| 42.667
RTD| 19.000| 25.198| 41.000| 0| 0| 18.000| 42.667
RTD| 18.999| 25.494| 40.999| 0| 0| 18.000| 42.667
RTD| 18.666| 25.060| 38.999| 0| 0| 18.000| 42.667
RTD| 18.999| 24.464| 38.332| 0| 0| 18.000| 42.667
RTD| 18.332| 24.546| 41.999| 0| 0| 18.000| 42.667
RTD| 13.332| 22.445| 45.665| 0| 0| 13.332| 45.665
RTD| 13.331| 21.164| 43.665| 0| 0| 13.331| 45.665
RTD| 13.331| 21.930| 43.665| 0| 0| 13.331| 45.665
RTD| 13.331| 22.254| 48.664| 0| 0| 13.331| 48.664
RTD| 13.331| 22.037| 46.664| 0| 0| 13.331| 48.664
RTD| 13.330| 21.053| 42.664| 0| 0| 13.330| 48.664
RTD| 13.330| 20.610| 37.330| 0| 0| 13.330| 48.664
RTD| 13.330| 20.520| 34.997| 0| 0| 13.330| 48.664
RTD| 13.330| 20.398| 39.330| 0| 0| 13.330| 48.664
RTD| 13.663| 21.249| 37.996| 0| 0| 13.330| 48.664
RTD| 13.329| 20.983| 35.663| 0| 0| 13.329| 48.664
RTD| 12.996| 20.039| 34.329| 0| 0| 12.996| 48.664
RTD| 13.329| 20.580| 42.662| 0| 0| 12.996| 48.664
RTD| 12.995| 20.518| 39.329| 0| 0| 12.995| 48.664
RTD| 13.328| 20.168| 35.662| 0| 0| 12.995| 48.664
xeno-test on Xenomai 2.6.5 and Freescale Linux 3.10.17 + ipipe 3.10.18:
RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 4.957| 17.575| 28.088| 0| 0| 4.957| 28.088
RTD| 4.904| 17.560| 26.828| 0| 0| 4.904| 28.088
RTD| 4.479| 13.472| 29.767| 0| 0| 4.479| 29.767
RTD| 4.522| 12.724| 23.275| 0| 0| 4.479| 29.767
RTD| 4.512| 12.904| 25.641| 0| 0| 4.479| 29.767
RTD| 4.542| 12.818| 27.878| 0| 0| 4.479| 29.767
RTD| 4.520| 13.068| 27.926| 0| 0| 4.479| 29.767
RTD| 4.409| 12.770| 26.689| 0| 0| 4.409| 29.767
RTD| 4.568| 12.265| 27.065| 0| 0| 4.409| 29.767
RTD| 4.492| 12.017| 25.898| 0| 0| 4.409| 29.767
RTD| 4.469| 12.303| 24.540| 0| 0| 4.409| 29.767
RTD| 4.489| 12.030| 27.924| 0| 0| 4.409| 29.767
RTD| 4.590| 11.851| 23.651| 0| 0| 4.409| 29.767
RTD| 4.479| 13.371| 24.838| 0| 0| 4.409| 29.767
RTD| 4.396| 13.204| 28.797| 0| 0| 4.396| 29.767
RTD| 4.411| 12.454| 26.002| 0| 0| 4.396| 29.767
RTD| 4.560| 12.234| 27.146| 0| 0| 4.396| 29.767
RTD| 4.593| 12.441| 24.686| 0| 0| 4.396| 29.767
RTD| 4.520| 12.510| 24.275| 0| 0| 4.396| 29.767
RTD| 4.568| 11.797| 24.982| 0| 0| 4.396| 29.767
RTD| 4.482| 12.631| 24.972| 0| 0| 4.396| 29.767
Worst-case on 2.6.5 + 3.18.20 is 67 us here, after 10 hrs runtime on
imx6q - definitely not 30 us - stressing the latency test with:
- dd loop (zero -> null, 16M bs)
- switchtest -s 200
The 30 us worst case is over a very short run (1-2 minutes) with just one
instance of memwrite in the background.
Using the dd loop and switchtest gives 50 us in the short term; I suppose
this compares reasonably to 67 us after 10 hours.
I can also confirm that
dd if=/dev/zero of=/dev/null bs=16M
has the same effect on latency as memwrite. Thanks.
_______________________________________________
Xenomai mailing list
[email protected]
https://xenomai.org/mailman/listinfo/xenomai