On 03/26/2018 03:12 PM, Pintu Kumar wrote:
> Dear Philippe,
>
> Thank you so much for your reply.
> Please find my comments below.
>
> On Sun, Mar 25, 2018 at 5:39 PM, Philippe Gerum <[email protected]> wrote:
>> On 03/23/2018 01:40 PM, Pintu Kumar wrote:
>>> Dear Philippe,
>>>
>>> Thank you so much for your detailed explanation.
>>>
>>> First, to cross-check, I also tried on an ARM BeagleBone (White) with
>>> 256MB RAM, single core.
>>> These are the values I got.
>>
>> After how many samples?
>
> Just after 3 samples only, for each case. Just an initial run to
> understand the difference.
>
>>> ===========================
>>> NORMAL KERNEL Driver Build (with xenomai present)
>>> ---------------------------------------------------------------------------
>>> write latency: 8235.083 us
>>
>> Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
>> MILLIseconds for performing a single write with your test module? Either
>> you meant 8235 nanoseconds, or something is really wrong with your
>> system.
>
> Yes, these values are calculated in microseconds.
> I have used the same to measure latency for a native application, and it
> reports fine.
> These large values are seen only on the BeagleBone (White) with just 256MB RAM,
> and model name: ARMv7 Processor rev 2 (v7l).
> I think this is a very old board and it is very slow in normal usage itself,
> so these figures could be high.
No, these figures do not make sense in a dual kernel context even on this
board, and they clearly denote a problem with the application, given the
figures you got on the same machine with a proper latency test, as
illustrated below.

> This is the latency test output from the same machine:
> # /usr/xenomai/bin/latency
> == Sampling period: 1000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| 25.249| 29.711| 63.749| 0| 0| 25.249| 63.749
> RTD| 25.207| 29.589| 60.749| 0| 0| 25.207| 63.749
> RTD| 25.207| 29.701| 61.041| 0| 0| 25.207| 63.749
> RTD| 22.874| 29.263| 54.749| 0| 0| 22.874| 63.749
> RTD| 25.248| 29.542| 78.373| 0| 0| 22.874| 78.373
> RTD| 15.081| 29.050| 55.082| 0| 0| 15.081| 78.373
> RTD| 22.873| 28.940| 57.415| 0| 0| 15.081| 78.373
> RTD| 25.331| 28.972| 55.498| 0| 0| 15.081| 78.373
> RTD| 24.164| 28.071| 56.498| 0| 0| 15.081| 78.373
> ^C---|-----------|-----------|-----------|--------|------|-------------------------
> RTS| 15.081| 29.204| 78.373| 0| 0| 00:00:10/00:00:10
>
> [...]
>
> I even tried running the whole operation inside a RT task with priority 99.

Regular Linux threads cannot compete with Cobalt threads by priority in rt
mode: those threads are managed by separate schedulers, and Cobalt's
scheduler always runs first. There is also no point in raising the priority
of a Cobalt thread if no other Cobalt thread actually competes with it.

> Then in this case, latency values are reduced by almost half, but
> still 2-3 us higher than the normal driver.

Again, please read my previous answers; I'm going to rehash them: your test
does NOT measure latency as in "response time", simply because it does not
wait for any event. To respond to an event, you have to wait for it first.
Your test measures the execution time of a dummy write() system call.
The test I provided does measure the execution time of write() AND the
latency of read(), just like the "latency" test bundled with Xenomai.

The values you got so far with any test are not trustworthy because:

- you don't add any stress load in parallel, so you are not measuring
  anything close to a worst-case time,
- the test needs to run for much longer than a couple of seconds, or,
  even worse, a handful of iterations. It needs to run for hours under
  load to be meaningful.

The longer it runs with well-chosen stress loads in parallel, the more
trustworthy it can be. In the absence of formal analysis, all we have is a
probabilistic approach for getting close to the real worst-case latency
figures: the only way we can get there is to hammer the target system
hard, diversely and long enough while measuring.

>> Once the two modules and two test executables are built, just push the
>> modules (they can live together in the kernel, no conflict), then run
>> either of the executables for measuring 1) the execution time on the
>> write() side, and 2) the response time on the read() side.
>
> Anyway, I have built your test application and modules (using my
> Makefile) and verified it
> on my x86_64 Skylake machine.
>
> Here are the results that I obtained:
>
> # ./posix_test ; ./cobalt_test
> DEVICE: /dev/bar, all microseconds
>
> [ 0' 0"] RD_MIN | RD_MAX | RD_AVG | WR_MIN | WR_MAX | WR_AVG
> --------------------------------------------------------------
> 0 | 16 | 0.518 | 0 | 7 | 0.338
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> ^C
> DEVICE: /dev/rtdm/foo, all microseconds
>
> [ 0' 0"] RD_MIN | RD_MAX | RD_AVG | WR_MIN | WR_MAX | WR_AVG
> --------------------------------------------------------------
> 0 | 1 | 0.573 | 0 | 1 | 0.241
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> ^C
>
> Here, I did not run any dd or hackbench loops.
> This is just a plain run on an x86 PC.

Which is wrong and totally defeats the purpose of your test, see above.
Really, you do want to run a significant stress load in parallel with any
of your tests, and they must last long enough to be meaningful.

> Here also, it looks like read_max is higher for the rtdm case.

The information you have from RD_MAX after only a few seconds is
meaningless; the average might be slightly more useful. It says that on
average, it takes 69 nanoseconds more to run the RTDM write() syscall
compared to a native one, while no other activity is eagerly trying to
grab the CPU or causing cacheline eviction. Which may definitely be the
case, since Xenomai may be running more code in the syscall path in some
situations. Taking the argument to the extreme, this basically tells you
that you might want to use a native kernel for running empty write()
system calls on an idle machine.
For any other usage, you might want to consider other factors that may
well come into play in real-world systems. Metaphorically speaking, this
real-time game is not about shooting the ball through the hoop most of
the time, but about doing so every time a shot is taken, including when
the shooter is facing both the non-cooperative 250 lbs power forward and
the 7 ft pivot from the other side.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
https://xenomai.org/mailman/listinfo/xenomai
