On 03/26/2018 03:12 PM, Pintu Kumar wrote:
> Dear Philippe,
>
> Thank you so much for your reply.
> Please find my comments below.
>
> On Sun, Mar 25, 2018 at 5:39 PM, Philippe Gerum <[email protected]> wrote:
>> On 03/23/2018 01:40 PM, Pintu Kumar wrote:
>>> Dear Philippe,
>>>
>>> Thank you so much for your detailed explanation.
>>>
>>> First, to cross-check, I also tried on an ARM BeagleBone (White) with
>>> 256MB RAM, single core.
>>> These are the values I got.
>>
>> After how many samples?
>
> Just after 3 samples only, for each case. Just an initial run to
> understand the difference.
>
>>> ===========================
>>> NORMAL KERNEL Driver Build (with xenomai present)
>>> ---------------------------------------------------------------------------
>>> write latency: 8235.083 us
>>
>> Are you sure that any driver (plain Linux or Xenomai) would take up 8.2
>> MILLIseconds for performing a single write with your test module? Either
>> you meant 8235 nanoseconds, or something is really wrong with your
>> system.
>
> Yes, these values are calculated in microseconds.
> I have used the same to measure latency for a native application, and it
> reports fine.
> These large values are seen only on the BeagleBone (White) with just 256MB RAM,
> and model name: ARMv7 Processor rev 2 (v7l).
> I think this is a very old board and it is very slow in normal usage itself,
> so these figures could be high.
No, these figures do not make sense in a dual kernel context even on this
board, and they clearly denote a problem with the application, given the
figures you got on the same machine with a proper latency test, as
illustrated below.

> This is the latency test output from the same machine:
> # /usr/xenomai/bin/latency
> == Sampling period: 1000 us
> == Test mode: periodic user-mode task
> == All results in microseconds
> warming up...
> RTT| 00:00:01 (periodic user-mode task, 1000 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| 25.249| 29.711| 63.749| 0| 0| 25.249| 63.749
> RTD| 25.207| 29.589| 60.749| 0| 0| 25.207| 63.749
> RTD| 25.207| 29.701| 61.041| 0| 0| 25.207| 63.749
> RTD| 22.874| 29.263| 54.749| 0| 0| 22.874| 63.749
> RTD| 25.248| 29.542| 78.373| 0| 0| 22.874| 78.373
> RTD| 15.081| 29.050| 55.082| 0| 0| 15.081| 78.373
> RTD| 22.873| 28.940| 57.415| 0| 0| 15.081| 78.373
> RTD| 25.331| 28.972| 55.498| 0| 0| 15.081| 78.373
> RTD| 24.164| 28.071| 56.498| 0| 0| 15.081| 78.373
> ^C---|-----------|-----------|-----------|--------|------|-------------------------
> RTS| 15.081| 29.204| 78.373| 0| 0| 00:00:10/00:00:10
>
> [...]
>
> I even tried running the whole operation inside a RT task with priority 99.

Regular Linux threads cannot compete with Cobalt threads by priority in rt
mode: those threads are managed by separate schedulers, and Cobalt's
scheduler always runs first. There is also no point in raising the priority
of a Cobalt thread if no other Cobalt thread actually competes with it.

> Then in this case, latency values are reduced by almost half, but
> still 2-3 us higher than the normal driver.

Again, please read my previous answers; I'm going to rehash them: your test
does NOT measure latency as in "response time", simply because it does not
wait for any event. To respond to an event, you have to wait for it first.
Your test measures the execution time of a dummy write() system call.
The test I provided does measure the execution time of write() AND the
latency of read(), just like the "latency" test bundled with Xenomai.

The values you got so far with any test are not trustworthy because:

- you don't add any stress load in parallel, so you are not measuring
  anything close to a worst-case time,
- the test needs to run for much longer than a couple of seconds, or,
  even worse, a handful of iterations. It needs to run for hours under
  load to be meaningful.

The longer it runs with well-chosen stress loads in parallel, the more
trustworthy it can be. In the absence of formal analysis, all we have is a
probabilistic approach for getting close to the real worst-case latency
figures: the only way we can get there is to hammer the target system
hard, diversely and long enough while measuring.

>> Once the two modules and two test executables are built, just push the
>> modules (they can live together in the kernel, no conflict), then run
>> either of the executables for measuring 1) the execution time on the
>> write() side, and 2) the response time on the read() side.
>
> Anyway, I have built your test application and modules (using my
> Makefile) and verified it
> on my x86_64 Skylake machine.
>
> Here are the results that I obtained:
>
> # ./posix_test ; ./cobalt_test
> DEVICE: /dev/bar, all microseconds
>
> [ 0' 0"] RD_MIN | RD_MAX | RD_AVG | WR_MIN | WR_MAX | WR_AVG
> --------------------------------------------------------------
> 0 | 16 | 0.518 | 0 | 7 | 0.338
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> 0 | 16 | 0.501 | 0 | 16 | 0.337
> ^C
> DEVICE: /dev/rtdm/foo, all microseconds
>
> [ 0' 0"] RD_MIN | RD_MAX | RD_AVG | WR_MIN | WR_MAX | WR_AVG
> --------------------------------------------------------------
> 0 | 1 | 0.573 | 0 | 1 | 0.241
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> 0 | 17 | 0.570 | 0 | 17 | 0.240
> ^C
>
> Here, I did not run any dd or hackbench loops.
> This is just a plain run on an x86 PC.

Which is wrong and totally defeats the purpose of your test, see above.
Really, you do want to run a significant stress load in parallel with any
of your tests, and they must last long enough to be meaningful.

> Here also, it looks like read_max is higher for the rtdm case.

The information you have from RD_MAX after only a few seconds is
meaningless; the average might be slightly more useful. It says that on
average, it takes 69 nanoseconds more to run the RTDM write() syscall
compared to a native one, while no other activity is eagerly trying to
grab the CPU or causing cacheline eviction. Which may definitely be the
case, since Xenomai may be running more code in the syscall path in some
situations. Taking the argument to the extreme, this basically tells you
that you might want to use a native kernel for running empty write()
system calls on an idle machine.
For any other usage, you might want to consider other factors that may
well come into play in real-world systems. Metaphorically speaking, this
real-time game is not about shooting the ball through the hoop most of
the time, but about doing so every time a shot is taken, including when
the shooter is facing both the non-cooperative 250 lbs power forward and
the 7 ft pivot from the other side.

-- 
Philippe.

_______________________________________________
Xenomai mailing list
[email protected]
https://xenomai.org/mailman/listinfo/xenomai
