[Xenomai-core] Re: 2.4 vs 2.6 in embedded space

2005-10-13 Thread Wolfgang Grandegger
On 10/12/2005 04:39 PM Philippe Gerum wrote:
 Wolfgang Grandegger wrote:
 We have linux-2.4.14-rc3 running on all AMCC eval boards (see
 http://www.denx.de). But the kernel supported by RTAI/Fusion,
 linuxppc-2.6.10rc3, does not boot on Ebony. The main problem is the
 missing support for U-Boot but there might be others. And it's simply
 not worth the effort to port it, I think.
 
 Open question: in your opinion, is 2.6 on low-end embedded hw doomed by design,
 and why? Or do you think that the reluctance to move to 2.6 is mostly explained
 by 2.4 being just fine and up to the task, IOW it's kind of a "don't fix it if
 it ain't broken" perception?

As Wolfgang (Denk) already pointed out, 2.6 is less attractive on low-end
systems because it's bigger, slower, ... This is also true for
Xenomai (RTAI/fusion). It's difficult to beat the latency values of the
old RTAI/RTHAL under 2.4. You need more CPU power and resources; that's
how things are going. Nevertheless, compared to the realtime preemption
patch, Xenomai is _lightweight_ :-).

Furthermore, I think that part of the reluctance is also due to
development still being in progress, including features like the realtime
preemption patch, especially on embedded PowerPC systems. People are
waiting for things to become available and stable.

Wolfgang.




[Xenomai-core] Re: 2.4 vs 2.6 in embedded space

2005-10-13 Thread Philippe Gerum

Wolfgang Grandegger wrote:

On 10/12/2005 04:39 PM Philippe Gerum wrote:


Wolfgang Grandegger wrote:


We have linux-2.4.14-rc3 running on all AMCC eval boards (see
http://www.denx.de). But the kernel supported by RTAI/Fusion,
linuxppc-2.6.10rc3, does not boot on Ebony. The main problem is the
missing support for U-Boot but there might be others. And it's simply
not worth the effort to port it, I think.


Open question: in your opinion, is 2.6 on low-end embedded hw doomed by design,
and why? Or do you think that the reluctance to move to 2.6 is mostly explained
by 2.4 being just fine and up to the task, IOW it's kind of a "don't fix it if
it ain't broken" perception?



As Wolfgang (Denk) already pointed out, 2.6 is less attractive on low-end
systems because it's bigger, slower, ... This is also true for
Xenomai (RTAI/fusion). It's difficult to beat the latency values of the
old RTAI/RTHAL under 2.4. You need more CPU power and resources; that's
how things are going. Nevertheless, compared to the realtime preemption
patch, Xenomai is _lightweight_ :-).


I think so too; that's the problem with strictly native real-time support in the
kernel: you must end up with some kind of SMPish structure which virtually
exhibits an infinite number of processors (one per task basically), so it's not
going to help reduce the CPU footprints and the various noisy artefacts
implied by the generalized mutex approach (which is otherwise sound, that's not
the issue). This is also why there is still space for real-time extensions,
provided - I think - they run as symbiotically as possible with Linux, so that
we don't end up telling people to ignore that they have Linux while running their
apps over it.


As far as Xeno is concerned, we should be able to continue to reduce those 
footprints. From my window, I see two aspects we need to work on:

- impact of the Adeos pipelining on cache especially for hw with sluggish
memory bandwidth
- a better placement of the hot data that are accessed inside the fast interrupt 
path (mainly those of the scheduler).


Looking at the ppc figures since early 2005 or so, the raw latency has
continuously been reduced, i.e. we went from ~120 us on a Freescale Icecube
running the user-space test, to 53 us as measured recently with 0.9.1+r8c4. I
did not manage to check again on the Sandpoint (connection problem to the Vlab),
which is very representative of the low-end hw issues we could face [and
basically made me cry when I first looked at the latency reports], but I suspect
that things might have progressed there too. I've recently ported 0.9.1 over a
Mvista kernel (experimental PREEMPT_RT-like stuff + other patches) on an mpc8541,
and the figures for user-space are ~22 us worst-case latency.
Of course, this is not what one would call sluggish low-end hw, and I agree
that a more structured design like Xeno can't beat a flat ISR-based design, but
still, in any case, I'm optimistic enough to think that we likely have a margin
for improvement there.
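
For readers wondering how a "worst-case latency" figure of this kind is typically obtained, here is a minimal sketch of a periodic wakeup-latency measurement. It is plain Python using standard timing calls on a stock system, not Xenomai code and not the actual latency test referred to above; the function names (wakeup_latencies_us, summarize), the 1000 us period and the sample count are assumptions chosen for illustration. The numbers it prints on a non-real-time system will be far worse than those quoted in this thread.

```python
import time


def wakeup_latencies_us(period_us=1000, samples=200):
    """Sleep until each absolute deadline and record how late we woke up (in us)."""
    period_ns = period_us * 1000
    deadline = time.monotonic_ns() + period_ns
    latencies = []
    for _ in range(samples):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Lateness is measured against the absolute deadline, so one late
        # wakeup does not shift the deadlines that follow it.
        latencies.append((time.monotonic_ns() - deadline) / 1000.0)
        deadline += period_ns
    return latencies


def summarize(latencies_us):
    """Return (min, average, worst-case) of a list of latencies."""
    return (min(latencies_us),
            sum(latencies_us) / len(latencies_us),
            max(latencies_us))


if __name__ == "__main__":
    lo, avg, worst = summarize(wakeup_latencies_us())
    print(f"min {lo:.1f} us, avg {avg:.1f} us, worst-case {worst:.1f} us")
```

The worst-case value (the maximum over all samples) is the figure that matters for real-time use; the average mostly hides the outliers that cache misses and interrupt masking produce.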




Furthermore, I think that part of the reluctance is also due to
development still being in progress, including features like the realtime
preemption patch, especially on embedded PowerPC systems. People are
waiting for things to become available and stable.



Well, we might all have the same problem here...

--

Philippe.



[Xenomai-core] Re: 2.4 vs 2.6 in embedded space

2005-10-13 Thread Wolfgang Grandegger
On 10/13/2005 11:11 AM Philippe Gerum wrote:
 Wolfgang Grandegger wrote:
 On 10/12/2005 04:39 PM Philippe Gerum wrote:
 
Wolfgang Grandegger wrote:

We have linux-2.4.14-rc3 running on all AMCC eval boards (see
http://www.denx.de). But the kernel supported by RTAI/Fusion,
linuxppc-2.6.10rc3, does not boot on Ebony. The main problem is the
missing support for U-Boot but there might be others. And it's simply
not worth the effort to port it, I think.

Open question: in your opinion, is 2.6 on low-end embedded hw doomed by design,
and why? Or do you think that the reluctance to move to 2.6 is mostly explained
by 2.4 being just fine and up to the task, IOW it's kind of a "don't fix it if
it ain't broken" perception?
 
 
 As Wolfgang (Denk) already pointed out, 2.6 is less attractive on low-end
 systems because it's bigger, slower, ... This is also true for
 Xenomai (RTAI/fusion). It's difficult to beat the latency values of the
 old RTAI/RTHAL under 2.4. You need more CPU power and resources; that's
 how things are going. Nevertheless, compared to the realtime preemption
 patch, Xenomai is _lightweight_ :-).
 
 I think so too; that's the problem with strictly native real-time support in the
 kernel: you must end up with some kind of SMPish structure which virtually
 exhibits an infinite number of processors (one per task basically), so it's not
 going to help reduce the CPU footprints and the various noisy artefacts
 implied by the generalized mutex approach (which is otherwise sound, that's not
 the issue). This is also why there is still space for real-time extensions,
 provided - I think - they run as symbiotically as possible with Linux, so that
 we don't end up telling people to ignore that they have Linux while running their
 apps over it.

I agree, and I'm really interested in getting the benchmark comparison tests
(http://www.opersys.com/lrtbf/index.html) running on a low-end PowerPC system.

 
 As far as Xeno is concerned, we should be able to continue to reduce those 
 footprints. From my window, I see two aspects we need to work on:
 - impact of the Adeos pipelining on cache especially for hw with sluggish
 memory bandwidth
 - a better placement of the hot data that are accessed inside the fast interrupt
 path (mainly those of the scheduler).

That would be nice, indeed. I also understand that iPIPE is already
lighter than ADEOS.

 Looking at the ppc figures since early 2005 or so, the raw latency has
 continuously been reduced, i.e. we went from ~120 us on a Freescale Icecube
 running the user-space test, to 53 us as measured recently with 0.9.1+r8c4. I
 did not manage to check again on the Sandpoint (connection problem to the Vlab),
 which is very representative of the low-end hw issues we could face [and
 basically made me cry when I first looked at the latency reports], but I suspect
 that things might have progressed there too. I've recently ported 0.9.1 over a
 Mvista kernel (experimental PREEMPT_RT-like stuff + other patches) on an mpc8541,
 and the figures for user-space are ~22 us worst-case latency.
 Of course, this is not what one would call sluggish low-end hw, and I agree
 that a more structured design like Xeno can't beat a flat ISR-based design, but
 still, in any case, I'm optimistic enough to think that we likely have a margin
 for improvement there.

Once the iPIPE patch for PowerPC is available for a recent 2.6 kernel
version, I can run benchmark tests on various PowerPC systems, e.g. on
4xx processors from AMCC, including a rather low-end 405 at 200 MHz.

 Furthermore, I think that part of the reluctance is also due to
 development still being in progress, including features like the realtime
 preemption patch, especially on embedded PowerPC systems. People are
 waiting for things to become available and stable.
 
 
 Well, we might all have the same problem here...

Wolfgang.





[Xenomai-core] Re: 2.4 vs 2.6 in embedded space

2005-10-13 Thread Philippe Gerum

Wolfgang Grandegger wrote:

On 10/13/2005 11:11 AM Philippe Gerum wrote:


Wolfgang Grandegger wrote:


On 10/12/2005 04:39 PM Philippe Gerum wrote:



Wolfgang Grandegger wrote:



We have linux-2.4.14-rc3 running on all AMCC eval boards (see
http://www.denx.de). But the kernel supported by RTAI/Fusion,
linuxppc-2.6.10rc3, does not boot on Ebony. The main problem is the
missing support for U-Boot but there might be others. And it's simply
not worth the effort to port it, I think.


Open question: in your opinion, is 2.6 on low-end embedded hw doomed by design,
and why? Or do you think that the reluctance to move to 2.6 is mostly explained
by 2.4 being just fine and up to the task, IOW it's kind of a "don't fix it if
it ain't broken" perception?



As Wolfgang (Denk) already pointed out, 2.6 is less attractive on low-end
systems because it's bigger, slower, ... This is also true for
Xenomai (RTAI/fusion). It's difficult to beat the latency values of the
old RTAI/RTHAL under 2.4. You need more CPU power and resources; that's
how things are going. Nevertheless, compared to the realtime preemption
patch, Xenomai is _lightweight_ :-).


I think so too; that's the problem with strictly native real-time support in the
kernel: you must end up with some kind of SMPish structure which virtually
exhibits an infinite number of processors (one per task basically), so it's not
going to help reduce the CPU footprints and the various noisy artefacts
implied by the generalized mutex approach (which is otherwise sound, that's not
the issue). This is also why there is still space for real-time extensions,
provided - I think - they run as symbiotically as possible with Linux, so that
we don't end up telling people to ignore that they have Linux while running their
apps over it.



I agree, and I'm really interested in getting the benchmark comparison tests
(http://www.opersys.com/lrtbf/index.html) running on a low-end PowerPC system.



Actually, those results are pretty bad compared to what we have now x86-wise: a
dual 750 MHz exhibits a worst-case latency of 42 us, and a dual 2.4 GHz is under
the 20 us threshold. Our figures include complete tasking in user-space, which
was not accounted for in these tests.


It's one more reason to have this benchmarking infrastructure, so that the
numbers keep being updated regularly, whichever way they are progressing/regressing.




As far as Xeno is concerned, we should be able to continue to reduce those 
footprints. From my window, I see two aspects we need to work on:

- impact of the Adeos pipelining on cache especially for hw with sluggish
memory bandwidth
- a better placement of the hot data that are accessed inside the fast interrupt 
path (mainly those of the scheduler).



That would be nice, indeed. I also understand that iPIPE is already
lighter than ADEOS.



It is, yes. It has been relieved of all the cruft needed to have it as an
optional module, which was a genuinely BAD IDEA (tm, (C) 2002 rpm, ludicrous
patent pending; hey, guys, I'm _kidding_). The arch that would benefit the most
from the implied simplifications is x86, since this also solves a design issue
there, but at least we now have a saner ground to build on and optimize for
other archs.




Looking at the ppc figures since early 2005 or so, the raw latency has
continuously been reduced, i.e. we went from ~120 us on a Freescale Icecube
running the user-space test, to 53 us as measured recently with 0.9.1+r8c4. I
did not manage to check again on the Sandpoint (connection problem to the Vlab),
which is very representative of the low-end hw issues we could face [and
basically made me cry when I first looked at the latency reports], but I suspect
that things might have progressed there too. I've recently ported 0.9.1 over a
Mvista kernel (experimental PREEMPT_RT-like stuff + other patches) on an mpc8541,
and the figures for user-space are ~22 us worst-case latency.
Of course, this is not what one would call sluggish low-end hw, and I agree
that a more structured design like Xeno can't beat a flat ISR-based design, but
still, in any case, I'm optimistic enough to think that we likely have a margin
for improvement there.



Once the iPIPE patch for PowerPC is available for a recent 2.6 kernel
version, I can run benchmark tests on various PowerPC systems, e.g. on
4xx processors from AMCC, including a rather low-end 405 at 200 MHz.



It mostly runs already, I just need to figure out why Xenomai's klatency test 
breaks my IceCube instead of quietly running like the latency one does...





Furthermore, I think that part of the reluctance is also due to
development still being in progress, including features like the realtime
preemption patch, especially on embedded PowerPC systems. People are
waiting for things to become available and stable.



Well, we might all have the same problem here...



Wolfgang.






--

Philippe.