On 02/28/2013 09:30 PM, Ronny Meeus wrote:
> On Thu, Feb 28, 2013 at 9:10 PM, Gilles Chanteperdrix
> <[email protected]> wrote:
>> On 02/28/2013 08:19 PM, Ronny Meeus wrote:
>>
>>> Hello
>>>
>>> we are using the PSOS interface of Xenomai forge, running completely
>>> in user-space using the mercury code.
>>> We deploy our application on different processors, one product is
>>> running on PPC multicore (P4040, P4080, P4034) and another one on
>>> Cavium (8 core device).
>>> The Linux version we use is 2.6.32 but I would assume that this is not
>>> so relevant.
>>>
>>> Our Xenomai application is running on one of the cores (affinity is
>>> set), while the other cores are running other code.
>>>
>>> On both architectures we recently started to see issues where one thread
>>> is consuming 100% of the core on which the application is pinned.
>>> The thread that monopolizes the core is the thread internally used to
>>> manage the timers, running at the highest priority.
>>> The trigger for running into this behavior is currently unclear.
>>> If we only start a part of the application (platform management only),
>>> the issue is not observed.
>>> We see this on both an old version of Xenomai and a very recent one
>>> (pulled from the git repo yesterday).
>>>
>>> I will continue to debug this issue in the coming days and try to isolate
>>> the code that is triggering it, but I can use hints from the
>>> community.
>>> Debugging is complex since once the load starts, the debugger is not
>>> reacting anymore.
>>> If I put breakpoints in the functions that are called when the timer
>>> expires (both oneshot and periodic), the process starts to clone
>>> itself and I end up with tens of them.
>>>
>>> Has anybody seen an issue like this before or does somebody have some
>>> hints on how to debug this problem?
>>
>>
>> First enable the watchdog. It will send a signal to the application when
>> detecting a problem, then you can use the watchdog to trigger an I-pipe
>> tracer trace when the bug happens. You will probably have to increase
>> the watchdog polling frequency, in order to have a meaningful trace.
>>
>> --
>> Gilles.
>
> Gilles,
>
> We are running completely in user-space (mercury).
cobalt also runs in user-space.
> I thought that the watchdog and I-pipe tracer are only relevant when
> using the cobalt code.
> In case my assumption is wrong, please correct me and let me know how
> to enable it.
Yes, if you are using plain Linux, there are even more tools to debug
the problem:
- you can enable RT throttling to avoid the machine lockup by the buggy
thread
- you can enable the kernel's lockup detection, which covers just your case
(CONFIG_LOCKUP_DETECTOR)
- if you are on x86 you can use the NMI watchdog
- you can use FTRACE instead of the I-pipe tracer
- or you can decide to compile the kernel with CONFIG_IPIPE and
CONFIG_IPIPE_TRACE to use the I-pipe tracer without Xenomai.
- maybe xenomai-forge's "slackspot" tool works for mercury?
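
For the RT throttling item in the list above, here is a minimal sketch of how one might check the current settings. It assumes the standard procfs knobs /proc/sys/kernel/sched_rt_period_us and /proc/sys/kernel/sched_rt_runtime_us are present on the target kernel; the helper names (rt_share, read_rt_throttling) are made up for illustration and are not part of Xenomai.

```python
from pathlib import Path

# Assumption: the scheduler's RT throttling knobs live here on the target.
PROC_KERNEL = Path("/proc/sys/kernel")


def rt_share(period_us: int, runtime_us: int) -> float:
    """Fraction of each period that real-time tasks are allowed to run.

    A negative runtime (-1) means throttling is disabled, so a runaway
    SCHED_FIFO/SCHED_RR thread can keep 100% of its core.
    """
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_rt_throttling(proc_dir: Path = PROC_KERNEL) -> tuple:
    """Read (period_us, runtime_us) from procfs."""
    period_us = int((proc_dir / "sched_rt_period_us").read_text())
    runtime_us = int((proc_dir / "sched_rt_runtime_us").read_text())
    return period_us, runtime_us


if __name__ == "__main__":
    try:
        period_us, runtime_us = read_rt_throttling()
    except (OSError, ValueError) as exc:
        print(f"could not read RT throttling settings: {exc}")
    else:
        share = rt_share(period_us, runtime_us)
        print(f"period={period_us}us runtime={runtime_us}us "
              f"-> RT tasks may use {share:.0%} of each period")
        if share >= 1.0:
            print("throttling is effectively off; a spinning RT thread "
                  "can starve the debugger on that core")
```

If the reported share is 100%, writing a runtime smaller than the period to sched_rt_runtime_us (as root) should leave some CPU time for non-RT tasks such as the debugger while you investigate.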
--
Gilles.
_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai