On 05/09/2013 06:13 PM, Gilles Chanteperdrix wrote:

> On 05/02/2013 08:45 PM, Kai Bollue wrote:
> 
>> Hello,
>>
>> we experience a crash upon unbinding of a previously deleted (and 
>> cleaned up) shared heap.
>> Scheme:
>> - Process A calls rt_heap_create() (with H_SHARED flag), waits for some 
>> time and then terminates.
>> - Process B calls rt_heap_bind() on that heap, uses it and calls 
>> rt_heap_unbind() (or terminates) after process A has terminated.
>>
>> Then the system crashes after the output of "Xenomai: removing 
>> non-linked element, holder=ffffc900125e4940, qslot=ffff880427aa90f8 at 
>> kernel/xenomai/skins/native/heap.c:374".
>>
>> The crash does not always happen, but can quite reliably be reproduced 
>> by starting process A in a loop from bash (while [ TRUE ]; do ...) and 
>> keeping process B running.
>>
>> Two aspects seem to be crucial:
>> - Calling rt_heap_delete() in process A is not sufficient to reproduce 
>> the problem, the process has to terminate (the cleaning up seems to be 
>> relevant).
>> - We could only reproduce the crash as long as process B accessed the 
>> heap after process A had terminated (e.g. using memcpy).
>>
>> As a workaround, we could try to avoid accessing a deleted heap,
>> but it is not always possible to detect the termination of process A in
>> time in such a setup.
>>
>> The system:
>> - AMD AM3 FX-8350
>> - Debian 6.0
>> - Kernel 3.5.7
>> - Xenomai 2.6.2.1
>>
>> We also tested this on an older system (Xenomai 2.6.0, Kernel 2.6.37): 
>> Here, both processes hung indefinitely and could not be killed, but the 
>> system did not crash.
>>
>> Any hints are appreciated.
>>
>> Attachments:
>> - Console output
>> - Code of process A
>> - Code of process B
> 
> 
> Hi Kai,
> 
> thank you very much for your test case; it allowed me to reproduce the
> issue and to try to understand what happens.
> 
> From what I understand, processA creates the shared heap, which is added
> to the list of the objects it holds (xeno_get_rholder()). When processA
> dies, the heap is removed from the list, but not destroyed, because it is
> also bound to processB.
> 
> Then processB unbinds the heap, which triggers an auto-destruction,
> which tries to remove the heap from processA's list again. If processA's
> control block has not been re-used, this works, because the list is
> still there. If processA has been re-launched, the control block has been
> reinitialized, as well as the list, so removing the element from the
> list fails.
> 
> I see several possible corrections:
> - get rt_heap_delete to return an error when the heap is currently bound
> to another process (EBUSY for instance), while still unmapping it from
> the current process. This will cause __xeno_flush_rq to move the heap to
> the "global" resource holder, where it can safely be deleted later.
> - put any rt_heap with the H_MAPPABLE flag directly on the global
> resource holder, as it is a global object anyway. This means that when
> a process which created a mappable heap dies, the heap survives, but
> this is maybe what should be expected from shareable heaps.


- or remove the rt_heap from the list directly in rt_heap_delete. It
does not seem to make sense to keep it in the list after it has been
deleted: it will be automatically deleted when the last process bound to
it unbinds it anyway.
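
To make the sequence concrete, here is a small user-space model of it. It is
only a sketch: the types and functions (struct link, struct queue, struct heap,
queue_remove(), heap_release(), simulate()) are made up for illustration and
are not taken from the Xenomai sources, and the removal is checked so that it
returns -1 where the kernel prints "removing non-linked element". With
unlink_on_delete set to 0 the heap stays on the creator's queue until the
deferred release; with 1 it is unlinked at delete time, as in the last option.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a per-process resource queue and its links. */
struct link { struct link *next, *prev; };
struct queue { struct link head; };

struct heap {
    struct link rlink;     /* link into the creator's resource queue */
    struct queue *rqueue;  /* queue the heap believes it is on, or NULL */
    int bindings;          /* processes still bound to the heap */
};

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
}

static void queue_append(struct queue *q, struct link *l)
{
    l->prev = q->head.prev;
    l->next = &q->head;
    q->head.prev->next = l;
    q->head.prev = l;
}

/* Checked removal: returns -1 when the link is not on the queue. */
static int queue_remove(struct queue *q, struct link *l)
{
    struct link *it;

    assert(q != NULL);
    for (it = q->head.next; it != &q->head; it = it->next) {
        if (it == l) {
            l->prev->next = l->next;
            l->next->prev = l->prev;
            l->next = l->prev = l;
            return 0;
        }
    }
    return -1;
}

/* Deferred destruction, run when the last binding goes away. */
static int heap_release(struct heap *h)
{
    if (h->rqueue == NULL)
        return 0; /* already unlinked at delete time */
    return queue_remove(h->rqueue, &h->rlink);
}

/* Returns 0 if the final release is clean, -1 on a stale removal. */
int simulate(int unlink_on_delete)
{
    struct queue owner_rq; /* process A's resource queue */
    struct heap h;

    queue_init(&owner_rq);
    h.rqueue = &owner_rq;
    queue_append(&owner_rq, &h.rlink); /* A creates the heap */
    h.bindings = 1;                    /* B binds to it */

    /* A exits: the heap is deleted, destruction is deferred for B. */
    if (unlink_on_delete) {
        if (queue_remove(h.rqueue, &h.rlink) != 0)
            return -1;
        h.rqueue = NULL;
    }

    queue_init(&owner_rq); /* A is re-launched, control block reused */

    /* B unbinds: last binding gone, deferred destruction runs. */
    if (--h.bindings == 0)
        return heap_release(&h);
    return 0;
}
```

In this model simulate(0) returns -1, because the re-initialized queue no
longer contains the heap's link, while simulate(1) returns 0, because the link
was already removed when the heap was deleted.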

-- 
                                                                Gilles.
