On 09.05.2013 18:31, Gilles Chanteperdrix wrote:
On 05/09/2013 06:13 PM, Gilles Chanteperdrix wrote:

On 05/02/2013 08:45 PM, Kai Bollue wrote:

Hello,

we experience a crash upon unbinding of a previously deleted (and
cleaned up) shared heap.
Scheme:
- Process A calls rt_heap_create() (with H_SHARED flag), waits for some
time and then terminates.
- Process B calls rt_heap_bind() on that heap, uses it and calls
rt_heap_unbind() (or terminates) after process A has terminated.

Then the system crashes after the output of "Xenomai: removing
non-linked element, holder=ffffc900125e4940, qslot=ffff880427aa90f8 at
kernel/xenomai/skins/native/heap.c:374".

The crash does not always happen, but can quite reliably be reproduced
by starting process A in a loop from bash (while [ TRUE ]; do ...) and
keeping process B running.

Two aspects seem to be crucial:
- Calling rt_heap_delete() in process A is not sufficient to reproduce
the problem, the process has to terminate (the cleaning up seems to be
relevant).
- We could only reproduce the crash as long as process B accessed the
heap after process A had terminated (e.g. using memcpy).

As a workaround, it could be tried to avoid access to a deleted heap,
but it is not always possible to detect the termination of process A on
time in such a constellation.

The system:
- AMD AM3 FX-8350
- Debian 6.0
- Kernel 3.5.7
- Xenomai 2.6.2.1

We also tested this on an older system (Xenomai 2.6.0, Kernel 2.6.37):
Here, both processes hung indefinitely and could not be killed, but the
system did not crash.

Any hints are appreciated.

Attachments:
- Console output
- Code of process A
- Code of process B

Hi Kai,

thank you very much for your test case, it allowed to reproduce the
issue and try and understand what happens.

 From what I understand, processA creates the shared heap which is added
to the list of the objects it holds (xeno_get_rholder()), when processA
dies, the heap is removed from the list, but not destroyed because it is
also bound to processB.

Then processB unbinds the heap, which triggers an auto-destruction,
which tries to remove the heap from processA list again. If processA
control block has not been re-used, this works, because the list is
still there, if processA has be re-launched, the control block has been
reinitialized, as well as the list, so removing the element from the
list fails.

Hi Gilles,

thank you very much for your analysis and suggestions.

I see several possible corrections:
- get rt_heap_delete to return an error when the heap is currently bound
to another process (EBUSY for instance), while still unmapping it from
the current process. This will cause __xeno_flush_rq to move the heap to
the "global" ressource holder, where it can safely be deleted later

I am not sure if this is the best solution as the the heap object itself can actually be deleted, only the underlying xnheap remains.

- put any rt_heap with the H_MAPPABLE flag directly on the global
ressource holder, as it is a global object anyway, this means that when
a process which created a mappable heap dies, the heap survives, but
this is maybe what should be expected from shareable heaps.

This is probably better, but:


- or remove the rt_heap from the list directly in rt_heap_delete, it
does not seem to make sense to keep it in the list after it has been
deleted: it will be automatically deleted when the last process bound to
it unbinds it anyway.


This is IMHO the most consistent solution. With the following change, we cannot reproduce the crash anymore:

diff --git a/ksrc/skins/native/heap.c b/ksrc/skins/native/heap.c
index 4a39d07..be4aee9 100644
--- a/ksrc/skins/native/heap.c
+++ b/ksrc/skins/native/heap.c
@@ -371,8 +371,6 @@ static void __heap_post_release(struct xnheap *h)

        xnlock_get_irqsave(&nklock, s);

-       removeq(heap->rqueue, &heap->rlink);
-
        if (heap->handle)
                xnregistry_remove(heap->handle);

@@ -442,6 +440,8 @@ int rt_heap_delete_inner(RT_HEAP *heap, void __user *mapaddr)

        xeno_mark_deleted(heap);

+       removeq(heap->rqueue, &heap->rlink);
+
        /* Get out of the nklocked section before releasing the heap
           memory, since we are about to invoke Linux kernel
           services. */


Thank you for your help.
Best regards,
Kai

_______________________________________________
Xenomai mailing list
[email protected]
http://www.xenomai.org/mailman/listinfo/xenomai

Reply via email to