On 30/07/2020 at 00:08, Jan Kiszka wrote:
On 28.07.20 15:28, Stéphane Ancelot wrote:
On 27/07/2020 at 15:17, Jan Kiszka wrote:
On 27.07.20 14:44, Stéphane Ancelot via Xenomai wrote:
Hi,
We are using a pipe created with poolsize = 0, meaning all message
allocations for this pipe are performed on the Cobalt core heap.
Unfortunately, when calling rt_pipe_write() while no user task is
consuming the pipe, we discovered that after many rt_pipe_write()
cycles (at least 700000 in our process), the Cobalt heap and the
system heap seem to become corrupted.
This leads to system issues such as unexpected task crashes.
"3.x" implies both 3.1 and 3.0 are affected?
Do you see a constantly growing use of system heap (leak)? If that
is not the case, we might have some wrap-around issue somewhere.
The version we are using is based on release b3e18b6d of the master
branch.
We don't see system memory usage increasing (using top).
Comparing it to the latest releases, we have not found any big
differences in the xddp code.
Testing with other releases, applications, or a differently compiled
kernel does not guarantee that we can tell whether it has been solved,
since the memory layout needed to reproduce it changes.
For certification reasons, we can't validate the latest source code;
we can only cherry-pick a localised hotfix into the Xenomai code.
Reproduction case would be nice.
It is not easy. The initial problem was reported by one of our users,
and we spent a lot of time managing to reproduce it in our context.
Some graphics user tasks were locking up or crashing after some days
of use in production.
At first, we went in the wrong directions while trying to identify
where it came from.
In our system, we had to go back and test each code commit in order to
isolate the problem, and to understand that it became visible after
almost 700000 rt_pipe_write() calls in our case.
As a unit test, we can provide the enclosed snippet. It is the
extracted code that causes the problem.
Under which condition does that test_pipe.cpp cause the issue? I've
given it a quick try, and as it's late, I disabled the delay in the
loop. That so far did not trigger an issue. Is the delay important?
The delay is not important; what matters is the number of
rt_pipe_write() calls that are not consumed.
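To make "number of unconsumed calls" measurable, a loop like the following could report how many writes were issued before the first error. This is only a rough sketch and not the enclosed test_pipe.cpp: WriteLoopResult and run_write_loop are hypothetical names, and the writer is passed in as a callback so the loop itself does not depend on Xenomai.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical result record; not part of any Xenomai API.
struct WriteLoopResult {
    std::size_t calls_made;  // number of write attempts performed
    long first_error;        // 0 if every call succeeded, else the first negative return value
};

// Calls `write_once` up to `max_calls` times and stops at the first
// negative return value, mirroring a producer whose messages are never read.
WriteLoopResult run_write_loop(const std::function<long()>& write_once,
                               std::size_t max_calls)
{
    assert(write_once && "a writer callback is required");
    WriteLoopResult result{0, 0};
    for (std::size_t i = 0; i < max_calls; ++i) {
        const long ret = write_once();
        ++result.calls_made;
        if (ret < 0) {
            result.first_error = ret;  // keep the first failure code for the report
            break;
        }
    }
    return result;
}
```

In a real test, the callback would presumably wrap rt_pipe_write(&pipe, buf, len, P_NORMAL) on a pipe created with rt_pipe_create(&pipe, name, P_MINOR_AUTO, 0), with max_calls set well above 700000. If the loop finishes without any negative return code while tasks still crash, that would point at corruption rather than a cleanly reported allocation failure.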
It is not easy to identify the memory leak in the heap.
Alternatively, use a system with low memory.
I have not tried it, but I suppose that once system memory fills up, at
some point it will crash by overwriting important system data.
Jan