Hi Dan, I made some progress on the investigation (nothing definitive,
but it still helps us continue with the SRU process).

Using Valgrind's memcheck analyzer, I couldn't observe any non-constant
leaks once Seyeong's patch is applied. I then started using two more
analyzers to learn more about lrmd's memory behavior: DHAT and massif,
both part of Valgrind's tool suite.

DHAT (dynamic heap analysis tool) allowed me to test some hypotheses.
I ran the experiment 3 times, first for 10 minutes, then for 20 minutes
and finally for 30 minutes, and observed that:

1) lrmd allocated a total of 20,671,285 bytes during 10 minutes,
38,486,917 in the 20 minute run and 56,338,077 when running for 30
minutes;

2) In all 3 cases, the total leaked memory (from the GLib library) was
104,146 bytes, a constant value;

3) Also, it measured that in all cases only 20,673 bytes lived for more
than half of the run, meaning the application allocates and deallocates
memory over short intervals of time rather than holding allocated
memory until it exits (when it would free all chunks);

Number #1 above indicates the total memory allocated - it doesn't mean
this entire amount was live at the same time. It simply sums the sizes
requested by all calls to malloc-like functions during the program
execution.

Number #2 indicates we have a constant amount of leaked memory, that is
not increasing and so is not responsible for the slow memory increase
we're observing.

Finally, number #3 shows us that this is not a case of a program
allocating memory constantly and only deallocating all chunks at once
when the application ends. This was one of my hypotheses, now proved
false.
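
As a quick sanity check on the three numbers above, here is a small
sketch in Python (my own illustration, not output from any tool). It
only uses the figures quoted in this message; the variable names are
made up for the example. It shows that each extra 10 minutes adds
roughly the same amount of total allocation while the leaked amount
does not move.

```python
# Figures reported by DHAT for the three runs: (minutes, total bytes allocated).
runs = [(10, 20_671_285), (20, 38_486_917), (30, 56_338_077)]

# Leaked bytes reported for each run (same value every time).
leaked = [104_146, 104_146, 104_146]

# Bytes allocated in each additional 10-minute slice.
deltas = [later - earlier for (_, earlier), (_, later) in zip(runs, runs[1:])]

# Average allocation rate (bytes per minute) for each run.
rates = [total / minutes for minutes, total in runs]

# The leak is constant if every run reports the same number.
leak_is_constant = len(set(leaked)) == 1

print("extra bytes per 10-minute slice:", deltas)
print("bytes per minute, per run:", [round(r) for r in rates])
print("leak constant across runs:", leak_is_constant)
```

The two slices differ by well under 1%, which fits a steady
allocate/free cycle rather than a heap leak that grows with run time.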

That said, it's clear that heap-wise the application is not leaking an
increasing amount of memory. As for the stack, running the application
under the massif analyzer makes it possible to observe its behavior:
the maximum stack size was 5008 bytes and the minimum was 744 bytes. It
fluctuated between those 2 limits, growing and shrinking many times
over the run. This suggests the stack has little influence on the
issue.

After that I started observing the /proc/<pid>/smaps of the
application, and it showed an important data point: the "area" that is
growing is an anonymous non-heap mapping, so it was allocated through
the mmap() syscall. The Valgrind tools, as I ran them, track malloc-like
functions and not direct mmap() calls, so they are likely to miss a
leak if the memory in question was allocated through mmap(). By
"stracing" the application, I saw many mmap() calls, more than
munmap() calls. And by inspecting the GLib code, I could see mmap()
calls there (whereas the lrmd code itself has none). So it could be a
GLib wrapper causing this slow increase of memory.
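
For anyone who wants to repeat the smaps observation, here is a rough
sketch in Python of how the anonymous non-heap mappings can be picked
out. This is not the exact procedure I followed; the function name and
the sample text (including the /path/to/lrmd entry and all addresses)
are invented for illustration. It treats a mapping as anonymous when
its header line has no pathname, which excludes [heap] and [stack].

```python
import re

# Header line of a mapping: "start-end perms offset dev inode [pathname]".
HEADER = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)

def anonymous_mappings(smaps_text):
    """Return (address_range, rss_kb) for every mapping without a pathname."""
    result = []
    current = None  # address range of the anonymous mapping being read, if any
    for line in smaps_text.splitlines():
        m = HEADER.match(line)
        if m:
            start, end, _perms, path = m.groups()
            # Named regions ([heap], [stack], files) have a non-empty pathname.
            current = f"{start}-{end}" if not path.strip() else None
        elif current and line.startswith("Rss:"):
            result.append((current, int(line.split()[1])))
    return result

# Made-up sample in the smaps layout; real input would be /proc/<pid>/smaps.
sample = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /path/to/lrmd
Size:                 44 kB
Rss:                  40 kB
01a00000-01a21000 rw-p 00000000 00:00 0 [heap]
Size:                132 kB
Rss:                 120 kB
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
Size:                132 kB
Rss:                  96 kB
"""

found = anonymous_mappings(sample)
print(found)
```

Taking two snapshots of the real file some minutes apart and comparing
the results should show which anonymous range is the one growing.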

My last hypothesis is memory fragmentation, but I'd like to first
confirm or rule out the mmap() idea before pursuing the fragmentation
hypothesis.
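
One way the mmap() idea could be checked is to pair up mmap() and
munmap() calls from an strace log (for example one captured with
"strace -e trace=mmap,munmap") and see what is still mapped. The Python
sketch below is only an illustration under simplifying assumptions: it
expects the usual one-call-per-line strace format, ignores failed
calls, and treats a munmap() as releasing a whole mapping that starts
at the same address (partial unmaps are not handled). The function name
and the trace lines are invented.

```python
import re

# Successful mmap: capture the requested length and the returned address.
MMAP = re.compile(r"\bmmap\((?:NULL|0x[0-9a-f]+), (\d+), .*\) = (0x[0-9a-f]+)")
# Successful munmap: capture the address and length being released.
MUNMAP = re.compile(r"\bmunmap\((0x[0-9a-f]+), (\d+)\) = 0")

def outstanding_mappings(strace_lines):
    """Return {address: length} for mappings never released in the log."""
    live = {}
    for line in strace_lines:
        m = MMAP.search(line)
        if m:
            live[m.group(2)] = int(m.group(1))
            continue
        m = MUNMAP.search(line)
        if m:
            # Assumption: the whole mapping at this address is released.
            live.pop(m.group(1), None)
    return live

# Made-up trace lines in strace's output format.
trace = [
    "mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000001000",
    "mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000004000",
    "munmap(0x7f0000001000, 8192) = 0",
]

live = outstanding_mappings(trace)
print(live, sum(live.values()))
```

If the total of outstanding bytes keeps climbing across logs of
different lengths, that would support the mmap() explanation; if it
stays flat, fragmentation becomes the more likely candidate.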

All that said, I believe we should continue the SRU process, since
Seyeong's patch proved to be a valid fix for the heap leaks we had. I
intend to continue the investigation to understand exactly what kind of
memory behavior in lrmd explains this slow but steady memory growth.

Thanks,

Guilherme

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1316970

Title:
  g_dbus memory leak in lrmd

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/pacemaker/+bug/1316970/+subscriptions

