Hi,

I'm afraid I got a little stuck looking into upstream ticket
https://pagure.io/SSSD/sssd/issue/3465

The reporter is seeing sssd memory usage increasing on RHEL-6 and
RHEL-7. There is a valgrind log from RHEL-6 attached to the ticket which
does show some leaks; the three biggest ones are:

==14913== 4,715,355 (74,383 direct, 4,640,972 indirect) bytes in 599 blocks are definitely lost in loss record 1,113 of 1,115
==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
==14913== 
==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record 1,114 of 1,115
==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
==14913== 
==14913== 33,554,430 bytes in 2 blocks are still reachable in loss record 1,115 of 1,115
==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
==14913==    by 0x9FA4CCD: _plug_decode (plugin_common.c:666)
==14913==    by 0x1453036A: gssapi_decode (gssapi.c:497)
==14913==    by 0x9F9B783: sasl_decode (common.c:621)
==14913==    by 0x69AC69C: sb_sasl_cyrus_decode (cyrus.c:188)
==14913==    by 0x69AEF2F: sb_sasl_generic_read (sasl.c:711)
==14913==    by 0x6790AEB: sb_debug_read (sockbuf.c:829)
==14913==    by 0x67906AE: ber_int_sb_read (sockbuf.c:423)
==14913==    by 0x678D789: ber_get_next (io.c:532)
==14913==    by 0x69A6D4D: wait4msg (result.c:491)
==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)

The biggest one almost looks like a leak in libldap, because this line:
    ==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
is a call to ldap_result().
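
For reference, this is the ownership contract I would expect around that
ldap_result() call -- just a minimal sketch of the usual libldap polling
pattern, not the actual sdap_process_result() code, and the function name
poll_one_result() is made up:

    #include <ldap.h>

    static int poll_one_result(LDAP *ld)
    {
        LDAPMessage *msg = NULL;
        struct timeval no_timeout = { 0, 0 };

        /* ldap_result() hands ownership of *msg to the caller */
        int ret = ldap_result(ld, LDAP_RES_ANY, LDAP_MSG_ONE,
                              &no_timeout, &msg);
        if (ret <= 0) {
            return ret;   /* 0 = timeout, -1 = error; nothing to free */
        }

        /* ... hand the message to the parsing code ... */

        /* the caller must release the message, otherwise whatever
         * libldap allocated for it stays around */
        ldap_msgfree(msg);
        return ret;
    }

In other words, whoever calls ldap_result() owns the returned message and
is responsible for calling ldap_msgfree() on it.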

But even the other leaks make little sense to me, like this one:
==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record 1,114 of 1,115
==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)

In sdap_parse_entry, we allocate the sysdb_attrs on state, and all the
internal ldb structures hang off the sysdb_attrs. The attrs are then
stolen to the caller's provided context, which is typically the state of
a tevent request that is passed upwards. So the only idea I have is that
somewhere we steal the request to a NULL context, but I don't know where,
or how to look for this.
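
To illustrate what I mean, here is a made-up example (not actual SSSD
code; the names and structures are placeholders): if the parent passed to
talloc_steal() ends up being NULL, the subtree is detached from the
request and is never freed with it. With talloc's null-context tracking
one can at least see what piles up there:

    #include <stdio.h>
    #include <talloc.h>

    static void simulate_steal_to_null(TALLOC_CTX *caller_memctx)
    {
        /* stands in for the tevent request state in sdap_parse_entry() */
        TALLOC_CTX *state = talloc_new(NULL);

        /* stands in for the sysdb_attrs with the ldb values hanging off it */
        void *attrs = talloc_zero_size(state, 128);

        /* if caller_memctx is NULL, attrs is detached from 'state' and
         * parked on the global null context instead of a request */
        talloc_steal(caller_memctx, attrs);

        talloc_free(state);   /* attrs survives this free -> it leaks */
    }

    int main(void)
    {
        /* must be enabled before the allocations we want to see */
        talloc_enable_null_tracking();

        simulate_steal_to_null(NULL);

        /* everything hanging off the null context shows up here */
        talloc_report_full(NULL, stderr);
        return 0;
    }

If we could get a talloc_report_full(NULL, ...) out of the running
backend with null tracking enabled, any parent-less sysdb_attrs should
show up in it, which might at least point at the code path that steals
them.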

Also, it seems odd that, with so many people running sssd, this is the
only user (so far?) who has reported this issue.

Does anyone have an idea how to continue?