On Wed, Aug 30, 2017 at 05:30:02PM +0200, Jakub Hrozek wrote:
> Hi,
> 
> I'm afraid I got a little stuck looking into upstream ticket
> https://pagure.io/SSSD/sssd/issue/3465
> 
> The reporter is seeing sssd memory usage increasing on RHEL-6 and
> RHEL-7. There is a valgrind log from RHEL-6 attached to the ticket which
> does show some leaks, the three biggest ones are:
> 
> ==14913== 4,715,355 (74,383 direct, 4,640,972 indirect) bytes in 599 blocks are definitely lost in loss record 1,113 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
> ==14913== 
> ==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record 1,114 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
> ==14913== 
> ==14913== 33,554,430 bytes in 2 blocks are still reachable in loss record 1,115 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x9FA4CCD: _plug_decode (plugin_common.c:666)
> ==14913==    by 0x1453036A: gssapi_decode (gssapi.c:497)
> ==14913==    by 0x9F9B783: sasl_decode (common.c:621)
> ==14913==    by 0x69AC69C: sb_sasl_cyrus_decode (cyrus.c:188)
> ==14913==    by 0x69AEF2F: sb_sasl_generic_read (sasl.c:711)
> ==14913==    by 0x6790AEB: sb_debug_read (sockbuf.c:829)
> ==14913==    by 0x67906AE: ber_int_sb_read (sockbuf.c:423)
> ==14913==    by 0x678D789: ber_get_next (io.c:532)
> ==14913==    by 0x69A6D4D: wait4msg (result.c:491)
> ==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> 
> The biggest one almost looks like a leak in libldap, because this line:
>     ==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
> is a call to ldap_result().
> 
> But even the other leaks make little sense to me, like this one:
> ==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record 1,114 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
> 
> In sdap_parse_entry, we allocate sysdb_attrs on state, and all the internal
> ldb structures hang off sysdb_attrs. The context is then stolen to the
> caller's provided context, which is typically the state of a tevent request
> that is passed upwards. So the only idea I have is that somewhere we steal
> the request to NULL, but I don't know where or how to look for this.
> 
> Also, it seems odd that, with so many people running sssd, this is the
> only user (so far?) who has reported this issue.

According to the ps output in the ticket, the increase is not that big:
122520 -> 195176 in RSS after 48h. I guess this might be why many people
do not notice the increase.

I did a small test where I loaded the members of a larger group in a
plain LDAP setup and saw an increase from 16100 to 42224 that persisted
overnight. I will try to get more data from my setup.

bye,
Sumit

> 
> Does anyone have an idea how to continue?
> _______________________________________________
> sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
> To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org