On Wed, Aug 30, 2017 at 05:30:02PM +0200, Jakub Hrozek wrote:
> Hi,
>
> I'm afraid I got a little stuck looking into upstream ticket
> https://pagure.io/SSSD/sssd/issue/3465
>
> The reporter is seeing sssd memory usage increasing on RHEL-6 and
> RHEL-7. There is a valgrind log from RHEL-6 attached to the ticket which
> does show some leaks; the three biggest ones are:
>
> ==14913== 4,715,355 (74,383 direct, 4,640,972 indirect) bytes in 599 blocks
> are definitely lost in loss record 1,113 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
> ==14913==
> ==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record
> 1,114 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
> ==14913==
> ==14913== 33,554,430 bytes in 2 blocks are still reachable in loss record
> 1,115 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x9FA4CCD: _plug_decode (plugin_common.c:666)
> ==14913==    by 0x1453036A: gssapi_decode (gssapi.c:497)
> ==14913==    by 0x9F9B783: sasl_decode (common.c:621)
> ==14913==    by 0x69AC69C: sb_sasl_cyrus_decode (cyrus.c:188)
> ==14913==    by 0x69AEF2F: sb_sasl_generic_read (sasl.c:711)
> ==14913==    by 0x6790AEB: sb_debug_read (sockbuf.c:829)
> ==14913==    by 0x67906AE: ber_int_sb_read (sockbuf.c:423)
> ==14913==    by 0x678D789: ber_get_next (io.c:532)
> ==14913==    by 0x69A6D4D: wait4msg (result.c:491)
> ==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
>
> The biggest one almost looks like a leak in libldap, because this line:
> ==14913==    by 0x12620EB4: sdap_process_result (sdap_async.c:165)
> is a call to ldap_result().
>
> But even the other leaks make little sense to me, like this one:
> ==14913== 7,998,231 bytes in 64,275 blocks are indirectly lost in loss record
> 1,114 of 1,115
> ==14913==    at 0x4C28A2E: malloc (vg_replace_malloc.c:270)
> ==14913==    by 0x8D76DCA: _talloc_array (talloc.c:668)
> ==14913==    by 0x56D0A48: ldb_val_dup (ldb_msg.c:106)
> ==14913==    by 0x52668B4: sysdb_attrs_add_val_int (sysdb.c:555)
> ==14913==    by 0x1264C964: sdap_parse_entry (sdap.c:576)
> ==14913==    by 0x1261C702: sdap_get_and_parse_generic_parse_entry (sdap_async.c:1749)
> ==14913==    by 0x1261FEB3: sdap_get_generic_op_finished (sdap_async.c:1487)
> ==14913==    by 0x1262155E: sdap_process_result (sdap_async.c:352)
> ==14913==    by 0x8B6DEA5: epoll_event_loop_once (tevent_epoll.c:728)
> ==14913==    by 0x8B6C2D5: std_event_loop_once (tevent_standard.c:114)
> ==14913==    by 0x8B67C3C: _tevent_loop_once (tevent.c:533)
> ==14913==    by 0x8B67CBA: tevent_common_loop_wait (tevent.c:637)
>
> In sdap_parse_entry, we allocate sysdb_attrs on the state, and all the
> internal ldb structures hang off sysdb_attrs. The context is then stolen
> to the caller-provided context, which is typically the state of a tevent
> request that is passed upwards. So the only idea I have is that somewhere
> we steal the request to NULL. But I don't know where or how to look for
> this.
>
> Also, it seems odd that with so many people running sssd, this is the
> only user (so far?) who has reported this issue.
According to the ps output in the ticket, the increase is not that big:
122520 -> 195176 RSS after 48h. I guess this might be why many people do
not notice the increase.

I did a small test where I loaded the members of a larger group in a
plain LDAP setup and saw an increase from 16100 to 42224 which persisted
overnight. I'll try to get more data from my setup.

bye,
Sumit

>
> Does anyone have an idea how to continue?
> _______________________________________________
> sssd-devel mailing list -- sssd-devel@lists.fedorahosted.org
> To unsubscribe send an email to sssd-devel-le...@lists.fedorahosted.org