On Tue, Oct 06, 2015 at 12:54:41PM +0200, Jakub Hrozek wrote: > On Mon, Oct 05, 2015 at 04:46:55PM +0200, Jakub Hrozek wrote: > > On Mon, Oct 05, 2015 at 12:33:48PM +0200, Sumit Bose wrote: > > > On Mon, Oct 05, 2015 at 12:08:26PM +0200, Jakub Hrozek wrote: > > > > On Fri, Oct 02, 2015 at 10:32:27AM +0200, Jakub Hrozek wrote: > > > > > On Thu, Oct 01, 2015 at 01:41:07PM +0200, Jakub Hrozek wrote: > > > > > > I don't have a good idea how to reproduce except simulate the > > > > > > failure in > > > > > > gdb, sorry... at least I verified that after setting the context to > > > > > > NULL: > > > > > > (gdb) set clist[1] = 0 > > > > > > the request runs to completion and SSSD doesn't crash anymore. So I > > > > > > think > > > > > > we should just add a check.. > > > > > > > > > > Turns out this is easier to reproduce and hence more embarassing. Just > > > > > add POSIX attributes to AD but don't replicate them to GC.. > > > > > > > > Any takers for review? > > > > > > (I already took it some time ago in patchworks :-) > > > > > > While the patch fixes the issue and is also a right way to fix the issue > > > I would recommend to change it a bit to make the reason more clear. > > > > > > As you said to reproduce the issue you need trust to AD configured which > > > POSIX attributes but not replicate them to the Global Catalog. If you > > > new start with an empty cache a sequence like > > > > > > getent group group_with_posix_gid@ad.domain > > > sss_cache -E > > > getent group group_with_posix_gid@ad.domain > > > > > > will trigger the crash. > > > > > > What happens is that during the first lookup SSSD sees that the GC does > > > not have the POSIX attributes and disable the GC lookup. As a > > > consequence ad_gc_conn_list() will return only a single entry for the > > > second lookup and unconditionally accessing the second entry will fail. > > > While the fix is correct, I would recommend to add a boolean option to > > > ad_gc_conn_list() and set ignore_mark_offline in ad_gc_conn_list(). > > > Otherwise you have to add extra logic to ipa_get_ad_acct_send() to make > > > sure ignore_mark_offline is set even it ad_gc_conn_list() only returns > > > one element. > > > > You're right that the logic was scattered all over the place. The > > attached patches consolidate it into ad_common.c so all callers use the > > same code. It's a larger change than before, but hopefully it's still > > digestable and as a bonus we can have tests for different combinations > > of contexts, GC status and master/sub domains. > > > > The second patch is not really needed, the only reason I included it is > > that I think it's better to have similar code grouped together in small > > functions and again, tests. > > > > If you think these are too invasive, we can only apply them to master > > and do exactly what you proposed for sssd-1-13. > > > > btw I had a bit of trouble with downstream tests, the seem to fail in > > unrelated way, so I'm sending the patches to the list after some manual > > testing and will resume the downstream tests tomorrow.. > > The attached patches fix some more Sumit's review comments. With Lukas' > help I was also able to run downstream ad_forest tests and didn't see > any issues even with the patches.
The patches are looking good. I think they are not too invasive and can be pushed to sssd-1-13 as well. As expected as I wasn't able to trigger the crash anymore and CI (http://sssd-ci.duckdns.org/logs/job/29/39/summary.html) passed as well. ACK bye, Sumit _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel