On Thu, Jun 20, 2013 at 01:46:34PM +0200, Jakub Hrozek wrote: > On Thu, Jun 20, 2013 at 10:27:13AM +0200, Pavel Březina wrote: > > On 06/19/2013 09:22 PM, Jakub Hrozek wrote: > > >On Tue, Jun 18, 2013 at 01:51:15PM +0200, Pavel Březina wrote: > > >>https://fedorahosted.org/sssd/ticket/1947 > > >> > > >>A fast explanation how _srv_ expansion works. _srv_ is inserted into > > >>server > > >>list as so called meta server. Let us consider following configuration: > > >> > > >>*Setup* > > >>ipa_server = _srv_, ipa.pb > > >>server list will contain: meta -> ipa.pb > > >> > > >>*Expansion* > > >>meta -> ipa.pb:389 -> ipa.pb ; remove meta > > >>ipa.pb:389 -> ipa.pb ; meta > > >> > > >>*Collapse* > > >>remove ipa.pb:389 ; insert meta > > >>meta -> ipa.pb > > >> > > >>The main problem is that expanded SRV servers are marked as NEUTRAL during > > >>online check, but they don't collapse back into a meta server. > > >> > > >>This will trigger another SRV expansion, leaving the old server in the > > >>list > > >>and trying to add the servers again. This is present in both master and > > >>1.9 > > >>(and probably older versions), although the result is slightly different. > > >> > > >>In master, we don't insert a server into server list if it is already > > >>present. Because state->meta is orphaned from the previous SRV expansion, > > >>state->meta->next is NULL and SSSD crashes later. > > >> > > >>In 1.9, we simply insert duplicate servers. Those servers are inserted > > >>after > > >>orphaned state->meta, state->meta is orphaned again, leaving those servers > > >>globally unreachable. However, it seemingly does not affect the fail over. > > >>You just run into d25e7c65. > > >> > > >>Here are four patches for master, and two patches for 1.9. > > > > > >One question: > > > > > >> case SRV_NEUTRAL: /* Request SRV lookup */ > > >>+ if (server != NULL && server != state->meta) { > > >>+ /* A server created by expansion of meta server was marked as > > >>+ * neutral. We have to collapse the servers and issue new > > >>+ * SRV resolution. */ > > >>+ state->meta = collapse_srv_lookup(&server); > > >>+ } > > >>+ > > > > > >Wouldn't a more systematic solution be to collapse the lookup from > > >functions that are setting the status? Or were you afraid of race > > >conditions where the server might still be in use by another request? > > > > This is currently done only in fo_reset_services() which iterates over > > services and servers and sets them to neutral. I think it would be safe to > > do the collapse there, I'm not entirely sure though. > > > > However, since resolve_srv_send() allows non meta server as input parameter, > > it should not expect it to have correct (!SRV_NEUTRAL) state. It is safer > > this way. > > That's true. > > Ack.
I tested both positive and negative resolution and SRV recods collapsing. The crash itself was tested by lukas. Pushed to master and sssd-1-9 as appropriate. _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/mailman/listinfo/sssd-devel