On Fri, Aug 12, 2016 at 03:41:26PM +0200, Petr Cech wrote: > On 08/12/2016 03:07 PM, Jakub Hrozek wrote: > > > Logs now look like: > > > > > > > > [root@albireo sssd]# grep 'child' sssd_ipa.cygnus.dev.log > > > > [child_handler_setup] (0x2000): Setting up signal handler up for pid > > > > [18835] > > > > [child_handler_setup] (0x2000): Signal handler set up for pid [18835] > > > > [set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt > > > > child > > > > > > > > (6 seconds later) > > > > > > > > [get_tgt_timeout_handler] (0x4000): timeout for sending SIGTERM to tgt > > > > child > > > > [18835] reached. > > > > [get_tgt_timeout_handler] (0x0400): Setting 2 seconds timeout for > > > > sending > > > > SIGKILL to tgt child > > > > [sdap_get_tgt_recv] (0x0020): Cannot parse child response: [22][Invalid > > > > argument] > > > > [sdap_kinit_done] (0x0020): child failed (22 [Invalid argument]) > > > > [child_sig_handler] (0x1000): Waiting for child [18835]. > > > > [child_sig_handler] (0x0020): child [18835] failed with status [7]. > > > > [child_callback] (0x0020): LDAP child was terminated due to timeout > > I'm sorry, but these patches still don't fix the issue I was seeing. > > Before the patches, when I timed out the child process, I saw: > > (Fri Aug 12 14:54:24 2016) [[sssd[ldap_child[31721]]]] > > [ldap_child_get_tgt_sync] (0x0100): Principal name is: > > [host/client.ipa.t...@ipa.test] > > (Fri Aug 12 14:54:24 2016) [[sssd[ldap_child[31721]]]] > > [ldap_child_get_tgt_sync] (0x0100): Using keytab [MEMORY:/etc/krb5.keytab] > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [get_tgt_timeout_handler] > > (0x4000): timeout for tgt child [31721] reached. > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [sdap_kinit_done] (0x0080): > > Communication with KDC timed out, trying the next one > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [_be_fo_set_port_status] > > (0x8000): Setting status: PORT_NOT_WORKING. Called from: > > /sssd/src/providers/ldap/sdap_async_connection.c: sdap_kinit_done: 1207 > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [fo_set_port_status] > > (0x0100): Marking port 0 of server 'unidirect.ipa.test' as 'not working' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [fo_set_port_status] > > (0x0400): Marking port 0 of duplicate server 'unidirect.ipa.test' as 'not > > working' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [sdap_kinit_next_kdc] > > (0x1000): Resolving next KDC for service KERBEROS > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [fo_resolve_service_send] > > (0x0100): Trying to resolve service 'KERBEROS' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [get_server_status] > > (0x1000): Status of server 'unidirect.ipa.test' is 'name resolved' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [get_port_status] (0x1000): > > Port status of port 0 for server 'unidirect.ipa.test' is 'not working' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [fo_resolve_service_send] > > (0x0020): No available servers for service 'KERBEROS' > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [be_resolve_server_done] > > (0x1000): Server resolution failed: [5]: Input/output error > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [sdap_cli_kinit_done] > > (0x0400): Cannot get a TGT: ret [1432158228](Network I/O Error) > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [sdap_cli_connect_recv] > > (0x0040): Unable to establish connection [13]: Permission denied > > (Fri Aug 12 14:54:30 2016) [sssd[be[ipaldap]]] [_be_fo_set_port_status] > > (0x8000): Setting status: PORT_NOT_WORKING. Called from: > > /sssd/src/providers/ldap/sdap_async_connection.c: sdap_cli_connect_recv: > > 2048 > > > > After the patch, I see: > > (Fri Aug 12 15:01:05 2016) [sssd[be[ipaldap]]] [get_tgt_timeout_handler] > > (0x4000): timeout for sending SIGTERM to tgt child [17291] reached. > > (Fri Aug 12 15:01:05 2016) [sssd[be[ipaldap]]] [get_tgt_timeout_handler] > > (0x0400): Setting 2 seconds timeout for sending SIGKILL to tgt child > > (Fri Aug 12 15:01:05 2016) [[sssd[ldap_child[17291]]]] [sig_term_handler] > > (0x0010): Received signal [Terminated] [15], shutting down > > (Fri Aug 12 15:01:05 2016) [sssd[be[ipaldap]]] [read_pipe_handler] > > (0x0400): EOF received, client finished > > (Fri Aug 12 15:01:05 2016) [sssd[be[ipaldap]]] [sdap_get_tgt_recv] > > (0x0020): Cannot parse child response: [22][Invalid argument] > > here ---^ > > > > This is the part I don't like, we try to read the response from the child's > > pipe and fail with a bad error message. I thought this was because with > > the preivous patch, we exit the child with zero, but I guess this was not > > the case.. Anyway, we still should fix this, the message would be really > > confusing to admins. > > Hi Jakub, > > I know I had the same > 'Cannot parse child response: [22][Invalid argument]' > in my logs too, see my last mail above. > > Some lines after it I have new message > '[child_callback] (0x0020): LDAP child was terminated due to timeout' > > I understand that it is not satisfying. The main purpose of this patch was > deleting temporary files if timeout reached. And this is IMO solved I think. > It is true that we have big effort to have understandable logs. > > Yesterday I looked deeper at mechanism how signals, child handler and all > such stuff works in ldap_child case. I wasn't able to tell to > sdap_get_tgt_recv() that the state is changed and we shouldn't parse the > answer. > > If I understand handler stuff right I need introduce in ldap_child handler > whole mechanism like in src/providers/be_dyndns.c > > The mechanism is calling the child_handler_setup() with all parameters which > means rewrite sdap_child_helpers in appropriate manner.
I think it would be enough to pass the tevent_req to the child_callback() and cancel it with tevent_req_error(req, ETIMEDOUT) The problem is that with the current patches, the whole request is marked as 'succeeded', that's why the _recv() function tries to parse the response. If you look at sdap_get_tgt_recv(), if the request is marked as failure, the processing aborts in TEVENT_REQ_RETURN_ON_ERROR.. > > Maybe there is simpler way please tell me if. But if this ticket is urgent > (removing of temporary files) we could push it and solve logs (I mean > rewrite handler mechanism) in next ticket. > > Regards > > -- > Petr^4 Čech > _______________________________________________ > sssd-devel mailing list > sssd-devel@lists.fedorahosted.org > https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org _______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org