On 08/12/2016 11:06 AM, Jakub Hrozek wrote:
On Thu, Aug 11, 2016 at 02:23:39PM +0200, Petr Cech wrote:
On 08/09/2016 02:49 PM, Jakub Hrozek wrote:
On Tue, Aug 09, 2016 at 12:57:58PM +0200, Petr Cech wrote:
On 08/09/2016 11:26 AM, Jakub Hrozek wrote:
On Mon, Aug 08, 2016 at 09:46:55AM +0200, Petr Cech wrote:
On 08/04/2016 05:01 PM, Petr Cech wrote:
On 08/04/2016 04:35 PM, Petr Cech wrote:
Hi list,

there is the first version of patch for [1]. I need
to investigate if we have the same issue in other
*_childs.

I tested it with sleep() added into ldap_child code.

[1] https://fedorahosted.org/sssd/ticket/3106


I know, it is not perfect. For example 2 seconds between SIGTERM and
SIGKILL should be constant. I will send the second version with other
childs, but not today.

Regards

Hello all,

there is fixed patch set. There is only one minor change -- constant for
time between signals.

Is there better name for global_ccname_file_dummy? I did not want to break
the naming... but it looks really long.
I think this name is fine, but I would prefer to not use the variable in
the code, except for setting it after the temporary file is created and
then set the variable back to NULL after the temporary file is renamed
to the real one.

Addressed.

Thanks Jakub, it looks better. :-)

OK, now I actually tested the patch and it regresses the way we handle
responses, because the handler exits the process with success error
code, the provider then tries to parse the child output, which is not
there..

I think the easiest way would be to define a return code of ldap_child
that would be used in exit in the signal handler and the back end code
would print a nice debug message telling the admin that the child timed
out.

Thanks Jakub,
improved patch set is attached.

Regards

--
Petr^4 Čech

From f84d4d2fcc7c1f2cb404a02aeea7abd12c14dd61 Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Fri, 22 Jul 2016 14:28:54 +0200
Subject: [PATCH] LDAP: Fixing of removing netgroup from cache

This patch is from a different set.

Sorry, the right one is attached.

From 1f40587cbb256bab98e8e35670ca76426b41e6b1 Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Thu, 4 Aug 2016 16:27:40 +0200
Subject: [PATCH 2/3] LDAP: Adding SIGTERM signal before SIGKILL

We add better termination of ldap_child. If ldap_child reaches
the timeout for termination parent sents SIGTERM signal. Child
has 2 seconds for removing temporary file and exit.
If it is not sufficient there is SIGKILL send to the child.

Resolves:
https://fedorahosted.org/sssd/ticket/3106

[...]

+static void get_tgt_timeout_handler(struct tevent_context *ev,
+                                      struct tevent_timer *te,
+                                      struct timeval tv, void *pvt)
+{
+    struct tevent_req *req = talloc_get_type(pvt, struct tevent_req);
+    struct sdap_get_tgt_state *state = tevent_req_data(req,
+                                            struct sdap_get_tgt_state);
+    int ret;
+
+    DEBUG(SSSDBG_TRACE_ALL,
+          "timeout for sending SIGTERM to tgt child [%d] reached.\n",
+          state->child->pid);
+
+    ret = kill(state->child->pid, SIGTERM);
+    if (ret == -1) {

Please use:
    ret = errno
    and then use ret instead of errno in the DEBUG statement to avoid
    overwriting errno if we ever change this branch in the future.

Thanks, addressed.

+        DEBUG(SSSDBG_CRIT_FAILURE,
+              "Sending SIGTERM failed [%d][%s].\n", errno, strerror(errno));
+    }

From 4ca2a6b90dd7d7ee792b0e144a4e6d432f4693a4 Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Thu, 11 Aug 2016 14:18:15 +0200
Subject: [PATCH 3/3] LDAP: Adding SIGCHLD callback

This patch adds SIGCHLD callback for ldap_child. So if timeout is
reached and ldap_child is terminated by handler we have debug message
about it.

Resolves:
https://fedorahosted.org/sssd/ticket/3106

This patch does not apply for me, so I'll just comment on the diff.

I applied it on top of master. If there is some issue please tell me.

---
 src/providers/ldap/ldap_child.c         |  2 +-
 src/providers/ldap/sdap_child_helpers.c | 10 +++++++++-
 src/util/child_common.h                 |  1 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/providers/ldap/ldap_child.c b/src/providers/ldap/ldap_child.c
index 
2edfc30ec7b4a3ff03d86ae41f086a6128451a9b..b6913fdaa4c8f87f33cc8c10f9a029d63626f279
 100644
--- a/src/providers/ldap/ldap_child.c
+++ b/src/providers/ldap/ldap_child.c
@@ -54,7 +54,7 @@ static void sig_term_handler(int sig)
         }
     }

-    _exit(0);
+    _exit(SIGNAL_EXIT_CODE);

I would prefer CHILD_SIGTERM_EXIT_CODE or CHILD_TIMEOUT_EXIT_CODE
(prefix and change the name)

Addressed. Naming is one big issue :-)

 }

 static krb5_context krb5_error_ctx;
diff --git a/src/providers/ldap/sdap_child_helpers.c 
b/src/providers/ldap/sdap_child_helpers.c
index 
9f9c9b57a7ca5e1217d56ab560cc941432084df8..98b0fe777fd18fad6ffdaefc48ea156a45f4e803
 100644
--- a/src/providers/ldap/sdap_child_helpers.c
+++ b/src/providers/ldap/sdap_child_helpers.c
@@ -69,6 +69,14 @@ static void sdap_close_fd(int *fd)
     *fd = -1;
 }

+void child_callback(int child_status, struct tevent_signal *sige, void *pvt)
+{
+    if (WEXITSTATUS(child_status) == SIGNAL_EXIT_CODE) {
+        DEBUG(SSSDBG_CRIT_FAILURE,
+              "LDAP child was terminated due to timeout\n");
+    }
+}
+
 static errno_t sdap_fork_child(struct tevent_context *ev,
                                struct sdap_child *child)
 {
@@ -110,7 +118,7 @@ static errno_t sdap_fork_child(struct tevent_context *ev,
         sss_fd_nonblocking(child->io->read_from_child_fd);
         sss_fd_nonblocking(child->io->write_to_child_fd);

-        ret = child_handler_setup(ev, pid, NULL, NULL, NULL);
+        ret = child_handler_setup(ev, pid, child_callback, NULL, NULL);
         if (ret != EOK) {
             goto fail;
         }
diff --git a/src/util/child_common.h b/src/util/child_common.h
index 
eb7623987238d5d9e9032b9bbbbef49188f5b1d2..e88661349a1568ffc981e0a3134f7ccea61b394e
 100644
--- a/src/util/child_common.h
+++ b/src/util/child_common.h
@@ -36,6 +36,7 @@
 #define CHILD_MSG_CHUNK     256

 #define SIGTERM_TO_SIGKILL_TIME 2
+#define SIGNAL_EXIT_CODE EINTR

I'm not sure if we want to use errno constants here, I would say just
defining our own constant is better, at least to avoid overflowing the
maximum exit value of 255. I don't think we even have to worry about
clashing with constants from sysexits.h, this is our internal API anyway.

I choose 7.

Logs now look like:

[root@albireo sssd]# grep 'child' sssd_ipa.cygnus.dev.log
[child_handler_setup] (0x2000): Setting up signal handler up for pid [18835]
[child_handler_setup] (0x2000): Signal handler set up for pid [18835]
[set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt child

(6 seconds later)

[get_tgt_timeout_handler] (0x4000): timeout for sending SIGTERM to tgt child [18835] reached. [get_tgt_timeout_handler] (0x0400): Setting 2 seconds timeout for sending SIGKILL to tgt child [sdap_get_tgt_recv] (0x0020): Cannot parse child response: [22][Invalid argument]
[sdap_kinit_done] (0x0020): child failed (22 [Invalid argument])
[child_sig_handler] (0x1000): Waiting for child [18835].
[child_sig_handler] (0x0020): child [18835] failed with status [7].
[child_callback] (0x0020): LDAP child was terminated due to timeout


Regards

--
Petr^4 Čech
>From 5331972de644a13827ada192f7883de9e9fccb9d Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Thu, 4 Aug 2016 16:25:28 +0200
Subject: [PATCH 1/4] LDAP: Adding support for SIGTERM signal

We add support for handling SIGTERM signal. If ldap_child receives
SIGTERM signal it removes temporary file.

Resolves:
https://fedorahosted.org/sssd/ticket/3106
---
 src/providers/ldap/ldap_child.c | 29 +++++++++++++++++++++++++++++
 src/util/child_common.h         |  2 ++
 2 files changed, 31 insertions(+)

diff --git a/src/providers/ldap/ldap_child.c b/src/providers/ldap/ldap_child.c
index 52c271f36cb57b774e6091e7322e4b1394c78969..ffcbc3985691b965c76a06805068118628adc198 100644
--- a/src/providers/ldap/ldap_child.c
+++ b/src/providers/ldap/ldap_child.c
@@ -33,6 +33,30 @@
 #include "providers/backend.h"
 #include "providers/krb5/krb5_common.h"
 
+char *global_ccname_file_dummy = NULL;
+
+static void sig_term_handler(int sig)
+{
+    int ret;
+
+    DEBUG(SSSDBG_FATAL_FAILURE, "Received signal [%s] [%i], shutting down\n",
+                                strsignal(sig), sig);
+
+    if (global_ccname_file_dummy != NULL) {
+        ret = unlink(global_ccname_file_dummy);
+        if (ret != 0) {
+            DEBUG(SSSDBG_FATAL_FAILURE, "Unlink file [%s] failed [%i][%s]\n",
+                                        global_ccname_file_dummy,
+                                        errno, strerror(errno));
+        } else {
+            DEBUG(SSSDBG_FATAL_FAILURE, "Unlink file [%s]\n",
+                                        global_ccname_file_dummy);
+        }
+    }
+
+    _exit(CHILD_TIMEOUT_EXIT_CODE);
+}
+
 static krb5_context krb5_error_ctx;
 #define LDAP_CHILD_DEBUG(level, error) KRB5_DEBUG(level, krb5_error_ctx, error)
 
@@ -405,6 +429,7 @@ static krb5_error_code ldap_child_get_tgt_sync(TALLOC_CTX *memctx,
               strerror(krberr), krberr);
         goto done;
     }
+    global_ccname_file_dummy = ccname_file_dummy;
 
     ret = sss_unique_filename(tmp_ctx, ccname_file_dummy);
     if (ret != EOK) {
@@ -490,6 +515,7 @@ static krb5_error_code ldap_child_get_tgt_sync(TALLOC_CTX *memctx,
               "rename failed [%d][%s].\n", ret, strerror(ret));
         goto done;
     }
+    global_ccname_file_dummy = NULL;
 
     krberr = 0;
     *ccname_out = talloc_steal(memctx, ccname);
@@ -631,6 +657,9 @@ int main(int argc, const char *argv[])
         }
     }
 
+    BlockSignals(false, SIGTERM);
+    CatchSignal(SIGTERM, sig_term_handler);
+
     DEBUG(SSSDBG_TRACE_FUNC, "ldap_child started.\n");
 
     main_ctx = talloc_new(NULL);
diff --git a/src/util/child_common.h b/src/util/child_common.h
index 2a62869034a466b465a481286950678af73667ab..d843cc36d20f9b0214ea1d2b1fa26251080303a7 100644
--- a/src/util/child_common.h
+++ b/src/util/child_common.h
@@ -35,6 +35,8 @@
 #define IN_BUF_SIZE         512
 #define CHILD_MSG_CHUNK     256
 
+#define CHILD_TIMEOUT_EXIT_CODE 7
+
 struct response {
     uint8_t *buf;
     size_t size;
-- 
2.7.4

>From 625191d82cf47f3eded4189d4ef01b379a14fef6 Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Thu, 4 Aug 2016 16:27:40 +0200
Subject: [PATCH 2/4] LDAP: Adding SIGTERM signal before SIGKILL

We add better termination of ldap_child. If ldap_child reaches
the timeout for termination parent sents SIGTERM signal. Child
has 2 seconds for removing temporary file and exit.
If it is not sufficient there is SIGKILL send to the child.

Resolves:
https://fedorahosted.org/sssd/ticket/3106
---
 src/providers/ldap/sdap_child_helpers.c | 40 +++++++++++++++++++++++++++++----
 src/util/child_common.h                 |  1 +
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/providers/ldap/sdap_child_helpers.c b/src/providers/ldap/sdap_child_helpers.c
index 92642e8e47e530e5aef5f78bf1c7a2427979275e..2c0757d9a8c3f82779175da4a540f019ae1575d6 100644
--- a/src/providers/ldap/sdap_child_helpers.c
+++ b/src/providers/ldap/sdap_child_helpers.c
@@ -416,9 +416,7 @@ int sdap_get_tgt_recv(struct tevent_req *req,
     return EOK;
 }
 
-
-
-static void get_tgt_timeout_handler(struct tevent_context *ev,
+static void get_tgt_sigkill_handler(struct tevent_context *ev,
                                     struct tevent_timer *te,
                                     struct timeval tv, void *pvt)
 {
@@ -428,7 +426,8 @@ static void get_tgt_timeout_handler(struct tevent_context *ev,
     int ret;
 
     DEBUG(SSSDBG_TRACE_ALL,
-          "timeout for tgt child [%d] reached.\n", state->child->pid);
+          "timeout for sending SIGKILL to tgt child [%d] reached.\n",
+          state->child->pid);
 
     ret = kill(state->child->pid, SIGKILL);
     if (ret == -1) {
@@ -439,6 +438,39 @@ static void get_tgt_timeout_handler(struct tevent_context *ev,
     tevent_req_error(req, ETIMEDOUT);
 }
 
+static void get_tgt_timeout_handler(struct tevent_context *ev,
+                                      struct tevent_timer *te,
+                                      struct timeval tv, void *pvt)
+{
+    struct tevent_req *req = talloc_get_type(pvt, struct tevent_req);
+    struct sdap_get_tgt_state *state = tevent_req_data(req,
+                                            struct sdap_get_tgt_state);
+    int ret;
+
+    DEBUG(SSSDBG_TRACE_ALL,
+          "timeout for sending SIGTERM to tgt child [%d] reached.\n",
+          state->child->pid);
+
+    ret = kill(state->child->pid, SIGTERM);
+    if (ret == -1) {
+        ret = errno;
+        DEBUG(SSSDBG_CRIT_FAILURE,
+              "Sending SIGTERM failed [%d][%s].\n", ret, strerror(ret));
+    }
+
+    DEBUG(SSSDBG_TRACE_FUNC,
+          "Setting %d seconds timeout for sending SIGKILL to tgt child\n",
+          SIGTERM_TO_SIGKILL_TIME);
+
+    tv = tevent_timeval_current_ofs(SIGTERM_TO_SIGKILL_TIME, 0);
+
+    te = tevent_add_timer(ev, req, tv, get_tgt_sigkill_handler, req);
+    if (te == NULL) {
+        DEBUG(SSSDBG_CRIT_FAILURE, "tevent_add_timer failed.\n");
+        tevent_req_error(req, ECANCELED);
+    }
+}
+
 static errno_t set_tgt_child_timeout(struct tevent_req *req,
                                      struct tevent_context *ev,
                                      int timeout)
diff --git a/src/util/child_common.h b/src/util/child_common.h
index d843cc36d20f9b0214ea1d2b1fa26251080303a7..37116e2a73ede765f68b3dba8ab69aea432303bb 100644
--- a/src/util/child_common.h
+++ b/src/util/child_common.h
@@ -35,6 +35,7 @@
 #define IN_BUF_SIZE         512
 #define CHILD_MSG_CHUNK     256
 
+#define SIGTERM_TO_SIGKILL_TIME 2
 #define CHILD_TIMEOUT_EXIT_CODE 7
 
 struct response {
-- 
2.7.4

>From 923571254e5ceb5f99bfadf6594cd9f2d23439d6 Mon Sep 17 00:00:00 2001
From: Petr Cech <pc...@redhat.com>
Date: Thu, 11 Aug 2016 14:18:15 +0200
Subject: [PATCH 3/4] LDAP: Adding SIGCHLD callback

This patch adds SIGCHLD callback for ldap_child. So if timeout is
reached and ldap_child is terminated by handler we have debug message
about it.

Resolves:
https://fedorahosted.org/sssd/ticket/3106
---
 src/providers/ldap/sdap_child_helpers.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/providers/ldap/sdap_child_helpers.c b/src/providers/ldap/sdap_child_helpers.c
index 2c0757d9a8c3f82779175da4a540f019ae1575d6..0b31d89a07c7db03173dd16d0053ac1cf5c51273 100644
--- a/src/providers/ldap/sdap_child_helpers.c
+++ b/src/providers/ldap/sdap_child_helpers.c
@@ -69,6 +69,14 @@ static void sdap_close_fd(int *fd)
     *fd = -1;
 }
 
+void child_callback(int child_status, struct tevent_signal *sige, void *pvt)
+{
+    if (WEXITSTATUS(child_status) == CHILD_TIMEOUT_EXIT_CODE) {
+        DEBUG(SSSDBG_CRIT_FAILURE,
+              "LDAP child was terminated due to timeout\n");
+    }
+}
+
 static errno_t sdap_fork_child(struct tevent_context *ev,
                                struct sdap_child *child)
 {
@@ -110,7 +118,7 @@ static errno_t sdap_fork_child(struct tevent_context *ev,
         sss_fd_nonblocking(child->io->read_from_child_fd);
         sss_fd_nonblocking(child->io->write_to_child_fd);
 
-        ret = child_handler_setup(ev, pid, NULL, NULL, NULL);
+        ret = child_handler_setup(ev, pid, child_callback, NULL, NULL);
         if (ret != EOK) {
             goto fail;
         }
-- 
2.7.4

_______________________________________________
sssd-devel mailing list
sssd-devel@lists.fedorahosted.org
https://lists.fedorahosted.org/admin/lists/sssd-devel@lists.fedorahosted.org

Reply via email to