On Mon, Feb 15, 2021 at 12:03:42PM +1000, Jonathan Matthew wrote:
> It's fairly easy to accidentally configure relayd to run check scripts
> faster than they finish, for example with a check interval of one second
> and a check script that makes a TCP connection to a host that no longer
> exists.
> 
> In this situation, the hce process keeps writing messages to its imsg
> buffer asking the parent process to run checks, so the hce process's
> memory usage grows without bound.  If the check script starts working again
> (or if you change it to just 'exit 0') the parent works its way through the
> backlog and memory usage goes back to normal, but ideally relayd would avoid
> doing this to itself.
> 
> If we don't clear the F_CHECK_SENT and F_CHECK_DONE flags in
> hce_launch_checks(), check_script() can use them to figure out if the
> last check request it sent for the host has finished yet, so it can avoid
> building up a backlog of work for the parent.  The ICMP and script check 
> implementations clear these flags as they start checks, and the TCP check
> code doesn't use them at all, so this shouldn't affect anything else.
> 
> ok?
> 
ok giovanni@
 Thanks
  Giovanni

> 
> Index: check_script.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/check_script.c,v
> retrieving revision 1.21
> diff -u -p -u -p -r1.21 check_script.c
> --- check_script.c    28 May 2017 10:39:15 -0000      1.21
> +++ check_script.c    15 Feb 2021 01:28:54 -0000
> @@ -38,6 +38,9 @@ check_script(struct relayd *env, struct 
>       struct ctl_script        scr;
>       struct table            *table;
>  
> +     if ((host->flags & (F_CHECK_SENT|F_CHECK_DONE)) == F_CHECK_SENT)
> +             return;
> +
>       if ((table = table_find(env, host->conf.tableid)) == NULL)
>               fatalx("%s: invalid table id", __func__);
>  
> @@ -52,7 +55,9 @@ check_script(struct relayd *env, struct 
>               fatalx("invalid script path");
>       memcpy(&scr.timeout, &table->conf.timeout, sizeof(scr.timeout));
>  
> -     proc_compose(env->sc_ps, PROC_PARENT, IMSG_SCRIPT, &scr, sizeof(scr));
> +     if (proc_compose(env->sc_ps, PROC_PARENT, IMSG_SCRIPT, &scr,
> +         sizeof(scr)) == 0)
> +             host->flags |= F_CHECK_SENT;
>  }
>  
>  void
> Index: hce.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/relayd/hce.c,v
> retrieving revision 1.79
> diff -u -p -u -p -r1.79 hce.c
> --- hce.c     6 Aug 2018 17:31:31 -0000       1.79
> +++ hce.c     15 Feb 2021 01:28:54 -0000
> @@ -139,7 +139,6 @@ hce_launch_checks(int fd, short event, v
>               TAILQ_FOREACH(host, &table->hosts, entry) {
>                       if ((host->flags & F_CHECK_DONE) == 0)
>                               host->he = HCE_INTERVAL_TIMEOUT;
> -                     host->flags &= ~(F_CHECK_SENT|F_CHECK_DONE);
>                       if (event_initialized(&host->cte.ev)) {
>                               event_del(&host->cte.ev);
>                               close(host->cte.s);
> 
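
For readers following the thread, here is a minimal sketch of the gating
logic described in the patch. It is a simplified model, not relayd code:
struct sketch_host, the sketch_* functions, the queued counter and the flag
bit values are all placeholders made up for illustration (the real flag
definitions live in relayd.h), and the counter increment stands in for a
successful proc_compose() call.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit values; the real definitions live in relayd.h. */
#define F_CHECK_SENT	0x01
#define F_CHECK_DONE	0x02

/* Hypothetical stand-in for relayd's struct host. */
struct sketch_host {
	unsigned int	flags;
	int		queued;		/* requests handed to the parent */
};

/* Models check_script(): returns 1 if a request was queued, 0 if skipped. */
int
sketch_check_script(struct sketch_host *h)
{
	assert(h != NULL);

	/* Sent but not yet done: the previous request is still outstanding. */
	if ((h->flags & (F_CHECK_SENT|F_CHECK_DONE)) == F_CHECK_SENT)
		return 0;

	/* Starting a new check, so forget the previous result. */
	h->flags &= ~(F_CHECK_SENT|F_CHECK_DONE);
	h->queued++;		/* stands in for proc_compose() succeeding */
	h->flags |= F_CHECK_SENT;
	return 1;
}

/* Models the parent reporting that the script has finished. */
void
sketch_script_done(struct sketch_host *h)
{
	assert(h != NULL);
	h->flags |= F_CHECK_DONE;
}

/* Fires `ticks` check intervals with no completion in between. */
int
sketch_run_ticks(struct sketch_host *h, int ticks)
{
	int i, sent = 0;

	for (i = 0; i < ticks; i++)
		sent += sketch_check_script(h);
	return sent;
}
```

In this model, any number of interval ticks queues at most one request
until sketch_script_done() is called, which is the property that keeps the
backlog from growing. It only works if the interval timer leaves the two
flags alone, which is why the patch drops the flag reset from
hce_launch_checks().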
