I enabled debug-on-demand mode and found the lines noted earlier in this thread within the debug log.
I also found lots of messages like these:

$ sudo grep 'main Q' /var/log/rsyslog-debug-on-demand.log
7501.401841560:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7502.328726922:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
7502.329026161:imrelp.c : main Q: queue.c: doEnqSingleObject: LightDelay mark reached for light delayable message - blocking a bit.
7502.329631733:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7502.329664576:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7503.329150535:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
7503.329426002:imrelp.c : main Q: queue.c: doEnqSingleObject: LightDelay mark reached for light delayable message - blocking a bit.
7503.329950433:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7503.329989705:imuxsock.c : main Q: queue.c: EnqueueMsg advised worker start
7504.329531389:imrelp.c : main Q: queue.c: EnqueueMsg advised worker start
7504.329854248:imrelp.c : main Q: queue.c: doEnqSingleObject: LightDelay mark reached for light delayable message - blocking a bit.

Turning to Google, I landed on these pages:

https://www.rsyslog.com/doc/master/concepts/queues.html
https://www.rsyslog.com/doc/master/rainerscript/queue_parameters.html
https://github.com/rsyslog/rsyslog/issues/1778#issuecomment-353135527
https://github.com/rsyslog/rsyslog/pull/3642
https://github.com/rsyslog/rsyslog-doc/pull/819

Is the fix here to set queue.lightDelayMark to 0? Should this be set for the main queue or the ruleset attached to imrelp?

For now, I've disabled port probes from Nagios on the receiver's imrelp port.

-----Original Message-----
From: rsyslog <rsyslog-boun...@lists.adiscon.com> On Behalf Of Adam Chalkley via rsyslog
Sent: Wednesday, August 19, 2020 11:38 AM
To: rsyslog-users <rsyslog@lists.adiscon.com>
Cc: Adam Chalkley <atc0...@auburn.edu>
Subject: [rsyslog] Upgraded receiver from Ubuntu 16.04 to 18.04, connections from clients failing with a high number of CLOSE_WAIT connections on receiver

Hi,

We upgraded the OS on our central receiver yesterday from Ubuntu 16.04 (4.4 kernel) to 18.04 (4.15 kernel). We are using the upstream PPA, so running 8.2006.0 on receivers and endpoints.

When we started getting reports from our Nagios instance that the rsyslog forward queues endpoints were beginning to fill we checked our receiver (sawmill1) and saw 94 open TCP connections with 40 of them in CLOSE_WAIT from our Nagios server, most of them I suspect from the TCP port connection test performed every 5 minutes.

Log samples from the receiver system (which are related to port probes from our Nagios instance):

2020-08-19T10:05:01.279416-05:00 lincoln rsyslogd: -- MARK --
2020-08-19T10:05:08.249358-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
2020-08-19T10:05:08.249626-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
2020-08-19T10:08:08.020625-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
2020-08-19T10:08:08.021253-05:00 lincoln rsyslogd: imrelp[2514]: error 'error sending relp: Bad file descriptor', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
2020-08-19T10:11:08.074712-05:00 lincoln rsyslogd: imrelp[2514]: error 'server closed relp session, session broken', object 'lstn 2514: conn to clt 192.168.2.10/192.168.2.10' - input may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]

Log samples from the Nagios instance:

2020-08-19T11:19:53.444953-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error waiting on required session state, session broken', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]
2020-08-19T11:19:53.445260-05:00 nagios rsyslogd: omrelp[lincoln.lib.auburn.edu:2514]: error 'error opening connection to remote peer', object 'conn to srvr lincoln.lib.auburn.edu:2514' - action may not work as intended [v8.2006.0 try https://www.rsyslog.com/e/2353 ]

Is there a setting I can apply to rsyslog to help resolve this? Is this a known bug? We didn't have the issue with v8.2006.0 on our receiver when it was running Ubuntu 16.04 (the prior OS release), even though it made the same complaints about the TCP port probes from Nagios.

Thanks in advance.

_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.