Currently syslogd(8) doesn't support hostname parsing for incoming
messages. This means that if a sender adds a hostname to a message, it
will be interpreted as the progname. Additionally, when a message is
relayed, or when some form of NAT takes place, the originator of the
message is completely lost.

The diff below adds hostname parsing and is already OK bluhm@, but I
wanted to give other people a chance to yell at me before committing.
If nobody objects I'll commit it next weekend.
Note that this only adds the parsing; the rest of the current behaviour
stays the same. I have another diff in the pipeline for using the
hostname from the message.

The definitions below are mostly in line with what {Net,Free}BSD do.

The BSD syslog protocol is rather loose and in quite a few locations
open for interpretation, so some definitions need to be hammered out:
- Timestamp: easy to interpret, since it has a strict format.
  No changes here.
- Hostname: an IPv4 or IPv6 address, an RFC 1034 label, or an FQDN, at
  most HOST_NAME_MAX characters long.
- progname: alphanumeric characters, '-', '.' and '_', ending in a ':'
  or '[', at most 255 characters long.
  What changes here is that a progname must now end with ':' or '['.
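
The two definitions above can be sketched as standalone checks on an already split-off token. This is a minimal sketch, not the diff's code. The names is_hostname() and progname_len() are hypothetical; the diff itself uses parsemsg_hostname() and parsemsg_prog(). The HOST_NAME_MAX fallback of 255 is an assumption for systems that don't define it. As in the diff, an FQDN with a trailing dot is rejected.

```c
#include <sys/socket.h>

#include <arpa/inet.h>
#include <netinet/in.h>

#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#ifndef HOST_NAME_MAX
#define HOST_NAME_MAX	255	/* assumed fallback (OpenBSD's value) */
#endif
#define PROGNAME_MAX	255	/* progname limit from the definition */

/*
 * Hostname: an IPv4/IPv6 literal, or dot-separated RFC 1034 labels
 * (start with a letter, end with a letter or digit, '-' in between).
 */
static bool
is_hostname(const char *s)
{
	struct in_addr a4;
	struct in6_addr a6;
	size_t len = strlen(s), i;

	if (len == 0 || len > HOST_NAME_MAX)
		return false;
	if (inet_pton(AF_INET, s, &a4) == 1 ||
	    inet_pton(AF_INET6, s, &a6) == 1)
		return true;
	for (i = 0; i < len; i++) {
		unsigned char c = s[i];
		bool first = (i == 0 || s[i - 1] == '.');
		bool last = (i + 1 == len || s[i + 1] == '.');

		if (first && !isalpha(c))	/* label must start with a letter */
			return false;
		if (last && !isalnum(c))	/* ...and end with a letter/digit */
			return false;
		if (!isalnum(c) && c != '-' && c != '.')
			return false;
	}
	return true;
}

/*
 * Progname: 1..255 characters of [[:alnum:]-._] followed directly by
 * ':' or '['.  Returns the name's length, or 0 if there is none.
 */
static size_t
progname_len(const char *s)
{
	size_t i;

	for (i = 0; i < PROGNAME_MAX; i++) {
		unsigned char c = s[i];

		if (!isalnum(c) && c != '-' && c != '.' && c != '_')
			break;
	}
	if (i == 0 || (s[i] != ':' && s[i] != '['))
		return 0;
	return i;
}
```

Because ':' may only end a progname and may never end an RFC 1034 label, a token such as "testprog:" passes progname_len() but fails is_hostname().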

I left the program name parsing as similar as I could, but made it
always distinguishable from the hostname. The ending in ':' or '[' is
part of syslog(3) and what I've always seen in the wild (ymmv). The only
exception to this rule is when progname (or ident as syslog(3) calls it)
fills up the message and ':' can't be placed. However, in that case it
won't fit in syslogd(8)'s progname definition of 255 characters.

According to syslog(3)'s guts, progname can be omitted if both ident
and LOG_PID have been omitted during openlog(3) and __progname isn't
set (no clue how that could happen on OpenBSD).
If this somehow miraculously happens, the whole message is just the
"message" parameter of the syslog(3) call.
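
To make the ':'/'[' convention concrete, here is a simplified model of the prefix a syslog(3)-style sender writes in front of the caller's message. format_tag() is a hypothetical helper, not the libc implementation. It does not model truncation or the case of LOG_PID without an ident. A pid < 0 stands for "LOG_PID not given".

```c
#include <stdio.h>

/*
 * Simplified model of the tag placed before the caller's message.
 * Hypothetical helper; not the libc source.
 */
static int
format_tag(char *buf, size_t len, const char *ident, int pid,
    const char *msg)
{
	if (ident == NULL)	/* no ident (and, assumed, no LOG_PID) */
		return snprintf(buf, len, "%s", msg);
	if (pid >= 0)		/* "ident[pid]: msg": name ends at '[' */
		return snprintf(buf, len, "%s[%d]: %s", ident, pid, msg);
	return snprintf(buf, len, "%s: %s", ident, msg);	/* ends at ':' */
}
```

Only the ident == NULL branch produces a line with no tag at all, which is the "whole message is just the message parameter" case described above.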

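Because progname is now always identifiable, we can tell whether there
is a hostname whenever progname is set. If progname is found in the
first position after timestamp/priority, there is no hostname. If it is
found in the second position, the first word is the hostname.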
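
The positional rule can be sketched as follows. This simplified version only looks for the tag in word one or word two. Unlike parsemsg_hostname() in the diff, it does not validate the first word as a hostname. tag_at() and tag_position() are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Does s start with [[:alnum:]-._]{1,255} followed by ':' or '['? */
static int
tag_at(const char *s)
{
	size_t i = 0;

	while (i < 255 && (isalnum((unsigned char)s[i]) ||
	    s[i] == '-' || s[i] == '.' || s[i] == '_'))
		i++;
	return i > 0 && (s[i] == ':' || s[i] == '[');
}

/*
 * Input: the header after timestamp/priority.  Returns 1 if the tag
 * is the first word (no hostname), 2 if it is the second word (the
 * first word is then the hostname), 0 if it is in neither position.
 */
static int
tag_position(const char *s)
{
	const char *sp;

	if (tag_at(s))
		return 1;
	if ((sp = strchr(s, ' ')) != NULL && tag_at(sp + 1))
		return 2;
	return 0;
}
```

A result of 0 is the no-man's-land case discussed next, where neither position holds a tag.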

If progname is completely omitted, we're a bit in no-man's-land, so
here we have to make a best effort. Currently we never set the hostname
when forwarding a message unless syslogd is started with '-h'. However,
this is not a sane default, since it hinders relaying messages, and
bluhm@ and I already agree that we should make -h the default and turn
the flag into a no-op.

Going by syslog(3), if no timestamp is given we have just the message
plus the (not quite) optional progname/ident. So my proposal is that
the hostname should be parsed if either a timestamp or a progname is
found.
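
The proposed rule reduces to a single condition. It is the inverse of the (!timelen && !proglen) check in parsemsg() in the diff. keep_hostname() is a hypothetical name. The cases in the comment are taken from the regress messages in args-msgparsing.pl and their expected server output.

```c
#include <stdbool.h>

/*
 * Keep the leading word as the hostname only if a timestamp or a
 * progname was found; otherwise it stays part of the message text.
 *
 *   "<13>Jan 13 07:06:00 testhost testprog testcontent 3"
 *       timestamp, no progname -> "testhost" is the hostname
 *   "<13>testhost testprog: testcontent 6"
 *       no timestamp, progname -> "testhost" is the hostname
 *   "<13>testhost testprog testcontent 7"
 *       neither               -> "testhost" stays in the message
 */
static bool
keep_hostname(bool has_timestamp, bool has_prog)
{
	return has_timestamp || has_prog;
}
```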

Finally, the syslogd regress tests with rsyslog now fail. The perl
framework sends a message without timestamp or progname, and rsyslog
adds a timestamp when forwarding but makes no other changes, so syslogd
now interprets the first word as the hostname. These tests can be
re-enabled when we allow using the hostname from the message, or when
someone is brave enough to dive into these perl guts.

Additional OK? Comment?

martijn@

Index: usr.sbin/syslogd/parsemsg.c
===================================================================
RCS file: /cvs/src/usr.sbin/syslogd/parsemsg.c,v
retrieving revision 1.1
diff -u -p -r1.1 parsemsg.c
--- usr.sbin/syslogd/parsemsg.c 13 Jan 2022 10:34:07 -0000      1.1
+++ usr.sbin/syslogd/parsemsg.c 22 Jan 2022 14:52:04 -0000
@@ -17,6 +17,11 @@
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
+#include <sys/socket.h>
+
+#include <arpa/inet.h>
+#include <netinet/in.h>
+
 #include <ctype.h>
 #include <limits.h>
 #include <stdio.h>
@@ -29,27 +34,42 @@
 
 size_t parsemsg_timestamp_bsd(const char *, char *);
 size_t parsemsg_timestamp_v1(const char *, char *);
+size_t parsemsg_hostname(const char *, char *);
 size_t parsemsg_prog(const char *, char *);
 
 struct msg *
 parsemsg(const char *msgstr, struct msg *msg)
 {
-       size_t n;
+       size_t timelen, proglen;
+       const char *hostname;
 
        msg->m_pri = -1;
        msgstr += parsemsg_priority(msgstr, &msg->m_pri);
        if (msg->m_pri &~ (LOG_FACMASK|LOG_PRIMASK))
                msg->m_pri = -1;
 
-       if ((n = parsemsg_timestamp_bsd(msgstr, msg->m_timestamp)) == 0)
-               n = parsemsg_timestamp_v1(msgstr, msg->m_timestamp);
-       msgstr += n;
+       if ((timelen = parsemsg_timestamp_bsd(msgstr, msg->m_timestamp)) == 0)
+               timelen = parsemsg_timestamp_v1(msgstr, msg->m_timestamp);
+       msgstr += timelen;
+
+       while (isspace((unsigned char)msgstr[0]))
+               msgstr++;
 
-       while (isspace(msgstr[0]))
+       hostname = msgstr;
+       msgstr += parsemsg_hostname(msgstr, msg->m_hostname);
+
+       while (isspace((unsigned char)msgstr[0]))
                msgstr++;
 
-       parsemsg_prog(msgstr, msg->m_prog);
+       proglen = parsemsg_prog(msgstr, msg->m_prog);
 
+       /*
+        * Without timestamp and tag, assume the hostname is part of the message.
+        */
+       if (!timelen && !proglen) {
+               msg->m_hostname[0] = '\0';
+               msgstr = hostname;
+       }
        strlcpy(msg->m_msg, msgstr, sizeof(msg->m_msg));
 
        return msg;
@@ -170,6 +190,73 @@ parsemsg_timestamp_v1(const char *msgstr
 }
 
 size_t
+parsemsg_hostname(const char *msgstr, char *hostname)
+{
+       size_t i, j;
+       struct in_addr buf4;
+       struct in6_addr buf6;
+
+       if (msgstr[0] == '-' && (msgstr[1] == ' ' || msgstr[1] == '\0')) {
+               hostname[0] = '\0';
+               if (msgstr[1] == '\0')
+                       return 1;
+               return 2;
+       }
+
+       for (i = 0; i < HOST_NAME_MAX; i++) {
+               /*
+                * IPv4: [[:digit:].]
+                * IPv6: [[:xdigit:]:]
+                * fqdn: [[:alnum:]-.]
+                */
+               if (!isalnum((unsigned char)msgstr[i]) && msgstr[i] != '.' &&
+                   msgstr[i] != ':' && msgstr[i] != '-')
+                       break;
+               hostname[i] = msgstr[i];
+       }
+       hostname[i] = '\0';
+
+       if (msgstr[i] != '\0' && msgstr[i] != ' ') {
+               hostname[0] = '\0';
+               return 0;
+       }
+       if (msgstr[i] == ' ')
+               i++;
+
+       if (inet_pton(AF_INET, hostname, &buf4) == 1 ||
+           inet_pton(AF_INET6, hostname, &buf6) == 1) {
+               return i;
+       }
+
+       /*
+        * RFC 1034 section 3.5:
+        * <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
+        */
+       for (j = 0; hostname[j] != '\0'; j++) {
+               if (j == 0 || hostname[j - 1] == '.') {
+                       if (!isalpha((unsigned char)hostname[j]))
+                               break;
+               } else if (hostname[j + 1] == '.' || hostname[j + 1] == '\0') {
+                       if (!isalnum((unsigned char)hostname[j]))
+                               break;
+               } else if (!isalnum((unsigned char)hostname[j]) &&
+                   hostname[j] != '.' && hostname[j] != '-')
+                       break;
+       }
+       if (hostname[j] != '\0') {
+               hostname[0] = '\0';
+               return 0;
+       }
+
+       return i;
+}
+
+/*
+ * Parse a program name of the form [[:alnum:]-._]{1,255}
+ * that ends in [:[].
+ * It returns the length of the tag up to the closing symbol.
+ */
+size_t
 parsemsg_prog(const char *msg, char *prog)
 {
        size_t i;
@@ -179,6 +266,10 @@ parsemsg_prog(const char *msg, char *pro
                    msg[i] != '-' && msg[i] != '.' && msg[i] != '_')
                        break;
                prog[i] = msg[i];
+       }
+       if (msg[i] != ':' && msg[i] != '[') {
+               prog[0] = '\0';
+               return 0;
        }
        prog[i] = '\0';
 
Index: usr.sbin/syslogd/parsemsg.h
===================================================================
RCS file: /cvs/src/usr.sbin/syslogd/parsemsg.h,v
retrieving revision 1.1
diff -u -p -r1.1 parsemsg.h
--- usr.sbin/syslogd/parsemsg.h 13 Jan 2022 10:34:07 -0000      1.1
+++ usr.sbin/syslogd/parsemsg.h 22 Jan 2022 14:52:04 -0000
@@ -25,6 +25,7 @@
 struct msg {
        int             m_pri;
        char            m_timestamp[33];
+       char            m_hostname[HOST_NAME_MAX + 1];
        char            m_prog[NAME_MAX + 1];
        char            m_msg[LOG_MAXLINE + 1];
 };
Index: regress/usr.sbin/syslogd/Makefile
===================================================================
RCS file: /cvs/src/regress/usr.sbin/syslogd/Makefile,v
retrieving revision 1.34
diff -u -p -r1.34 Makefile
--- regress/usr.sbin/syslogd/Makefile   22 Dec 2021 15:14:13 -0000      1.34
+++ regress/usr.sbin/syslogd/Makefile   22 Jan 2022 14:52:04 -0000
@@ -42,6 +42,11 @@ run-args-rsyslog-client-tls.pl run-args-
        # rsyslogd TLS client side is totally unreliable.  Startup of
        # GnuTLS may take a long time on slow machines.  Disable test.
        @echo DISABLED
+run-args-rsyslog-client-tcp.pl run-args-rsyslog-client-udp.pl:
+       # rsyslogd interprets first word as hostname and adds timestamp.
+       # The added timestamp makes syslogd interpret first word as hostname.
+       # Disable test.
+       @echo DISABLED
 
 .MAIN: all
 
Index: regress/usr.sbin/syslogd/args-client-multilisten.pl
===================================================================
RCS file: /cvs/src/regress/usr.sbin/syslogd/args-client-multilisten.pl,v
retrieving revision 1.4
diff -u -p -r1.4 args-client-multilisten.pl
--- regress/usr.sbin/syslogd/args-client-multilisten.pl 13 Sep 2017 00:35:53 -0000      1.4
+++ regress/usr.sbin/syslogd/args-client-multilisten.pl 22 Jan 2022 14:52:04 -0000
@@ -73,7 +73,7 @@ our %args = (
        ],
        func => sub { redo_connect(shift, sub {
            my $self = shift;
-           write_message($self, "client proto: ". $self->{connectproto});
+           write_message($self, "client proto; ". $self->{connectproto});
        })},
        loggrep => {
            qr/connect sock: (127.0.0.1|::1) \d+/ => 9,
@@ -100,9 +100,9 @@ our %args = (
     },
     file => {
        loggrep => {
-           qr/client proto: udp/ => '>=1',
-           qr/client proto: tcp/ => 3,
-           qr/client proto: tls/ => 3,
+           qr/client proto; udp/ => '>=1',
+           qr/client proto; tcp/ => 3,
+           qr/client proto; tls/ => 3,
            get_testgrep() => 1,
        }
     },
Index: regress/usr.sbin/syslogd/args-localhost.pl
===================================================================
RCS file: /cvs/src/regress/usr.sbin/syslogd/args-localhost.pl,v
retrieving revision 1.6
diff -u -p -r1.6 args-localhost.pl
--- regress/usr.sbin/syslogd/args-localhost.pl  13 Jan 2022 10:34:58 -0000      1.6
+++ regress/usr.sbin/syslogd/args-localhost.pl  22 Jan 2022 14:52:04 -0000
@@ -20,7 +20,7 @@ our %args = (
        loghost => '@localhost:$connectport',
        options => ["-u"],
        loggrep => {
-           qr/ from localhost, prog syslogd, msg /.get_testgrep() => 1,
+           qr/ from localhost, prog , msg /.get_testgrep() => 1,
        },
     },
     server => {
Index: regress/usr.sbin/syslogd/args-msgparsing.pl
===================================================================
RCS file: regress/usr.sbin/syslogd/args-msgparsing.pl
diff -N regress/usr.sbin/syslogd/args-msgparsing.pl
--- /dev/null   1 Jan 1970 00:00:00 -0000
+++ regress/usr.sbin/syslogd/args-msgparsing.pl 22 Jan 2022 14:52:04 -0000
@@ -0,0 +1,110 @@
+# The client writes message with different timestamps to /dev/log.
+# The syslogd writes it into a file and through a pipe and to tty.
+# The syslogd passes it via UDP to the loghost.
+# The server receives the message on its UDP socket.
+# Find the message in client, file, pipe, console, user, syslogd, server log.
+# Check for the correct time conversion in file and server log.
+
+use strict;
+use warnings;
+use Socket;
+use Sys::Hostname;
+
+(my $host = hostname()) =~ s/\..*//;
+
+my $bsd = qr/[[:upper:]][[:lower:]]{2} [[:digit:] ][[:digit:]] [[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}/;
+
+our %args = (
+    client => {
+       connect => { domain => AF_UNIX },
+       func => sub {
+           my $self = shift;
+           write_message($self, "<13>Jan 13 07:06:00 testhost testprog[123]: testcontent 1");
+           write_message($self, "<13>Jan 13 07:06:00 testhost testprog: testcontent 2");
+           write_message($self, "<13>Jan 13 07:06:00 testhost testprog testcontent 3");
+           write_message($self, "Jan 13 07:06:00 testhost testprog: testcontent 4");
+           write_message($self, "<13> testhost testprog: testcontent 5");
+           write_message($self, "<13>testhost testprog: testcontent 6");
+           write_message($self, "<13>testhost testprog testcontent 7");
+           write_message($self, "<13> testprog: testcontent 8");
+           write_message($self, "<13>testprog: testcontent 9");
+           write_message($self, "<13>testprog testcontent 10");
+           write_message($self, "<13>Jan 13 07:06:00 testprog: testcontent 11");
+           write_log($self);
+       },
+    },
+    syslogd => {
+       conf => <<'EOF',
+!testprog
+*.*    $objdir/file-0.log
+!*
++$host
+*.*    $objdir/file-1.log
+EOF
+    },
+    server => {
+       loggrep => {
+           qr/<13>Jan 13 07:06:00 testprog\[123\]: testcontent 1$/ => 1,
+           qr/<13>Jan 13 07:06:00 testprog: testcontent 2$/ => 1,
+           qr/<13>Jan 13 07:06:00 testprog testcontent 3$/ => 1,
+           qr/<13>Jan 13 07:06:00 testprog: testcontent 4$/ => 1,
+           qr/<13>$bsd testprog: testcontent 5$/ => 1,
+           qr/<13>$bsd testprog: testcontent 6$/ => 1,
+           qr/<13>$bsd testhost testprog testcontent 7$/ => 1,
+           qr/<13>$bsd testprog: testcontent 8$/ => 1,
+           qr/<13>$bsd testprog: testcontent 9$/ => 1,
+           qr/<13>$bsd testprog testcontent 10$/ => 1,
+           qr/<13>Jan 13 07:06:00 testprog: testcontent 11$/ => 1,
+       },
+    },
+    file => {
+       loggrep => {
+           qr/Jan 13 07:06:00 $host testprog\[123\]: testcontent 1$/ => 1,
+           qr/Jan 13 07:06:00 $host testprog: testcontent 2$/ => 1,
+           qr/Jan 13 07:06:00 $host testprog testcontent 3$/ => 1,
+           qr/Jan 13 07:06:00 $host testprog: testcontent 4$/ => 1,
+           qr/$bsd $host testprog: testcontent 5$/ => 1,
+           qr/$bsd $host testprog: testcontent 6$/ => 1,
+           qr/$bsd $host testhost testprog testcontent 7$/ => 1,
+           qr/$bsd $host testprog: testcontent 8$/ => 1,
+           qr/$bsd $host testprog: testcontent 9$/ => 1,
+           qr/$bsd $host testprog testcontent 10$/ => 1,
+           qr/Jan 13 07:06:00 $host testprog: testcontent 11$/ => 1,
+       },
+    },
+    multifile => [
+       {
+               loggrep => {
+                   qr/Jan 13 07:06:00 $host testprog\[123\]: testcontent 1$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 2$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog testcontent 3$/ => 0,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 4$/ => 1,
+                   qr/$bsd $host testprog: testcontent 5$/ => 1,
+                   qr/$bsd $host testprog: testcontent 6$/ => 1,
+                   qr/$bsd $host testhost testprog testcontent 7$/ => 0,
+                   qr/$bsd $host testprog: testcontent 8$/ => 1,
+                   qr/$bsd $host testprog: testcontent 9$/ => 1,
+                   qr/$bsd $host testprog testcontent 10$/ => 0,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 11$/ => 1,
+               }
+       },
+       # Everything should match, since syslogd(8) defaults to overwriting hostname
+       {
+               loggrep => {
+                   qr/Jan 13 07:06:00 $host testprog\[123\]: testcontent 1$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 2$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog testcontent 3$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 4$/ => 1,
+                   qr/$bsd $host testprog: testcontent 5$/ => 1,
+                   qr/$bsd $host testprog: testcontent 6$/ => 1,
+                   qr/$bsd $host testhost testprog testcontent 7$/ => 1,
+                   qr/$bsd $host testprog: testcontent 8$/ => 1,
+                   qr/$bsd $host testprog: testcontent 9$/ => 1,
+                   qr/$bsd $host testprog testcontent 10$/ => 1,
+                   qr/Jan 13 07:06:00 $host testprog: testcontent 11$/ => 1,
+               }
+       }
+    ],
+);
+
+1;
Index: regress/usr.sbin/syslogd/args-zulu.pl
===================================================================
RCS file: /cvs/src/regress/usr.sbin/syslogd/args-zulu.pl,v
retrieving revision 1.3
diff -u -p -r1.3 args-zulu.pl
--- regress/usr.sbin/syslogd/args-zulu.pl       12 Sep 2017 15:24:21 -0000      1.3
+++ regress/usr.sbin/syslogd/args-zulu.pl       22 Jan 2022 14:52:04 -0000
@@ -23,14 +23,14 @@ our %args = (
        func => sub {
            my $self = shift;
            write_message($self, "no time");
-           write_message($self, "Oct 11 22:14:15 bsd time");
-           write_message($self, "1985-04-12T23:20:50Z iso time");
-           write_message($self, "1985-04-12T23:20:50.52Z iso frac");
-           write_message($self, "1985-04-12T19:20:50.52-04:00 iso offset");
-           write_message($self, "2003-10-11T22:14:15.003Z iso milisec");
-           write_message($self, "2003-08-24T05:14:15.000003-07:00 iso full");
+           write_message($self, "Oct 11 22:14:15 $host bsd time");
+           write_message($self, "1985-04-12T23:20:50Z $host iso time");
+           write_message($self, "1985-04-12T23:20:50.52Z $host iso frac");
+           write_message($self, "1985-04-12T19:20:50.52-04:00 $host iso offset");
+           write_message($self, "2003-10-11T22:14:15.003Z $host iso milisec");
+           write_message($self, "2003-08-24T05:14:15.000003-07:00 $host iso full");
            write_message($self, "2003-08-24T05:14:15.000000003-07:00 invalid");
-           write_message($self, "- nil time");
+           write_message($self, "- $host nil time");
            write_message($self, "2003-08-24T05:14:15.000003-07:00space");
            write_message($self, "2003-08-24T05:14:15.-07:00 nofrac");
            write_message($self, "2003-08-24T05:14:15.1234567-07:00 longfrac");
