Following on from the "Send international text with mail(1)" thread...

There is some interest in making mail(1) add relevant MIME headers to allow:

* Correctly sending UTF-8 email.
* Identifying 7-bit ASCII emails with an appropriate content type.

... and possibly other things in the future.

However:

* Adding UTF-8 parsing directly in mail(1) and hard-coding its behaviour is
  inflexible.
* Exactly what should be sent with UTF-8 headers, rather than with us-ascii
  headers or with no MIME headers at all, is partly down to personal
  preference:

1. Send everything as UTF-8, including plain ASCII.
2. Send nothing as UTF-8.
3. Send valid UTF-8 streams as UTF-8, everything else ASCII.
4. Send valid and sort-of-valid-but-not-really UTF-8 streams as UTF-8, and
   plain ASCII as us-ascii.

... etc, etc.

Different sites might reasonably have different requirements.  Plus, we don't
really want to break mail(1) for anybody.

As can be seen from the size of the previous thread, finding a universal
solution that suits everybody has not yet been possible.

In an attempt to solve this, I've produced a proof of concept patch for
mail(1) that makes it call a fixed external program, passing the mail to it
on standard input for analysis and using the program's exit status as a flag
to indicate which set of MIME headers, if any, should be included.

This is only a POC at this stage, so there may be bugs and room for
improvement.  But it seems to work.

Advantages of this approach:

* Very minimal changes to mail(1).
* Flexible: the choice of policy lives entirely in the external program.
* No change for people who don't want this functionality.
  - If the external validator program is not installed, mail(1) does not add
    any new headers at all.


Cheat sheet:

1. Apply the patch, re-compile and re-install mail(1).
2. Compile the new program and put it in /bin/validate_utf8 .
3. Send mail with mail(1) and observe the headers.


For ease of testing by users who don't use or care about UTF-8, the demo
validator simply looks for an 'X' character in the mail body.  If it finds
one, it treats the mail as UTF-8; anything else is treated as us-ascii.

A real UTF-8 validator would return 2 for a valid UTF-8 stream, 1 for ASCII,
and 0 for non-conformant data that we don't want to mess with (e.g. a legacy
8-bit encoding).
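For anyone wanting to try a real validator, here is a minimal sketch of one
that follows the 2/1/0 exit-status mapping just described.  It is untested
against mail(1) itself, and utf8_classify() and its helpers are hypothetical
names.  It checks bytes against the well-formed UTF-8 sequence table from the
Unicode standard (the same rules as RFC 3629), so overlong forms, UTF-16
surrogates, code points above U+10FFFF and truncated sequences all count as
non-conformant and get 0.

```c
#include <stdio.h>

/* Exit statuses understood by the mail(1) patch. */
enum { ENC_NONE = 0, ENC_ASCII = 1, ENC_UTF8 = 2 };

struct utf8_state {
	int need;		/* continuation bytes still expected */
	unsigned char lo, hi;	/* valid range for the next continuation byte */
	int saw_8bit;		/* any byte >= 0x80 seen so far */
	int bad;		/* an ill-formed sequence was seen */
};

static void
utf8_init(struct utf8_state *s)
{
	s->need = 0;
	s->lo = 0x80;
	s->hi = 0xBF;
	s->saw_8bit = 0;
	s->bad = 0;
}

/* Feed one byte, following the Unicode "well-formed UTF-8" table. */
static void
utf8_feed(struct utf8_state *s, unsigned char c)
{
	if (s->bad)
		return;
	if (s->need > 0) {
		if (c < s->lo || c > s->hi) {
			s->bad = 1;
			return;
		}
		s->need--;
		/* Only the first continuation byte has a narrowed range. */
		s->lo = 0x80;
		s->hi = 0xBF;
		return;
	}
	if (c < 0x80)
		return;			/* plain 7-bit ASCII */
	s->saw_8bit = 1;
	if (c >= 0xC2 && c <= 0xDF)
		s->need = 1;
	else if (c >= 0xE0 && c <= 0xEF) {
		s->need = 2;
		if (c == 0xE0)
			s->lo = 0xA0;	/* reject overlong forms */
		else if (c == 0xED)
			s->hi = 0x9F;	/* reject UTF-16 surrogates */
	} else if (c >= 0xF0 && c <= 0xF4) {
		s->need = 3;
		if (c == 0xF0)
			s->lo = 0x90;	/* reject overlong forms */
		else if (c == 0xF4)
			s->hi = 0x8F;	/* reject code points above U+10FFFF */
	} else
		s->bad = 1;	/* 0x80-0xC1 and 0xF5-0xFF never start a sequence */
}

static int
utf8_result(const struct utf8_state *s)
{
	if (s->bad || s->need > 0)
		return (ENC_NONE);	/* leave legacy 8-bit data alone */
	return (s->saw_8bit ? ENC_UTF8 : ENC_ASCII);
}

/* Classify a whole stream; stops reading at the first bad byte. */
int
utf8_classify(FILE *fp)
{
	struct utf8_state s;
	int c;

	utf8_init(&s);
	while (!s.bad && (c = getc(fp)) != EOF)
		utf8_feed(&s, (unsigned char)c);
	return (utf8_result(&s));
}
```

Dropped into the demo program, main() would then reduce to
`return (utf8_classify(stdin));`.  Pure ASCII input returns 1 rather than 2,
even though ASCII is also valid UTF-8, so that it gets the us-ascii headers.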

Have fun!

--- collect.c.dist      Fri Jan 17 15:42:30 2014
+++ collect.c   Sun Sep 24 19:09:04 2023
@@ -39,6 +39,7 @@
 
 #include "rcv.h"
 #include "extern.h"
+#include <sys/wait.h>
 
 /*
  * Read a message from standard output and return a read file to it
@@ -62,6 +63,12 @@
        char getsub;
        char linebuf[LINESIZE], tempname[PATHSIZE], *cp;
 
+       int val_status;
+       sigset_t old_sigmask;
+       sigset_t temp_sigmask;
+       pid_t pid;
+       #define VALIDATOR "/bin/validate_utf8"
+
        collf = NULL;
        eofcount = 0;
        hadintr = 0;
@@ -374,7 +381,86 @@
                (void)Fclose(collf);
                collf = NULL;
        }
+
 out:
+
+       /*
+        * Don't rely on every caller having initialised enc_flag.
+        */
+       hp->enc_flag = 0;
+
+       /*
+        * The error paths above close collf and set it to NULL, in
+        * which case there is nothing to validate.
+        */
+       if (collf == NULL)
+               goto done;
+
+       /*
+        * Pass the content of the collected file to stdin of a forked
+        * validator program, and use its exit status to set a flag
+        * in the struct header that we can later use to include an
+        * appropriate content type header.
+        */
+       rewind(collf);
+
+       /*
+        * Block SIGCHLD *before* forking, so that a SIGCHLD handler
+        * installed by mail(1) cannot reap the validator before our
+        * waitpid() does.
+        */
+       sigemptyset(&temp_sigmask);
+       sigaddset(&temp_sigmask, SIGCHLD);
+       sigprocmask(SIG_BLOCK, &temp_sigmask, &old_sigmask);
+
+       /*
+        * If this fork fails, it's not a critical error.  We just don't
+        * perform any UTF-8 validation in that case.
+        */
+       pid = fork();
+       if (pid == -1) {
+               sigprocmask(SIG_SETMASK, &old_sigmask, NULL);
+               goto done;
+       }
+
+       if (pid == 0) {
+               sigset_t val_sigs;
+
+               /* Don't pass the blocked SIGCHLD on to the validator. */
+               sigprocmask(SIG_SETMASK, &old_sigmask, NULL);
+               sigemptyset(&val_sigs);
+               sigaddset(&val_sigs, SIGHUP);
+               prepare_child(&val_sigs, fileno(collf), -1);
+               execl(VALIDATOR, VALIDATOR, (char *)NULL);
+               /*
+                * If the validator doesn't exist or isn't executable then
+                * the following exit value will be passed to the parent
+                * below.  Therefore, it must _not_ conflict with an
+                * expected exit value from the validator.
+                */
+               _exit(127);
+       }
+
+       if (waitpid(pid, &val_status, 0) != -1 && WIFEXITED(val_status)) {
+               /*
+                * Only permit _specific_ values.
+                */
+               if (WEXITSTATUS(val_status) != 127)
+                       fprintf(stderr, "Validator %s returned status %d\n",
+                           VALIDATOR, WEXITSTATUS(val_status));
+               if (WEXITSTATUS(val_status) == 1)
+                       hp->enc_flag = 1;
+               if (WEXITSTATUS(val_status) == 2)
+                       hp->enc_flag = 2;
+       }
+
+       /*
+        * Restore the previous signal mask now that we are done with
+        * the validator.
+        */
+       sigprocmask(SIG_SETMASK, &old_sigmask, NULL);
+
+done:
        if (collf != NULL)
                rewind(collf);
        noreset--;
--- def.h.dist  Fri Jan 28 03:18:41 2022
+++ def.h       Sun Sep 24 16:01:03 2023
@@ -176,6 +176,7 @@
        struct name *h_cc;              /* Carbon copies string */
        struct name *h_bcc;             /* Blind carbon copies */
        struct name *h_smopts;          /* Sendmail options */
+       unsigned int enc_flag;          /* Flag set by external UTF-8 validator */
 };
 
 /*
--- send.c.dist Wed Mar  8 01:43:11 2023
+++ send.c      Sun Sep 24 19:00:37 2023
@@ -309,6 +309,7 @@
        head.h_cc = NULL;
        head.h_bcc = NULL;
        head.h_smopts = NULL;
+       head.enc_flag = 0;
        mail1(&head, 0);
        return(0);
 }
@@ -529,6 +530,14 @@
                fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
        if (hp->h_bcc != NULL && w & GBCC)
                fmt("Bcc:", hp->h_bcc, fo, w&GCOMMA), gotcha++;
+       if (hp->enc_flag == 1)
+               fprintf(fo, "MIME-Version: 1.0\n"
+                   "Content-Type: text/plain; charset=us-ascii\n"
+                   "Content-Transfer-Encoding: 7bit\n"), gotcha++;
+       if (hp->enc_flag == 2)
+               fprintf(fo, "MIME-Version: 1.0\n"
+                   "Content-Type: text/plain; charset=utf-8\n"
+                   "Content-Transfer-Encoding: 8bit\n"), gotcha++;
        if (gotcha && w & GNL)
                (void)putc('\n', fo);
        return(0);




#include <stdio.h>

int
main(void)
{
        /*
         * Do the UTF-8 parsing of your choice here.
         *
         * This demo code just treats anything with an X as UTF-8,
         * everything else as ASCII.
         *
         * Input is on stdin.
         *
         * Return 0 for no additional headers.
         * Return 1 for us-ascii headers.
         * Return 2 for utf-8 headers.
         * Other return values are undefined but will currently behave
         * like 0 (no additional headers).  Avoid 127, which mail(1)
         * uses to detect a missing or non-executable validator.
         */
        int c;

        while ((c = getc(stdin)) != EOF) {
                if (c == 'X')
                        return (2);
        }
        return (1);
}
