Following on from the "Send international text with mail(1)" thread...
There is some interest in making mail(1) add relevant MIME headers to allow:
* Correctly sending UTF-8 email.
* Identifying 7-bit ASCII emails with an appropriate content type.
... and possibly other things in the future.
However:
* Adding UTF-8 parsing directly in mail(1) and hard-coding its behaviour is
inflexible.
* Exactly what should be sent with UTF-8 headers rather than none or us-ascii
is partly down to personal preference:
1. Send everything as UTF-8, including plain ASCII.
2. Send nothing as UTF-8.
3. Send valid UTF-8 streams as UTF-8, everything else ASCII.
4. Send valid and sort-of-valid-but-not-really UTF-8 streams as UTF-8, and
plain ASCII as us-ascii.
... etc, etc.
Different sites might reasonably have different requirements. Plus, we don't
really want to break mail(1) for anybody.
As can be seen from the size of the previous thread, finding a universal
solution that suits everybody has not yet been possible.
In an attempt to solve this, I've produced a proof-of-concept patch for
mail(1) that lets it call an external program at a fixed path. mail(1) passes
the collected mail to that program on standard input for analysis, and uses
the program's exit status as a flag to decide which set of MIME headers, if
any, should be included.
This is only a POC at this stage, so there may be bugs and room for
improvement. But it seems to work.
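For anyone who wants to exercise a validator without patching mail(1) first,
the mechanism boils down to roughly the following. This is a minimal
standalone sketch, not the patch itself: run_validator is a made-up name, and
unlike the patch it uses plain dup2() instead of mail(1)'s prepare_child()
and leaves the signal mask alone.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: run the program at `path` with `fd` as its
 * standard input.  Returns the program's exit status, 127 if it could
 * not be executed, or -1 if fork/wait failed or the child was killed
 * by a signal.
 */
int
run_validator(const char *path, int fd)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return (-1);

	if (pid == 0) {
		/* Child: make fd our stdin, then become the validator. */
		if (dup2(fd, STDIN_FILENO) == -1)
			_exit(127);
		execl(path, path, (char *)NULL);
		_exit(127);		/* exec failed */
	}

	/* Parent: wait for the child, retrying if interrupted. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return (-1);
	}
	if (!WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```

A caller would open the message file, pass its descriptor in, and treat
anything other than 1 or 2 as "add no headers", which is what the patch does
with the exit status.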
Advantages of this approach:
* Very minimal changes to mail(1).
* Flexible.
* No change for people who don't want this functionality.
- If the external validator program is not installed, mail(1) does not add
any new headers at all.
Cheat sheet:
1. Apply the patch, re-compile and re-install mail(1).
2. Compile the demo validator (at the end of this mail) and install it as
/bin/validate_utf8.
3. Send mail with mail(1) and observe the headers.
For ease of testing by users who don't use or care about UTF-8, the demo
validator simply looks for an 'X' character in the mail body. If it finds
one, it treats the mail as UTF-8; otherwise it treats the mail as us-ascii.
A real UTF-8 validator would return 2 for a valid UTF-8 stream, 1 for ASCII,
and 0 for non-conformant data that we don't want to mess with (e.g. a legacy
8-bit encoding).
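To make those return values concrete, here is a rough sketch of what the core
of a stricter validator might look like. It is only one possible policy
(well-formed UTF-8 gets 2, pure 7-bit input gets 1, everything else gets 0),
and classify_stream is a name I've invented for illustration. It rejects
overlong encodings, surrogates and code points above U+10FFFF.

```c
#include <stdio.h>

/*
 * Hypothetical classifier using the exit-status convention above:
 *   2 = well-formed UTF-8 containing at least one multi-byte sequence
 *   1 = 7-bit ASCII only
 *   0 = anything else (leave the mail alone)
 */
int
classify_stream(FILE *fp)
{
	unsigned long cp = 0, min = 0;
	int c, need = 0, multibyte = 0;

	while ((c = getc(fp)) != EOF) {
		if (need == 0) {
			if (c < 0x80)
				continue;		/* plain ASCII */
			if (c >= 0xC2 && c <= 0xDF) {
				need = 1; cp = c & 0x1F; min = 0x80;
			} else if (c >= 0xE0 && c <= 0xEF) {
				need = 2; cp = c & 0x0F; min = 0x800;
			} else if (c >= 0xF0 && c <= 0xF4) {
				need = 3; cp = c & 0x07; min = 0x10000;
			} else
				return (0);	/* stray continuation or bad lead */
			multibyte = 1;
		} else {
			if ((c & 0xC0) != 0x80)
				return (0);	/* expected a continuation byte */
			cp = (cp << 6) | (unsigned long)(c & 0x3F);
			if (--need == 0) {
				/* Overlong, out of range, or a surrogate. */
				if (cp < min || cp > 0x10FFFF ||
				    (cp >= 0xD800 && cp <= 0xDFFF))
					return (0);
			}
		}
	}
	if (need != 0)
		return (0);		/* input ended mid-sequence */
	return (multibyte ? 2 : 1);
}
```

A complete validator would then need little more than a main() that does
"return (classify_stream(stdin));". A site that prefers to send everything as
UTF-8 could return 2 in the ASCII case as well.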
Have fun!
--- collect.c.dist Fri Jan 17 15:42:30 2014
+++ collect.c Sun Sep 24 19:09:04 2023
@@ -39,6 +39,7 @@
#include "rcv.h"
#include "extern.h"
+#include <sys/wait.h>
/*
* Read a message from standard output and return a read file to it
@@ -62,6 +63,12 @@
char getsub;
char linebuf[LINESIZE], tempname[PATHSIZE], *cp;
+ int val_status;
+ sigset_t old_sigmask;
+ sigset_t temp_sigmask;
+ pid_t pid;
+ #define VALIDATOR "/bin/validate_utf8"
+
collf = NULL;
eofcount = 0;
hadintr = 0;
@@ -374,7 +381,77 @@
(void)Fclose(collf);
collf = NULL;
}
+
out:
+
+/*
+ * Pass the content of the collected file to stdin of a forked
+ * validator program, and use its exit status to set a flag
+ * in the struct header that we can later use to include an
+ * appropriate content type header.
+ */
+if (collf == NULL)	/* collection failed, nothing to validate */
+ goto done;
+rewind(collf);
+/*
+ * If this fork fails, it's not a critical error. We just don't
+ * perform any UTF-8 validation in that case.
+ */
+
+pid=fork();
+if (pid==-1) {
+ goto done;
+ }
+
+if (pid==0) {
+ sigset_t val_sigs;
+ sigemptyset(&val_sigs);
+ sigaddset(&val_sigs, SIGHUP);
+ prepare_child(&val_sigs, fileno(collf), -1);
+ execl(VALIDATOR, VALIDATOR, NULL);
+ /*
+ * If the validator doesn't exist or isn't executable then
+ * the following exit value will be passed to the parent
+ * below. Therefore, it must _not_ conflict with an
+ * expected exit value from the validator.
+ */
+ _exit(127);
+ }
+
+/*
+ * To wait on the forked validator and get its exit status we need
+ * to block SIGCHLD while we wait.
+ */
+
+sigemptyset(&temp_sigmask);
+sigaddset(&temp_sigmask, SIGCHLD);
+sigprocmask(SIG_BLOCK, &temp_sigmask, &old_sigmask);
+if (waitpid(pid, &val_status, 0) != -1) {
+ if (WIFEXITED(val_status)) {
+ /*
+ * Only permit _specific_ values.
+ */
+ if (WEXITSTATUS(val_status) != 127)
+ fprintf (stderr, "Validator %s returned status %d\n",
+ VALIDATOR, WEXITSTATUS(val_status));
+ if (WEXITSTATUS(val_status)==1)
+ hp->enc_flag=1;
+ if (WEXITSTATUS(val_status)==2)
+ hp->enc_flag=2;
+ }
+ }
+
+/*
+ * Restore previous signal mask now that we are done with the validator.
+ */
+
+sigprocmask(SIG_SETMASK, &old_sigmask, NULL);
+
+/*
+ * All done!
+ */
+
+done:
if (collf != NULL)
rewind(collf);
noreset--;
--- def.h.dist Fri Jan 28 03:18:41 2022
+++ def.h Sun Sep 24 16:01:03 2023
@@ -176,6 +176,7 @@
struct name *h_cc; /* Carbon copies string */
struct name *h_bcc; /* Blind carbon copies */
struct name *h_smopts; /* Sendmail options */
+ unsigned int enc_flag; /* Flag set by external UTF-8 validator */
};
/*
--- send.c.dist Wed Mar 8 01:43:11 2023
+++ send.c Sun Sep 24 19:00:37 2023
@@ -309,6 +309,7 @@
head.h_cc = NULL;
head.h_bcc = NULL;
head.h_smopts = NULL;
+ head.enc_flag = 0;
mail1(&head, 0);
return(0);
}
@@ -529,6 +530,14 @@
fmt("Cc:", hp->h_cc, fo, w&GCOMMA), gotcha++;
if (hp->h_bcc != NULL && w & GBCC)
fmt("Bcc:", hp->h_bcc, fo, w&GCOMMA), gotcha++;
+ if (hp->enc_flag == 1)
+ fprintf(fo, "MIME-Version: 1.0\n"
+ "Content-Type: text/plain; charset=us-ascii\n"
+ "Content-Transfer-Encoding: 7bit\n"), gotcha++;
+ if (hp->enc_flag == 2)
+ fprintf(fo, "MIME-Version: 1.0\n"
+ "Content-Type: text/plain; charset=utf-8\n"
+ "Content-Transfer-Encoding: 8bit\n"), gotcha++;
if (gotcha && w & GNL)
(void)putc('\n', fo);
return(0);
#include <stdio.h>

int
main(void)
{
	/*
	 * Do the UTF-8 parsing of your choice here.
	 *
	 * This demo code just treats anything with an X as UTF-8,
	 * everything else as ASCII.
	 *
	 * Input is on stdin.
	 *
	 * Return 0 for no additional headers.
	 * Return 1 for us-ascii headers.
	 * Return 2 for utf-8 headers.
	 * Other return values are undefined but will currently behave
	 * like 0 (no additional headers).
	 */
	int i;

	while ((i = getc(stdin)) != EOF) {
		if (i == 'X') {
			return (2);
		}
	}
	return (1);
}