Tom Horsley writes:

 > Some of the messages show the international characters,

In this thread, those are Ed's.

 > and some do not.

Tim's.

 > It would be interesting to examine the detailed headers of the
 > message where the characters disappeared,

I looked in my local folder, and there's nothing interesting in the
headers.  Not surprising; the mail header is not very expressive.
Archiving happens *after* message transformations, so I'm seeing the
same headers you would in HyperKitty.  We'd probably need to see the
pre-Mailman header to identify any problems with Mailman via the
header, but Mailman has never kept those.  Perhaps if Tim sent a mail
both to the list and to himself, comparing those headers might tell us
something, but his messages are simple text/plain; charset=UTF-8, so I
doubt it.  I'm pretty sure it has something to do with the Unicode
encoding itself.

Ed sent his message in Unicode with NFC normalization (the é is
pre-composed) and UTF-8.  Tim's message contains two ?, indicating a
pair of unknown characters.  One possibility is that Tim's MUA
(Evolution) converts that to NFD normalization and something in
between chokes on that, and produces the doubled ?? instead of a
single é.  Renormalization is perfectly conformant to both Unicode and
mail standards, but lots of software has issues with NFD.  Mailman
should not, since it doesn't need to interpret anything other than
ASCII text, and passes anything else along (or deletes/quarantines
whole MIME parts), and I've not heard of such problems (but Mailman 3
is a completely new code base, so it's possible a new issue has been
introduced).
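
For anyone who wants to check which form a given message uses, here is
a minimal Python sketch of the NFC/NFD difference for é.  It only
shows the two representations and their UTF-8 bytes; it assumes
nothing about what Evolution or Mailman actually do, and the variable
names are just for illustration.

```python
import unicodedata

# Precomposed form (NFC): a single code point, U+00E9.
nfc = unicodedata.normalize("NFC", "\u00e9")
# Decomposed form (NFD): 'e' followed by U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", nfc)

for label, text in (("NFC", nfc), ("NFD", nfd)):
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
    utf8_bytes = text.encode("utf-8").hex(" ")
    print(f"{label}: {codepoints}  UTF-8: {utf8_bytes}")
```

Both strings render as the same é, but they differ in code point count
and in their UTF-8 bytes, which is what software that mishandles NFD
would trip over.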

It's also possible that Tim's MUA double-UTF-8-encodes the é, which
results in an illegal code point sequence which might also be
represented as ??.  Of course double-encoding is a bug, and if Mailman
receives such email, it's quite likely that it would replace the
broken text with ??.  This seems highly unlikely, as Tim would be
seeing issues all over the place, including in mail directly to
himself, which he has tried without problem.

So most likely something between Tim's MUA and Mailman (the list
manager, not HyperKitty) is mishandling the text, in one of the ways
described above.  I can't exclude either endpoint, but in both cases
somebody should be seeing a lot of similar mojibake.  Tim reports
some, but not in mail sent directly to himself, and I've never seen Mailman cause
anything like this.  More objectively, the fact that the ??s are in
HyperKitty rather than some 8-bit mojibake strongly suggests that even
if Mailman is directly responsible for the ??s, it was replacing
existing mojibake with ??.

 > but that seems to be impossible with hyperkitty.

I believe there's work being done on more detailed archiving at GNU
Mailman that might help diagnose this kind of issue (not done yet
though).  I don't know if lists.fedoraproject is tracking us, though.


-- 
Associate Professor              Division of Policy and Planning Science
http://turnbull.sk.tsukuba.ac.jp/     Faculty of Systems and Information
Email: turnb...@sk.tsukuba.ac.jp                   University of Tsukuba
Tel: 029-853-5175                 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
