Hi Anthony,

Anthony J. Bentley wrote on Mon, Oct 27, 2014 at 12:57:10AM -0600:
> Ingo Schwarze writes:

>> In ports land, many manual pages contain occasional non-ASCII
>> characters - even though i don't consider that a particularly
>> smart idea, but let's face it, those characters *are* out there.

> I agree that this is appropriate for mandoc to try to handle for a
> common, very limited subset of encodings.

>> Since this is a somewhat bigger and user-visible change, i'm
>> asking whether there are any concerns or comments before committing.

> After applying this diff, mandoc -Tutf8 shows U+FFFD anywhere
> there's a \& in the source... very obvious in the mdoc(7) page.

Oops.  That was actually not caused by the diff, but by a regression
in my latest commit to term.c.  It is fixed now.
The diff i sent still applies.

>> +If not specified, autodetection uses the first match:
>> +.Bl -tag -width iso-8859-1
>> +.It Cm utf-8
>> +if the first three bytes of the input file
>> +are the UTF-8 byte order mark (BOM, 0xefbbbf)
>> +.It Ar encoding
>> +if the first or second line of the input file matches the
>> +.Sy emacs
>> +mode line format
>> +.Pp
>> +.D1 .\e" -*- Oo ...; Oc coding: Ar encoding ; No -*-
>> +.It Cm utf-8
>> +if the first non-ASCII byte in the file introduces a valid UTF-8 sequence
>> +.It Cm iso-8859-1
>> +otherwise
>> +.El

> I agree with this logic as well.  I would be uncomfortable if it got
> any more complicated.

The idea is to do the same as the preconv(1) utility contained
in the textproc/groff package.

Yours,
  Ingo
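
P.S.
In case the detection order is easier to follow in code, here is a
rough, untested sketch of the idea.  It is *not* the actual mandoc or
preconv(1) code; the function names are mine, and the mode line check
is simplified to just looking for a "coding:" token in the first two
lines instead of parsing the full emacs -*- syntax.  It uses strlcpy(3)
as found on OpenBSD.

#include <ctype.h>
#include <string.h>

/* Does p start a valid multi-byte UTF-8 sequence? */
static int
utf8_sequence(const unsigned char *p, size_t left)
{
        size_t   need;

        if ((*p & 0xe0) == 0xc0)                /* 110xxxxx: 2 bytes */
                need = 1;
        else if ((*p & 0xf0) == 0xe0)           /* 1110xxxx: 3 bytes */
                need = 2;
        else if ((*p & 0xf8) == 0xf0)           /* 11110xxx: 4 bytes */
                need = 3;
        else
                return 0;                       /* invalid lead byte */
        if (left <= need)
                return 0;
        while (need--)
                if ((*++p & 0xc0) != 0x80)      /* want 10xxxxxx */
                        return 0;
        return 1;
}

/* Scan one line for "coding: <name>"; copy <name> into enc. */
static int
coding_line(const unsigned char *p, const unsigned char *end,
    char *enc, size_t encsz)
{
        size_t   i;

        for (; p + 7 <= end && *p != '\n'; p++) {
                if (memcmp(p, "coding:", 7) != 0)
                        continue;
                for (p += 7; p < end && (*p == ' ' || *p == '\t'); p++)
                        ;
                for (i = 0; i + 1 < encsz && p < end &&
                    (isalnum(*p) || *p == '-' || *p == '_'); p++, i++)
                        enc[i] = *p;
                enc[i] = '\0';
                return i > 0;
        }
        return 0;
}

/* Pick an encoding for the buffer, using the first matching rule. */
void
detect_encoding(const unsigned char *buf, size_t len,
    char *enc, size_t encsz)
{
        const unsigned char     *p, *end, *nl;
        int                      line;

        end = buf + len;

        /* 1. UTF-8 byte order mark at the start of the file. */
        if (len >= 3 &&
            buf[0] == 0xef && buf[1] == 0xbb && buf[2] == 0xbf) {
                strlcpy(enc, "utf-8", encsz);
                return;
        }

        /* 2. Emacs-style coding declaration in line one or two. */
        for (p = buf, line = 0; line < 2 && p < end; line++) {
                if (coding_line(p, end, enc, encsz))
                        return;
                if ((nl = memchr(p, '\n', end - p)) == NULL)
                        break;
                p = nl + 1;
        }

        /* 3./4. Decide by the first non-ASCII byte. */
        for (p = buf; p < end; p++)
                if (*p & 0x80) {
                        strlcpy(enc, utf8_sequence(p, end - p) ?
                            "utf-8" : "iso-8859-1", encsz);
                        return;
                }

        /* Pure ASCII: either name works; fall back to iso-8859-1. */
        strlcpy(enc, "iso-8859-1", encsz);
}

Note that rule 3 only looks at the sequence started by the very first
non-ASCII byte, exactly as the manual text above says, so one valid
sequence is enough to make the whole file count as UTF-8.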
