Hi Anthony,

Anthony J. Bentley wrote on Mon, Oct 27, 2014 at 12:57:10AM -0600:
> Ingo Schwarze writes:

>> In ports land, many manual pages contain occasional non-ASCII
>> characters - even though i don't consider that a particularly
>> smart idea, but let's face it, those characters *are* out there.

> I agree that this is appropriate for mandoc to try to handle for a
> common, very limited subset of encodings.

>> Since this is a somewhat bigger and user-visible change, i'm
>> asking whether there are any concerns or comments before committing.

> After applying this diff, mandoc -Tutf8 shows U+FFFD anywhere
> there's a \& in the source... very obvious in the mdoc(7) page.

Oops.  That was actually not caused by the diff, but by a regression
in my latest commit to term.c.  It is fixed now.
The diff i sent still applies.

>> +If not specified, autodetection uses the first match:
>> +.Bl -tag -width iso-8859-1
>> +.It Cm utf-8
>> +if the first three bytes of the input file
>> +are the UTF-8 byte order mark (BOM, 0xefbbbf)
>> +.It Ar encoding
>> +if the first or second line of the input file matches the
>> +.Sy emacs
>> +mode line format
>> +.Pp
>> +.D1 .\e" -*- Oo ...; Oc coding: Ar encoding ; No -*-
>> +.It Cm utf-8
>> +if the first non-ASCII byte in the file introduces a valid UTF-8 sequence
>> +.It Cm iso-8859-1
>> +otherwise
>> +.El

> I agree with this logic as well.  I would be uncomfortable if it got
> any more complicated.

The idea is to do the same as the preconv(1) utility contained
in the textproc/groff package.

Yours,
  Ingo
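
P.S.
In case the detection order is easier to follow in code, here is a
rough, untested sketch of the idea.  It is *not* the actual mandoc or
preconv(1) code; the function names are mine, and the mode line check
is simplified to just looking for a "coding:" token in the first two
lines instead of parsing the full emacs -*- syntax.  It uses strlcpy(3)
as found on OpenBSD.

#include <ctype.h>
#include <string.h>

/* Does p start a valid multi-byte UTF-8 sequence? */
static int
utf8_sequence(const unsigned char *p, size_t left)
{
        size_t   need;

        if ((*p & 0xe0) == 0xc0)                /* 110xxxxx: 2 bytes */
                need = 1;
        else if ((*p & 0xf0) == 0xe0)           /* 1110xxxx: 3 bytes */
                need = 2;
        else if ((*p & 0xf8) == 0xf0)           /* 11110xxx: 4 bytes */
                need = 3;
        else
                return 0;                       /* invalid lead byte */
        if (left <= need)
                return 0;
        while (need--)
                if ((*++p & 0xc0) != 0x80)      /* want 10xxxxxx */
                        return 0;
        return 1;
}

/* Scan one line for "coding: <name>"; copy <name> into enc. */
static int
coding_line(const unsigned char *p, const unsigned char *end,
    char *enc, size_t encsz)
{
        size_t   i;

        for (; p + 7 <= end && *p != '\n'; p++) {
                if (memcmp(p, "coding:", 7) != 0)
                        continue;
                for (p += 7; p < end && (*p == ' ' || *p == '\t'); p++)
                        ;
                for (i = 0; i + 1 < encsz && p < end &&
                    (isalnum(*p) || *p == '-' || *p == '_'); p++, i++)
                        enc[i] = *p;
                enc[i] = '\0';
                return i > 0;
        }
        return 0;
}

/* Pick an encoding for the buffer, using the first matching rule. */
void
detect_encoding(const unsigned char *buf, size_t len,
    char *enc, size_t encsz)
{
        const unsigned char     *p, *end, *nl;
        int                      line;

        end = buf + len;

        /* 1. UTF-8 byte order mark at the start of the file. */
        if (len >= 3 &&
            buf[0] == 0xef && buf[1] == 0xbb && buf[2] == 0xbf) {
                strlcpy(enc, "utf-8", encsz);
                return;
        }

        /* 2. Emacs-style coding declaration in line one or two. */
        for (p = buf, line = 0; line < 2 && p < end; line++) {
                if (coding_line(p, end, enc, encsz))
                        return;
                if ((nl = memchr(p, '\n', end - p)) == NULL)
                        break;
                p = nl + 1;
        }

        /* 3./4. Decide by the first non-ASCII byte. */
        for (p = buf; p < end; p++)
                if (*p & 0x80) {
                        strlcpy(enc, utf8_sequence(p, end - p) ?
                            "utf-8" : "iso-8859-1", encsz);
                        return;
                }

        /* Pure ASCII: either name works; fall back to iso-8859-1. */
        strlcpy(enc, "iso-8859-1", encsz);
}

Note that rule 3 only looks at the sequence started by the very first
non-ASCII byte, exactly as the manual text above says, so one valid
sequence is enough to make the whole file count as UTF-8.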
