Ouch.  Correcting myself.

Ingo Schwarze wrote on Sun, Feb 16, 2014 at 03:11:07PM +0100:

>  1. I asked around a bit and Thomas Klausner (NetBSD) mentioned
>     that both groff and mandoc format bare, unescaped ASCII minus
>     characters (`-', 0x2d) found in the input stream as the
>     three-byte UTF-8 sequence 0xe2 0x80 0x93 in the output stream
>     when running with -Tutf8 or with -Tlocale and LC_CTYPE=*_*.UTF-8.

Dmitrij D. Czarkoff just pointed out to me in private mail that
this isn't true at all.  I misunderstood what Thomas said.

So I re-checked.  Here is how the various dashes and hyphens
actually render in both groff and mandoc:

   input   output   output
           ASCII    UTF-8
   -----   -----    -----

       -   -        -
      \-   -        -
    \(hy   -        U+2010
    \(en   -        U+2013
    \(em   --       U+2014

>     That can be annoying when trying to copy and paste code examples
>     from formatted manual pages.

Consequently, that can only happen if people use \(hy, \(en, or \(em
for formatting their code examples.  Hopefully, few people do that.
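
For text that was already copied out of a UTF-8 formatted page, the
three affected characters can be mapped back to what the ASCII column
of the table shows.  The following is only a minimal sketch in Python;
the names ASCII_FALLBACK and asciify_dashes are made up for this
illustration and are not part of groff or mandoc.

```python
# Mapping taken from the table above: character produced in UTF-8
# output -> what the same input escape produces in ASCII output.
# The names below are hypothetical, chosen for this sketch only.
ASCII_FALLBACK = {
    "\u2010": "-",   # \(hy, hyphen
    "\u2013": "-",   # \(en, en dash
    "\u2014": "--",  # \(em, em dash
}


def asciify_dashes(text):
    """Replace U+2010, U+2013 and U+2014 with their ASCII renderings."""
    return text.translate(str.maketrans(ASCII_FALLBACK))
```

Plain `-' and \- already come out as 0x2d in both output modes, so
such a filter leaves them untouched.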

Yours,
  Ingo
