Ouch. Correcting myself.
Ingo Schwarze wrote on Sun, Feb 16, 2014 at 03:11:07PM +0100:
> 1. I asked around a bit and Thomas Klausner (NetBSD) mentioned
> that both groff and mandoc format bare, unescaped ASCII minus
> characters (`-', 0x2d) found in the input stream as the
> three-byte UTF-8 sequence 0xe2 0x80 0x93 in the output stream
> when running with -Tutf8 or with -Tlocale and LC_CTYPE=*_*.UTF-8.
Dmitrij D. Czarkoff just pointed out to me in private mail that
this isn't true at all. I misunderstood what Thomas said.
So I re-checked.  Here is how the various dashes and hyphens
actually render in both groff and mandoc:

  input   output   output
          ASCII    UTF-8
  -----   ------   ------
  -       -        -
  \-      -        -
  \(hy    -        U+2010
  \(en    -        U+2013
  \(em    --       U+2014

> That can be annoying when trying to copy and paste code examples
> from formatted manual pages.
Consequently, that can only happen if people use \(hy, \(en, or \(em
for formatting their code examples. Hopefully, few people do that.
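
For anyone who wants to check text copied from a formatted page,
here is a minimal Python sketch.  It only looks for the three
code points from the table above; the names DASHES,
find_unicode_dashes and asciify_dashes are made up for illustration
and are not part of groff or mandoc.

```python
# Hypothetical helper, not part of groff or mandoc.
# Maps the Unicode dashes from the table to their ASCII renderings.
DASHES = {
    "\u2010": "-",   # \(hy, HYPHEN
    "\u2013": "-",   # \(en, EN DASH
    "\u2014": "--",  # \(em, EM DASH
}


def find_unicode_dashes(text):
    """Return (line, column, code point) for each Unicode dash found.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in DASHES:
                hits.append((lineno, col, "U+%04X" % ord(ch)))
    return hits


def asciify_dashes(text):
    """Replace each Unicode dash with the ASCII form from the table."""
    for uni, plain in DASHES.items():
        text = text.replace(uni, plain)
    return text
```

Running find_unicode_dashes() over a pasted code example shows
whether any of these characters slipped in; asciify_dashes() is one
possible way to repair the text afterwards, assuming the ASCII
column of the table is what was intended.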
Yours,
Ingo