> > > So this legacy encoding of end-of-lines is now quite obsolete
> > > even on MacOS.
> >
> > I don't think it can be called "obsolete" as long as files generated
> > using that line end convention exist. Or, at least, applications that
> > have an operation for "read a line" will have to cope with it. (In
> > other words, all of the CR LF CRLF LFCR should mark an "end of line".)
>
> I was not speaking about the actual encoding of files into bytes, but
> only about the interpretation of '\n' or '\r' in C/C++, which was the
> real subject of the message.
ISO 14882 says that \n is LF (and also that LF is the newline function as
far as C++ is concerned) and that \r is CR. It does not define these
relative to any given character set. So there is nothing in the standard
to prevent char being interpreted in an implementation-defined character
encoding that is identical to, say, US-ASCII or a part of ISO 8859,
except for having CR encoded as 0x0A and LF encoded as 0x0D. This would
simplify converting newline functions when writing text files on Macs,
but potentially cause problems elsewhere.

However, because the universal-character-name escapes (\uXXXX and
\UXXXXXXXX) are defined relative to a particular encoding, namely
ISO 10646, it would be an error if ('\n' != '\u000A' || '\r' != '\u000D').
Whether this is implemented by using the values 0x0A and 0x0D for LF and
CR respectively (e.g. by using US-ASCII or a proper superset of US-ASCII
such as Unicode), or by converting those values to another encoding when
parsing, is not specified.

Given that C and C++ are intended to be neutral with respect to
encodings (indeed, they do not even mandate that a char be an octet, or
that a wchar_t be the same size as 2 or 4 chars), this is not
surprising. The consequence is that we cannot assume that conversion of
character, wide character, and string literals to and from Unicode will
be trivial.