> > > So this legacy encoding of end-of-lines is now quite obsolete
> > > even on MacOS.
> >
> > I don't think it can be called "obsolete" as long as files generated
> > using that line end convention exist. Or, at least, applications that
> > have an operation for "read a line" will have to cope with it. (In
> > other words, all of the CR LF CRLF LFCR should mark an "end of line".)
>
> I was not speaking about the actual encoding of files into bytes, but
> only about the interpretation of '\n' or '\r' in C/C++, which was the
> real subject of the message.
ISO 14882 says that \n is LF (and also that LF is the newline function as
far as C++ is concerned) and that \r is CR. It does not define these
relative to any given character set. So there is nothing in the standard
to prevent char being interpreted in an implementation-defined character
encoding that is identical to, say, US-ASCII or a part of ISO 8859,
except for having CR encoded as 0x0A and LF encoded as 0x0D. This would
simplify converting newline functions when writing text files on Macs,
but potentially cause problems elsewhere.

However, because the universal-character-name escapes (\uXXXX and
\UXXXXXXXX) are defined relative to a particular encoding, namely
ISO 10646, it would be an error if ('\n' != '\u000A' || '\r' != '\u000D').
Whether this is implemented by using the values 0x0A and 0x0D for LF and
CR respectively (e.g. by using US-ASCII or a proper superset of US-ASCII
such as Unicode), or by converting those values to another encoding when
parsing, is not specified.

Given that C and C++ are intended to be neutral with respect to
encodings (indeed, they do not even mandate that a char be an octet, or
that a wchar_t be the same size as 2 or 4 chars), this is not
surprising. The consequence is that we cannot assume that conversion of
character, wide character, and string literals to and from Unicode will
be trivial.