From: "John Cowan" <[EMAIL PROTECTED]>
> Kent Karlsson scripsit:
>
> > All of CR, LF, <CR, LF>, NEL, LS, PS, and EOF(!). (Assuming that the
> > encoding of the text file is recognised.)
>
> XML 1.0 treats CR, LF, and <CR, LF> as line terminators and reports
> them as LF.
> XML 1.1 will treat CR, LF, NEL, <CR, LF>, <CR, NEL>, and LS as line
> terminators and report them all as LF. PS is left alone, because of
> the bare possibility that it is being used as quasi-markup.
> [...]
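
To make the two rule sets quoted above concrete, here is a minimal Python sketch. The function names normalize_xml10 and normalize_xml11 are only illustrative, and the code assumes the text has already been decoded to Unicode; it is not taken from any XML parser.

```python
import re

# XML 1.0 as described above: <CR, LF> and a lone CR are reported as LF.
_XML10_EOL = re.compile("\r\n|\r")

# XML 1.1 as described above: <CR, NEL>, NEL (U+0085) and LS (U+2028)
# are additionally reported as LF. PS (U+2029) is deliberately not matched.
# Two-character sequences come first so they collapse to a single LF.
_XML11_EOL = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_xml10(text: str) -> str:
    """Report every XML 1.0 line terminator as a single LF."""
    return _XML10_EOL.sub("\n", text)


def normalize_xml11(text: str) -> str:
    """Report every XML 1.1 line terminator as a single LF."""
    return _XML11_EOL.sub("\n", text)
```

Under these rules NEL and LS pass through normalize_xml10 untouched, and PS passes through both functions untouched.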
I also have some old documents that use <VT> (U+000B) instead of <LF> (U+000A) to increase the interparagraph spacing. <VT> is still what the '\v' character constant maps to in C and C++. (Java has no '\v' escape; there the character has to be written as '\u000B' or '\013'.) Some applications still seem to use <VT> after <CR> to create soft line breaks, in text files where paragraphs are normally ended by <CR><LF>.

<CR> was intended to allow overstriking the previously written (but still complete) line, for example to underline some characters on that line. This is what '\r' should imply in C, and in fact such a '\r' should no longer be used in C, because it relies on overstriking to add visual attributes to the preceding text. That is why <CR> comes before the <LF> that terminates the paragraph.

Of course there will still be many more uses in terminal emulation protocols, which technically are not text file encodings: they can create dynamic effects, or encode and render text in a non-logical order, for example when emulating blinking or drawing "ASCII art". I consider terminal emulation protocols (including printing protocols) to be supersets of the plain text format, but plain text should not attempt to reproduce all the terminal "features".

So what is the status of <VT> in plain text files? For me it should behave like <LF>, except that it does not imply an end of paragraph. Is there a good replacement for this legacy control that just means an explicit soft line break in the middle of a paragraph (in which case it may occur instead of a <SPACE> and act as a word separator, except after a soft hyphen, where it becomes ignorable)?
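
The behaviour I have in mind for <VT> can be sketched roughly as follows, in Python. This is only my reading of the legacy files, not a rule from any standard. The names split_paragraphs and reflow are made up for the illustration, and the sketch assumes that <CR><LF> or a lone <LF> ends a paragraph, while <VT> (optionally preceded by <CR>) is a soft line break.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Assumption: <CR><LF> or a lone <LF> terminates a paragraph.
_PARAGRAPH_END = re.compile("\r\n|\n")
# Assumption: <VT>, optionally preceded by <CR>, is a soft line break.
_SOFT_BREAK = re.compile("\r?\x0b")


def split_paragraphs(text):
    """Return a list of paragraphs, each a list of soft-broken lines."""
    paragraphs = _PARAGRAPH_END.split(text)
    if paragraphs and paragraphs[-1] == "":
        paragraphs.pop()  # the text ended with a paragraph terminator
    return [_SOFT_BREAK.split(p) for p in paragraphs]


def reflow(lines):
    """Join the soft-broken lines of one paragraph for re-wrapping.

    A soft break acts as a word separator (one space), except after a
    soft hyphen, where the break is ignored and the soft hyphen is kept.
    """
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if i < len(lines) - 1 and not line.endswith(SOFT_HYPHEN):
            out.append(" ")
    return "".join(out)
```

For example, split_paragraphs("one\x0btwo\r\nthree\r\n") gives [["one", "two"], ["three"]], and reflow(["one", "two"]) gives "one two". A lone <CR> that is not followed by <VT> or <LF> is left inside the line, since this sketch does not try to interpret overstrikes.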