On Fri, 02 Mar 2007 20:24:44 +0100, "A.J.Mechelynck" <[EMAIL PROTECTED]> wrote:
> Bill Moseley wrote:
> > I have a utf-8 file that uses the unicode line separator. Not
> > something I've come across very often. In utf-8 the sequence is:
> >
> > 0xE2 0x80 0xA8 (e280a8)
> >
> > In a uxterm vim correctly reads (and sets) the file encoding as utf8
> > (there's no BOM on the file), but the U-2028 character is displayed
> > as an un-displayable character and not displayed as a new line.
> > That is, all the text is displayed as a single line.
> >
> > Can anyone educate me a bit on the use of the Line Separator character
> > and if or how it can be supported in Vim?
>
> I may be wrong, but IIUC this codepoint plays the same role as the HTML <br>
> tag: it does not define an "end of line" in the text file which contains it,
> but it means that, when rendered typographically, as in a browser or a
> WYSIWYG editor (neither of which Vim is, or tries to mimic), the rendered
> output must have a linebreak at this point.
>
> IOW: I think it's a feature, not a bug.
>
> You can add a linebreak after every occurrence of that codepoint by using
>
> :exe "%s/\<Char-0x2028>/" . '\0\r/g'
>
> Note that I intentionally use double quotes in the first part and single
> quotes in the second part.

According to http://www.unicode.org/reports/tr13/tr13-9.html, the correct way to treat U+2028 (line separator) and U+2029 (paragraph separator) is to translate them into the platform's standard sequence for representing the end of a line. (What it actually says is that if the purpose of the line break is unambiguously known -- that is, whether it is the end of a line or the end of a paragraph -- then the corresponding Unicode character should be used. But Vim is a text editor and knows nothing of paragraphs, so I would expect both characters to be translated into the platform's end-of-line representation.)
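As a rough illustration of that translation (not anything Vim itself does), here is a minimal Python sketch. The names `separators_to_newlines` and `convert_utf8` are made up for this example. It maps both U+2028 and U+2029 to "\n". If the result is then written with Python's text-mode `open(path, "w", encoding="utf-8")` and the default `newline=None`, each "\n" is written as the platform's end-of-line sequence (`os.linesep`).

```python
# U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
LINE_SEPARATOR = "\u2028"       # UTF-8 bytes: E2 80 A8
PARAGRAPH_SEPARATOR = "\u2029"  # UTF-8 bytes: E2 80 A9


def separators_to_newlines(text: str) -> str:
    """Replace every LS and PS with a plain newline character.

    A plain-text editor has no notion of paragraphs, so both separators
    are mapped to the same end-of-line marker.
    """
    return text.replace(LINE_SEPARATOR, "\n").replace(PARAGRAPH_SEPARATOR, "\n")


def convert_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes (no BOM expected) and normalize the separators."""
    return separators_to_newlines(data.decode("utf-8"))


if __name__ == "__main__":
    raw = b"first line\xe2\x80\xa8second line\xe2\x80\xa9next paragraph"
    text = convert_utf8(raw)
    print(text.split("\n"))  # ['first line', 'second line', 'next paragraph']
```

Inside Vim, roughly the same translation should be possible with `:%s/[\u2028\u2029]/\r/g`. Vim patterns accept `\u` hex escapes inside a `[]` collection, and `\r` in the replacement splits the line. Unlike the command quoted above, this removes the separator instead of keeping it before the new line break.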
However, this would be lossy, so if this were to be implemented I suspect an option would be required for the benefit of people who want to edit Unicode text without losing the distinction between line and paragraph endings.

-- 
Matthew Winn