On Fri, 02 Mar 2007 20:24:44 +0100, "A.J.Mechelynck"
<[EMAIL PROTECTED]> wrote:

> Bill Moseley wrote:
> > I have a utf-8 file that uses the unicode line separator.  Not
> > something I've come across very often.  In utf-8 the sequence is:
> > 
> >     0xE2 0x80 0xA8 (e280a8)
> > 
> > In a uxterm vim correctly reads (and sets) the file encoding as utf8
> > (there's no BOM on the file), but the U+2028 character is displayed
> > as an un-displayable character and not displayed as a new line.
> > That is, all the text is displayed as a single line.
> > 
> > Can anyone educate me a bit on the use of the Line Separator character
> > and if or how it can be supported in Vim?
> 
> I may be wrong, but IIUC this codepoint plays the same role as the HTML <br>
> tag: it does not define an "end of line" in the text file which contains it,
> but it means that, when rendered typographically, as in a browser or a
> WYSIWYG editor (neither of which Vim is, or tries to mimic), the rendered
> output must have a linebreak at this point.
> 
> IOW: I think it's a feature, not a bug.
> 
> You can add a linebreak after every occurrence of that codepoint by using
> 
>       :exe "%s/\<Char-0x2028>/" . '\0\r/g'
> 
> Note that I intentionally use double quotes in the first part and single 
> quotes in the second part.

According to http://www.unicode.org/reports/tr13/tr13-9.html the
correct way to treat U+2028 and U+2029 (paragraph separator) is to
translate them into the platform's standard sequence for representing
the end of a line. (What it actually says is that if the purpose of
the line break is unambiguously known -- that is, whether it is the
end of a line or the end of a paragraph -- then the corresponding
Unicode character should be used. But Vim is a text editor and knows
nothing of paragraphs, so I would expect both these characters to be
translated into the platform's end-of-line representation.)

However, this would be lossy, so if this were to be implemented I
suspect an option would be required for the benefit of people who want
to edit Unicode text without losing the distinction between line and
paragraph endings.
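
To make the trade-off concrete, here is a minimal sketch in Python (not Vim code, and not how Vim behaves today) of the translation described above. The function name `normalize_separators` and the `keep_separators` flag are hypothetical, invented for this illustration; the flag stands in for the kind of option suggested above.

```python
# Sketch only: translate U+2028/U+2029 into an ordinary newline, with an
# opt-out for people who need to keep the line/paragraph distinction.
# normalize_separators and keep_separators are hypothetical names.

LINE_SEPARATOR = "\u2028"       # Unicode LINE SEPARATOR
PARAGRAPH_SEPARATOR = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def normalize_separators(text, newline="\n", keep_separators=False):
    """Return text with both separators replaced by `newline`.

    With keep_separators=True the text is returned unchanged, so the
    line/paragraph distinction survives a round trip.
    """
    if keep_separators:
        return text
    # Both characters collapse to the same newline, which is the lossy part:
    # afterwards there is no way to tell which one was originally present.
    return text.replace(LINE_SEPARATOR, newline).replace(
        PARAGRAPH_SEPARATOR, newline
    )


sample = "first line\u2028second line\u2029new paragraph"
converted = normalize_separators(sample)
preserved = normalize_separators(sample, keep_separators=True)
```

After conversion the sample contains two ordinary newlines and nothing records which of them used to be a paragraph separator, which is why the behaviour would need to be optional.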

-- 
Matthew Winn
