Re: Bullet character across Vim platforms

AndyHancock Sun, 17 Jan 2010 16:16:13 -0800

On Sun, Jan 17, 2010 at 6:46 PM, Tony Mechelynck
<[email protected]> wrote:
> On 17/01/10 23:55, AndyHancock wrote:
>>
>> I am finally following up on the solution below for using bullets
>> corresponding to Windows-1252 code 149 (0x95).  I put the script in
>> vimrc, issued "gvim Temp.txt" from the bash command line (Temp.txt is
>> nonexistent), and got no warning about missing multi_byte capability
>> [ not surprising since ":echo has('multi_byte')" yields 1 ].  However,
>> I'm still not getting the bullet.  I tried a few ways.
>>
>> First, I created a bullet in one of a couple of Windows apps (Firefox,
>> Palm Desktop) using the usual method: Alt-0149 on the number pad.
>> (Actually, it's a laptop, so I had to use "num lk", which locks some
>> of the qwerty keys into a number pad function).  Then I copied and
>> pasted the bullet into gvim (mouse middle-button to paste, since it's
>> X-windows).  It pastes as a question-mark character.
>>
>> I then tried Alt-0149 directly in gvim while in insert mode.  No joy,
>> as this translates into four characters, corresponding to Alt-0 Alt-1
>> Alt-4 Alt-9.
>>
>> Finally, in insert mode, I did Ctrl-V 149, which simply inserts a
>> character with hex code 0x95.  It literally shows up as a blue-coloured
>> "<95>" without quotes.  The "ga" command shows this to be a character
>> with hex code 0x95.  I thought that the last method above might be
>> fine, and perhaps it just wasn't showing up in the gvim window for
>> some reason related to X-windows fonts.  So I copied and pasted the
>> text into a Windows app...the result was a space in place of the
>> bullet.  Note that the bullet 0x95 copies successfully between windows
>> apps, and between windows apps and windows-based gvim (as opposed to
>> cygwin gvim).
>>
>> I also tried the alternative bullet 0xB7, but that's almost invisible
>> in Palm Desktop.  Better to use an asterisk (which is highly
>> nonideal).
>>
>> To troubleshoot the script, I queried the following options,
>> with the shown results:
>>
>>    Encoding
>>    --------
>>    set encoding
>>    Ans: encoding=utf-8
>>
>>    setlocal encoding
>>    Ans: encoding=utf-8
>>
>>    setglobal encoding
>>    Ans: encoding=utf-8
>>
>>    This makes sense, since encoding is global.
>>
>>    Fileencoding
>>    ------------
>>    set fileencoding
>>    Ans: fileencoding=
>>
>>    setlocal fileencoding
>>    Ans: fileencoding=
>
> 'fileencoding' empty means that the file will be recorded in the same
> charset as 'encoding', i.e., UTF-8. In that charset, 0x95 (or 149 decimal)
> is a non-printable control character. Since this control character has no
> representation in Windows-1252, you cannot convert to Windows-1252 a UTF-8
> buffer which contains it (and remember, 'encoding', not 'fileencoding',
> defines how Vim represents the data in memory).
>
> As I said in my previous post, to generate a file encoded on disk in
> Windows-1252 with a Windows-1252 bullet (0x95 on disk) in it, you must:
>
> 1) keep 'encoding' to UTF-8 as above
>
> 2) :setlocal fileencoding=cp1252   " (or: :setlocal fenc=windows-1252 )
>
> 3) Enter the character into Vim as the Unicode representation of the
> equivalent character, i.e. U+2022. To do that, in Insert mode, type (with no
> intervening spaces, I add them here only for legibility): Ctrl-V u 2 0 2 2
> (but if your Ctrl-V has been remapped to the Paste operation, use Ctrl-Q
> instead).
>
> See
>        :help i_CTRL-V_digit
>        :help i_CTRL-Q
>        :help CTRL-V-alternative


Actually, I'm just discovering this from reading at the same time you
responded.  What happened was that I suspected the bullet wasn't
getting properly copied over by the bridge between the Windows cut/
paste buffer and X-windows's cut/paste buffer.  So on the Windows
side, I used Notepad to save a text file containing the desired
bullet.  Then I opened it using Vim, but I had to set encoding=utf-8
for the bullet to show properly.  "ga" revealed the 4-digit hex code,
which was no longer hex 0x95, but the help showed how to enter 4-digit
hex codes.  If I originated the file from the unix/cygwin/gvim
side, I'd have to be sure to set fileformat=dos before I could see the
bullet.
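
For reference, the three steps Tony lists above could be typed as Ex
commands or put in a script roughly as follows.  This is only a sketch:
it assumes 'encoding' is already utf-8, that this Vim can convert to
cp1252 (typically a build with +iconv), and the file name bullet.txt is
made up for illustration.

```vim
" Sketch only.  Assumes 'encoding' is already utf-8 (step 1).
" bullet.txt is a hypothetical file name.
edit bullet.txt

" Step 2: store this one file on disk as Windows-1252.
setlocal fileencoding=cp1252

" Step 3: put U+2022 BULLET in the buffer.  This is the scripted
" equivalent of typing Ctrl-V u 2 0 2 2 in Insert mode.
call setline(1, nr2char(0x2022) . ' first item')

" On writing, Vim converts the buffer from utf-8 to cp1252.
write
```

After this, "ga" on the bullet should report the 4-digit code 2022
rather than 95.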

All this to say that if I want to use Vim to work on text from the
Palm Desktop, I need to save it to a DOS text file by pasting into
Notepad first.  Ah well.  A bit more buttonology won't kill me, though
I have to admit it's not convenient.  However, it isn't inconvenient
enough for me to install the Windows version of gvim -- that would be
so overkill.

Thanks, Tony.

>>    setglobal fileencoding
>>    Ans: fileencoding=windows-1252
>
> This is the global default, but it won't be applied to the file because the
> local value is different.
>
>>
>>
>>    Fileencodings
>>    -------------
>>    set fileencodings
>>    Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>>    setlocal fileencodings
>>    Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>>    setglobal fileencodings
>>    Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>>    This makes sense, since fileencodings is global.
>>
>>    Bomb
>>    ----
>>    set bomb?
>>    Ans: nobomb
>>
>>    setlocal bomb?
>>    Ans: nobomb
>>
>>    setglobal bomb?
>>    Ans: bomb
>>
>> Rather than create a new Temp.txt from the bash command line, I also
>> tried creating new unnamed files using <Ctrl-w><Ctrl-n> and ":new".
>> The only difference for these two buffers was that the local value of
>> fileencoding was Windows-1252, and local boolean bomb option was set.
>> However, all the above attempts to create a bullet yielded the
>> same results.
>>
>> I also issued "setlocal fileencoding=Windows-1252" in Temp.txt (using
>> both capital and small "w"), and "setlocal bomb".  That prevented me
>> from saving the file:
>>
>>    E513:
>>    write error, conversion failed
>>    (make 'fenc' empty to override)
>>
>> This was due to the 0x95 character that I inserted using Ctrl-V.  It
>> turns out that this also affected the two unnamed buffers -- that is,
>> if I tried to issue "w! Temp3.txt", I get the same error if the 0x95
>> character is present in the buffer.  The only way to be able to write
>> the file is to setlocal fileencoding to null, or remove the 0x95
>> character.
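>>
>> A third option, sketched here under the assumption that 'encoding'
>> is utf-8 and that every U+0095 in the buffer was meant to be a
>> bullet, is to replace the control character with U+2022 before
>> writing:
>>
>> ```vim
>> " Sketch only: replace each U+0095 control character with U+2022 BULLET.
>> " \%u0095 matches the character with that hex codepoint; the 'e' flag
>> " suppresses the error message if the buffer contains none.
>> %s/\%u0095/\=nr2char(0x2022)/ge
>>
>> " With the control character gone, conversion to cp1252 can succeed.
>> setlocal fileencoding=cp1252
>> ```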
>>
>> I admit that I am far from experienced with character encodings.  Is
>> there anything I'm missing from the solution below?
>>
>> ---------- Forwarded message ----------
>> From: Tony Mechelynck <[email protected]>
>> Date: Jan 11 2009, 7:03 pm
>> Subject: Bullet character across Vim platforms
>> To: vim_use
>>
>> On 11/01/09 17:35, AndyHancock wrote:
>>>
>>> Sorry for the repost, but the first time submitted through Google
>>> Groups yielded a blank submission form.  So I have recomposed and
>>> reposted (20 minutes of time).
>>>
>>> I am using:
>>> 1. Vim6.2 on Windows 2000, Lucida Console font, and
>>> 2. Vim7.1.2 on Cygwin's Xwin[dows], Lucida Typewriter font, on top of
>>>     Windows 2000
>>>
>>> After some surfing, I found that I can get a real bullet character
>>> (not asterisk or dash) in Windows using ASCII code 149.
>>>
>>> A. On windows applications, press Alt, enter 0149 on number pad.
>>> B. On #1 above in insert mode, enter Ctrl-V followed by 149.
>>>
>>> Neither of these work for #2 above.  Even if I create a bullet
>>> character using #1 and #B, it shows up as "~U" (minus quotes) in #2.
>>>
>>> Is there a way to create bullets in #2?
>>>
>>> Is there a way to have those bullets maintain their appearance across
>>> Vim platforms?
>>
>> It depends on your 'encoding', which is how Vim represents data in
>> memory.
>>
>> It also depends on each file's 'fileencoding', which is how that
>> file's data is represented on disk.
>>
>> Of course, to be able to use any given character in a file edited by
>> Vim, that character must be representable (not necessarily the same
>> way) in both Vim's 'encoding' and the file's 'fileencoding'.
>>
>> In the Latin1 aka ISO-8859-1 encoding, the character decimal 149, hex
>> 0x95 is a control character, corresponding to Unicode U+0095 <control>
>> = MESSAGE WAITING. That character is not printable.
>>
>> In the Windows-1252 encoding, that same decimal 149 hex 0x95 value is
>> used to represent a different character, namely the Unicode codepoint
>> U+2022 BULLET. That character is not representable in Latin1.
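>>
>> One way to check this mapping from inside Vim is to convert the byte
>> 0x95 from each encoding and print the resulting codepoint. This is
>> a sketch which assumes a Vim built with +iconv and running with
>> 'encoding' set to utf-8:
>>
>> ```vim
>> " Sketch only: "\x95" is the single byte 0x95.
>> " Read as Windows-1252, it is the bullet (U+2022 per the mapping above).
>> echo printf('U+%04X', char2nr(iconv("\x95", 'cp1252', 'utf-8')))
>>
>> " Read as Latin1, the same byte is the control character U+0095.
>> echo printf('U+%04X', char2nr(iconv("\x95", 'latin1', 'utf-8')))
>> ```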
>>
>> Now you have several possibilities.
>>
>> First, I recommend using utf-8 for Vim's internal representation of the
>> data in memory, because that 'encoding' can represent any Unicode
>> codepoint, which means that regardless of the file's
>> 'fileencoding', Vim will be able to represent it in memory. This
>> requires a binary compiled with +multi_byte -- such a binary will
>> answer with the number 1 (one) when you ask ":echo has('multi_byte')".
>>
>> Then you will have to decide how to represent the data on disk. For
>> portability between various computers, Latin1 is recommended; however
>> this means that anything between 0x80 and 0x9F included is reserved
>> for non-printable control characters.
>>
>> If you prefer having an additional 32 characters at your disposal in
>> an 8-bit encoding, you can use Windows-1252 everywhere, and decide
>> that you'll represent any 8-bit disk file in that 'fileencoding'. You
>> could make Vim (with 'encoding' set to utf-8) recognize these files by
>> means of the command ":set fileencodings=ucs-bom,utf-8,Windows-1252"
>> in your vimrc (see where in the snippet at the bottom of this email,
>> and notice the difference between 'fileencoding' [singular] and
>> 'fileencodings' [plural]). The problem with this approach is that if
>> you publish such documents, anyone with a Unix or Linux or Mac
>> operating system will probably not display those 32 additional
>> characters correctly.
>>
>> Or else, you can choose the Unicode UTF-8 encoding as your preferred
>> 'fileencoding', which doesn't forbid using Latin1, Windows-1252, or
>> indeed anything else for occasional files. In that case I recommend
>> using a BOM on Unicode files in order to let them be recognized
>> unambiguously even by programs other than Vim and by computers other
>> than your own.
>>
>> Now here's the promised snippet of code; place it near the top of your
>> vimrc, after setting ":language" if you use that command but before
>> defining any mappings. I have added comments to make it as
>> understandable as I can.
>>
>> " Unicode can only be used if Vim is compiled with +multi_byte
>> if has('multi_byte')
>>         " if Vim is already using Unicode, no need to change it
>>         if &encoding !~? '^u'
>>                 " avoid clobbering the keyboard/display encoding
>>                 if &termencoding == ''
>>                         let &termencoding = &encoding
>>                 endif
>>                 " use UTF-8 internally in Vim memory
>>                 set encoding=utf-8
>>         endif
>>         " setup the heuristics to recognize
>>         " how existing files are coded
>>         set fileencodings=ucs-bom,utf-8,Windows-1252
>>         " define defaults for new files
>>         " use Windows-1252 (8 bit) by default
>>         setglobal fileencoding=Windows-1252
>>         " use a BOM on Unicode files
>>         setglobal bomb
>> " if Vim has no +multi_byte capability, warn the user
>> else
>>         echomsg "No +multi_byte in this Vim version"
>> endif
>>
>> You can vary the details of the above once you understand the general
>> idea. If you don't change anything, your new files will be created in
>> Windows-1252, and existing files will be assumed to be Windows-1252
>> unless they either start with a Unicode BOM, or contain only codes
>> which are valid for UTF-8 (anything above 0x7F is represented in UTF-8
>> by at least two bytes with the high bit set, so this will still allow
>> recognizing your existing bullets). To write one new file in UTF-8
>> instead, use either
>>
>>         :e ++enc=utf-8 newfile
>> or
>>         :e newfile
>>         :setlocal fenc=utf-8
>>
>> (where 'fenc' is of course the short name for the 'fileencoding'
>> option).
>>
>> See
>>         :help Unicode
>>         :help +multi_byte
>>         :help 'encoding'
>>         :help 'fileencoding'
>>         :help 'fileencodings'
>>         :help 'termencoding'
>>         :help 'bomb'
>>         :help ++opt
>>        http://vim.wikia.org/wiki/Working_with_Unicode
>>
>> Oh, and one more thing: For a bullet-like character which looks the
>> same in both Latin1 and Windows-1252, you could use the character
>> 0xB7, corresponding in both of these encodings to the Unicode
>> codepoint U+00B7 MIDDLE DOT. This is a thinner bullet than U+2022 but it
>> is more portable. This "middle dot" is used in Catalan to separate two
>> letters l which must be pronounced as a "geminated hard l", as in
>> col·lega (a colleague) rather than as a single "palatalized l"
>> intermediary between l and y, as in collar (a collar).
>
> Best regards,
> Tony.

-- 
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
