On Sun, Jan 17, 2010 at 6:46 PM, Tony Mechelynck <[email protected]> wrote:
> On 17/01/10 23:55, AndyHancock wrote:
>>
>> I am finally following up on the solution below for using bullets
>> corresponding to Windows-1252 code 149 (0x95). I put the script in
>> vimrc, issued "gvim Temp.txt" from the bash command line (Temp.txt is
>> nonexistent), and got no warning about missing multi_byte capability
>> [ not surprising since ":echo has('multi_byte')" yields 1 ]. However,
>> I'm still not getting the bullet. I tried a few ways.
>>
>> First, I created a bullet in one of a couple of Windows apps (Firefox,
>> Palm Desktop) using the usual method: Alt-0149 on the number pad.
>> (Actually, it's a laptop, so I had to use "num lk", which locks some
>> of the qwerty keys into a number pad function). Then I copied and
>> pasted the bullet into gvim (mouse middle-button to paste, since it's
>> X-windows). It pastes as a question-mark character.
>>
>> I then tried Alt-0149 directly in gvim while in insert mode. No joy,
>> as this translates into four characters, corresponding to Alt-0 Alt-1
>> Alt-4 Alt-9.
>>
>> Finally, in insert mode, I did Ctrl-V 149, which simply inserts a
>> character with hex code 0x95. It literally shows up as a blue-coloured
>> "<95>" without quotes. The "ga" command shows this to be a character
>> with hex code 0x95. I thought that the last method above might be
>> fine, and perhaps it just wasn't showing up in the gvim window for
>> some reason related to X-windows fonts. So I copied and pasted the
>> text into a Windows app...the result was a space in place of the
>> bullet. Note that the bullet 0x95 copies successfully between Windows
>> apps, and between Windows apps and Windows-based gvim (as opposed to
>> cygwin gvim).
>>
>> I also tried the alternative bullet 0xB7, but that's almost invisible
>> in Palm Desktop. Better to use an asterisk (which is highly
>> nonideal).
>>
>> To troubleshoot the script, I queried the following options,
>> with the shown results:
>>
>> Encoding
>> --------
>> set encoding
>> Ans: encoding=utf-8
>>
>> setlocal encoding
>> Ans: encoding=utf-8
>>
>> setglobal encoding
>> Ans: encoding=utf-8
>>
>> This makes sense, since encoding is global.
>>
>> Fileencoding
>> ------------
>> set fileencoding
>> Ans: fileencoding=
>>
>> setlocal fileencoding
>> Ans: fileencoding=
>
> 'fileencoding' empty means that the file will be recorded in the same
> charset as 'encoding', i.e., UTF-8. In that charset, 0x95 (or 149 decimal)
> is a non-printable control character. Since this control character has no
> representation in Windows-1252, you cannot convert to Windows-1252 a UTF-8
> buffer which contains it (and remember, 'encoding', not 'fileencoding',
> defines how Vim represents the data in memory).
>
> As I said in my previous post, to generate a file encoded on disk in
> Windows-1252 with a Windows-1252 bullet (0x95 on disk) in it, you must:
>
> 1) keep 'encoding' set to UTF-8 as above
>
> 2) :setlocal fileencoding=cp1252 " (or: :setlocal fenc=windows-1252 )
>
> 3) Enter the character into Vim as the Unicode representation of the
> equivalent character, i.e. U+2022. To do that, in Insert mode, type (with no
> intervening spaces, I add them here only for legibility): Ctrl-V u 2 0 2 2
> (but if your Ctrl-V has been remapped to the Paste operation, use Ctrl-Q
> instead).
>
> See
> :help i_CTRL-V_digit
> :help i_CTRL-Q
> :help CTRL-V-alternative
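
The byte-level claims in the three steps above can be checked outside Vim. What follows is only a minimal Python sketch, assuming that Python's built-in cp1252, latin-1 and utf-8 codecs agree with the conversions Vim performs; the variable names are invented for the illustration.

```python
# Illustration only: Python's codecs stand in for Vim's conversions (an assumption).
BULLET = "\u2022"    # U+2022 BULLET, the character entered with Ctrl-V u 2 0 2 2
CONTROL = "\u0095"   # U+0095, the character Ctrl-V 149 gives when 'encoding' is utf-8

# On disk in Windows-1252 the bullet is the single byte 0x95.
bullet_cp1252 = BULLET.encode("cp1252")

# In UTF-8 the same character takes three bytes.
bullet_utf8 = BULLET.encode("utf-8")

# Read as Latin1, the byte 0x95 is the control character U+0095 instead.
byte_as_latin1 = b"\x95".decode("latin-1")

# U+0095 has no Windows-1252 representation, so the conversion fails.
# This is analogous to the E513 error reported when writing the buffer.
try:
    CONTROL.encode("cp1252")
    control_converts = True
except UnicodeEncodeError:
    control_converts = False

print(bullet_cp1252, bullet_utf8, repr(byte_as_latin1), control_converts)
```

The point of the sketch is that the number 0x95 means two different things: a byte in a Windows-1252 file, or the code point U+0095 in a Unicode buffer. Only U+2022 converts to the Windows-1252 byte 0x95.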
Actually, I'm just discovering this from reading at the same time you
responded. What happened was that I suspected the bullet wasn't getting
properly copied over by the bridge between the Windows cut/paste buffer
and X-windows's cut/paste buffer. So on the Windows side, I used Notepad
to save a text file containing the desired bullet. Then I opened it using
Vim, but I had to set encoding=utf-8 for the bullet to show properly.
"ga" revealed the 4-digit hex code, which was no longer hex 0x95, but
the help showed how to enter 4-digit hex codes. If I originated the file
from the unix/cygwin/gvim side, I'd have to be sure to set
fileformat=dos before I could see the bullet.

All this to say that if I want to use Vim to work on text from the Palm
Desktop, I need to save it to a DOS text file by pasting into Notepad
first. Ah well. A bit more buttonology won't kill me, though I have to
admit it's not convenient. However, it isn't inconvenient enough for me
to install the Windows version of gvim -- that would be so overkill.

Thanks, Tony.

>> setglobal fileencoding
>> Ans: fileencoding=windows-1252
>
> This is the global default, but it won't be applied to the file because the
> local value is different.
>
>>
>>
>> Fileencodings
>> -------------
>> set fileencodings
>> Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>> setlocal fileencodings
>> Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>> setglobal fileencodings
>> Ans: fileencodings=ucs-bom,utf-8,Windows-1252
>>
>> This makes sense, since fileencodings is global.
>>
>> Bomb
>> ----
>> set bomb?
>> Ans: nobomb
>>
>> setlocal bomb?
>> Ans: nobomb
>>
>> setglobal bomb?
>> Ans: bomb
>>
>> Rather than create a new Temp.txt from the bash command line, I also
>> tried creating new unnamed files using <Ctrl-w><Ctrl-n> and ":new".
>> The only difference for these two buffers was that the local value of
>> fileencoding was Windows-1252, and local boolean bomb option was set.
>> However, all the above attempts to create a bullet yielded the
>> same results.
>>
>> I also issued "setlocal fileencoding=Windows-1252" in Temp.txt (using
>> both capital and small "w"), and "setlocal bomb". That prevented me
>> from saving the file:
>>
>> E513:
>> write error, conversion failed
>> (make 'fenc' empty to override)
>>
>> This was due to the 0x95 character that I inserted using Ctrl-V. It
>> turns out that this also affected the two unnamed buffers -- that is,
>> if I tried to issue "w! Temp3.txt", I got the same error if the 0x95
>> character was present in the buffer. The only way to be able to write
>> the file is to setlocal fileencoding to null, or remove the 0x95
>> character.
>>
>> I admit that I am far from experienced with character encodings. Is
>> there anything I'm missing from the solution below?
>>
>> ---------- Forwarded message ----------
>> From: Tony Mechelynck <[email protected]>
>> Date: Jan 11 2009, 7:03 pm
>> Subject: Bullet character across Vim platforms
>> To: vim_use
>>
>> On 11/01/09 17:35, AndyHancock wrote:
>>>
>>> Sorry for the repost, but the first time submitted through Google
>>> Groups yielded a blank submission form. So I have recomposed and
>>> reposted (20 minutes of time).
>>>
>>> I am using:
>>> 1. Vim6.2 on Windows 2000, Lucida Console font, and
>>> 2. Vim7.1.2 on Cygwin's Xwin[dows], Lucida Typewriter font, on top of
>>> Windows 2000
>>>
>>> After some surfing, I found that I can get a real bullet character
>>> (not asterisk or dash) in Windows using ASCII code 149.
>>>
>>> A. On windows applications, press Alt, enter 0149 on number pad.
>>> B. On #1 above in insert mode, enter Ctrl-V followed by 149.
>>>
>>> Neither of these works for #2 above. Even if I create a bullet
>>> character using #1 and #B, it shows up as "~U" (minus quotes) in #2.
>>>
>>> Is there a way to create bullets in #2?
>>>
>>> Is there a way to have those bullets maintain their appearance across
>>> Vim platforms?
>>
>> It depends on your 'encoding', which is how Vim represents data in
>> memory.
>>
>> It also depends on each file's 'fileencoding', which is how that
>> file's data is represented on disk.
>>
>> Of course, to be able to use any given character in a file edited by
>> Vim, that character must be representable (not necessarily the same
>> way) in both Vim's 'encoding' and the file's 'fileencoding'.
>>
>> In the Latin1 aka ISO-8859-1 encoding, the character decimal 149, hex
>> 0x95 is a control character, corresponding to Unicode U+0095 <control>
>> = MESSAGE WAITING. That character is not printable.
>>
>> In the Windows-1252 encoding, that same decimal 149 hex 0x95 value is
>> used to represent a different character, namely the Unicode codepoint
>> U+2022 BULLET. That character is not representable in Latin1.
>>
>> Now you have several possibilities.
>>
>> First, I recommend using utf-8 for Vim's internal representation of the
>> data in memory, because that 'encoding' can represent any Unicode
>> codepoint, which means that regardless of the file's
>> 'fileencoding', Vim will be able to represent it in memory. This
>> requires a binary compiled with +multi_byte -- such a binary will
>> answer with the number 1 (one) when you ask ":echo has('multi_byte')".
>>
>> Then you will have to decide how to represent the data on disk. For
>> portability between various computers, Latin1 is recommended; however
>> this means that anything between 0x80 and 0x9F inclusive is reserved
>> for non-printable control characters.
>>
>> If you prefer having an additional 32 characters at your disposal in
>> an 8-bit encoding, you can use Windows-1252 everywhere, and decide
>> that you'll represent any 8-bit disk file in that 'fileencoding'.
>> You could make Vim (with 'encoding' set to utf-8) recognize these files by
>> means of the command ":set fileencodings=ucs-bom,utf-8,Windows-1252"
>> in your vimrc (see where in the snippet at the bottom of this email,
>> and notice the difference between 'fileencoding' [singular] and
>> 'fileencodings' [plural]). The problem with this approach is that if
>> you publish such documents, anyone with a Unix or Linux or Mac
>> operating system will probably not display those 32 additional
>> characters correctly.
>>
>> Or else, you can choose the Unicode UTF-8 encoding as your preferred
>> 'fileencoding', which doesn't forbid using Latin1, Windows-1252, or
>> indeed anything else for occasional files. In that case I recommend
>> using a BOM on Unicode files in order to let them be recognized
>> unambiguously even by programs other than Vim and by computers other
>> than your own.
>>
>> Now here's the promised snippet of code; place it near the top of your
>> vimrc, after setting ":language" if you use that command but before
>> defining any mappings. I have added comments to make it as
>> understandable as I can.
>>
>> " Unicode can only be used if Vim is compiled with +multi_byte
>> if has('multi_byte')
>>   " if Vim is already using Unicode, no need to change it
>>   if &encoding !~? '^u'
>>     " avoid clobbering the keyboard/display encoding
>>     if &termencoding == ''
>>       let &termencoding = &encoding
>>     endif
>>     " use UTF-8 internally in Vim memory
>>     set encoding=utf-8
>>   endif
>>   " setup the heuristics to recognize
>>   " how existing files are coded
>>   set fileencodings=ucs-bom,utf-8,Windows-1252
>>   " define defaults for new files
>>   " use Windows-1252 (8 bit) by default
>>   setglobal fileencoding=Windows-1252
>>   " use a BOM on Unicode files
>>   setglobal bomb
>> " if Vim has no +multi_byte capability, warn the user
>> else
>>   echomsg "No +multi_byte in this Vim version"
>> endif
>>
>> You can vary the details of the above once you understand the general
>> idea.
>> If you don't change anything, your new files will be created in
>> Windows-1252, and existing files will be assumed to be Windows-1252
>> unless they either start with a Unicode BOM, or contain only codes
>> which are valid for UTF-8 (anything above 0x7F is represented in UTF-8
>> by at least two bytes with the high bit set, so this will still allow
>> recognizing your existing bullets). To write one new file in UTF-8
>> instead, use either
>>
>>     :e ++enc=utf-8 newfile
>> or
>>     :e newfile
>>     :setlocal fenc=utf-8
>>
>> (where 'fenc' is of course the short name for the 'fileencoding'
>> option).
>>
>> See
>> :help Unicode
>> :help +multi_byte
>> :help 'encoding'
>> :help 'fileencoding'
>> :help 'fileencodings'
>> :help 'termencoding'
>> :help 'bomb'
>> :help ++opt
>> http://vim.wikia.org/wiki/Working_with_Unicode
>>
>> Oh, and one more thing: For a bullet-like character which looks the
>> same in both Latin1 and Windows-1252, you could use the character
>> 0xB7, corresponding in both of these encodings to the Unicode
>> codepoint U+00B7 MIDDLE DOT. This is a thinner bullet than U+2022 but it
>> is more portable. This "middle dot" is used in Catalan to separate two
>> letters l which must be pronounced as a "geminated hard l", as in
>> col·lega (a colleague) rather than as a single "palatalized l"
>> intermediary between l and y, as in collar (a collar).
>
> Best regards,
> Tony.
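
The detection order described in the quoted message (BOM first, then strict UTF-8, then Windows-1252 as the fallback) can be pictured with a small Python sketch. This is not how Vim implements 'fileencodings'; the function name guess_encoding is hypothetical, only the UTF-8 BOM is checked here (the UTF-16/32 BOMs are left out), and Python's codecs are assumed to behave like Vim's conversions.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: try a BOM, then strict UTF-8, then fall back."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"       # BOM present: unambiguous
    try:
        data.decode("utf-8")     # strict decoding: invalid sequences raise
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"          # last entry in the list acts as the fallback


# A lone 0x95 byte is not valid UTF-8, so an old Windows-1252 file with
# bullets falls through to the cp1252 fallback.
legacy = b"\x95 item"
modern = "\u2022 item".encode("utf-8")
with_bom = codecs.BOM_UTF8 + modern

for sample in (legacy, modern, with_bom):
    enc = guess_encoding(sample)
    print(enc, repr(sample.decode(enc)))

# The byte 0xB7 decodes to U+00B7 MIDDLE DOT in both 8-bit encodings.
assert b"\xb7".decode("latin-1") == b"\xb7".decode("cp1252") == "\u00b7"
```

All three samples decode to the same text, which is the reason the fallback entry has to come last: a pure 8-bit encoding accepts almost any byte sequence, so nothing listed after it would ever be tried.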
-- You received this message from the "vim_use" maillist. For more information, visit http://www.vim.org/maillist.php
