Re: Text edit versus vi on some files

A.J.Mechelynck Mon, 18 Sep 2006 17:45:09 -0700

Brian McKee wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



On 18-Sep-06, at 11:56 AM, David Morel wrote:

Brian McKee a écrit :

file Localizable.strings
Localizable.strings: Big-endian UTF-16 Unicode C program character data

If I open that file in vim I get
??^@/[EMAIL PROTECTED]@ [EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@ 
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL 
PROTECTED]
but Text Edit displays it correctly.
Can vi handle this type of file?  If so, how?

in vim, type :h multibyte
that should get you started :)


Eeeek - started right around the bend I think :-)

Biggest issue from my current point of view is it studiously ignores MacOS...


Chris Eidhof suggested

set encoding=utf8
set fileencoding=utf8


which works if you set it before you open the file in question.

Interestingly =utf16 'works' too... or at least it shows plain ASCII-type lettering ok.


Between those ideas I've decided to leave things alone and just do a
   :e ++enc=utf16
whenever I see lots of alternating @ signs and letters :-)

I think I'd prefer leaving my standard encoding at latin1 to match the Linux
boxes I'm often working on at the same time.

Am I right in understanding that Apple's TextEdit must be automatically
detecting UTF16 files and changing its base encoding to match?

And is there some way that vi could do the same?

Brian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Darwin)

iD8DBQFFDuvUGnOmb9xIQHQRAi6hAJ9858onQRWXR+kByXCcm/Cpk631bACg2cbB
e2JH8drOIyERomjI7zpPTn0=
=Wa4n
-----END PGP SIGNATURE-----

Your example looks like UTF-16 (or UCS-2) text, i.e. Unicode encoded at two bytes per character for most characters. Such text may contain characters (Chinese, Russian, Hebrew, Greek, Arabic, whatever) which cannot be represented in latin1. I suggest the following (in gvim):


        if &termencoding == ""
                let &termencoding = &encoding
        endif
        set encoding=utf-8
        set fileencodings=ucs-bom,utf-8,latin1

Here's an explanation:

'termencoding' defines how your keyboard encodes the data. The default is empty, which means "fallback to 'encoding'". If you change 'encoding', you should keep 'termencoding' at the _old_ value of 'encoding', the one which was set according to your OS locale.

'encoding' defines how Vim represents the data in memory. For all Unicode encodings, Vim actually uses UTF-8 internally, because other Unicode encodings use null bytes within the data, and that is incompatible with the way the C language encodes strings.
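To make the null-byte point concrete, here is a small Python sketch (not Vim code; the helper name count_nul is made up for illustration). It counts the zero bytes that the same ASCII text produces in UTF-16 and in UTF-8. Those zero bytes are what Vim displays as ^@ when a UTF-16 file is read as an 8-bit encoding.

```python
def count_nul(text: str, encoding: str) -> int:
    """Return how many 0x00 bytes `text` contains once encoded."""
    return text.encode(encoding).count(b"\x00")


sample = "/* A */"  # hypothetical content, 7 ASCII characters

# In big-endian UTF-16 every ASCII character becomes 00 xx,
# so there is one null byte per character.
print(count_nul(sample, "utf-16-be"))

# In UTF-8 the same text has no null bytes at all, which is why it
# fits in a C string terminated by a single 0x00.
print(count_nul(sample, "utf-8"))
```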

'fileencodings' (plural) defines which heuristics Vim will use to "guess" the 'fileencoding' (singular) of an editfile when opening it. "ucs-bom" means "check for a BOM at the start of the file". The BOM is the codepoint U+FEFF ZERO-WIDTH NO-BREAK SPACE (which is deprecated except as an encoding marker). It looks like your file has one; each Unicode encoding has a different disk representation for it (here in hex):


UTF-8:       EF BB BF
UTF-16be:    FE FF
UTF-16le:    FF FE
UTF-32be:    00 00 FE FF
UTF-32le:    FF FE 00 00
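
As an illustration of how a BOM check can work, here is a minimal Python sketch built from the table above. It is not how Vim implements "ucs-bom"; the function name sniff_bom and the encoding labels it returns are my own choices. The one subtlety is ordering: the UTF-32le BOM starts with the same two bytes as the UTF-16le BOM, so the 4-byte signatures have to be tried first.

```python
# BOM signatures copied from the table above. The 4-byte signatures
# come first because FF FE 00 00 (UTF-32le) begins with FF FE (UTF-16le).
BOMS = [
    (bytes.fromhex("00 00 FE FF"), "utf-32-be"),
    (bytes.fromhex("FF FE 00 00"), "utf-32-le"),
    (bytes.fromhex("EF BB BF"), "utf-8"),
    (bytes.fromhex("FE FF"), "utf-16-be"),
    (bytes.fromhex("FF FE"), "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# A file like the one in this thread: FE FF followed by "/" as 00 2F.
print(sniff_bom(bytes.fromhex("FE FF 00 2F")))
```

A file without any of these signatures gives (None, 0), and the caller then has to fall back on another guess.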

The encodings mentioned in 'fileencodings' are tested from left to right. 'ucs-bom', if present, should be first; and since 8-bit encodings never give an "error signal" (every byte is valid in an 8-bit encoding), there should be at most one 8-bit encoding (such as latin1) and, if present, it should come last.
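
The left-to-right rule can be sketched in Python as follows. This is only an approximation of the idea, not Vim's actual algorithm, and guess_encoding is a hypothetical name. Each candidate is tried in turn, and the first one that decodes without error wins. Because latin1 never fails, anything listed after it would never be reached.

```python
# BOM signatures, 4-byte ones first (see the hex table earlier in this message).
BOMS = [
    (bytes.fromhex("0000FEFF"), "utf-32-be"),
    (bytes.fromhex("FFFE0000"), "utf-32-le"),
    (bytes.fromhex("EFBBBF"), "utf-8"),
    (bytes.fromhex("FEFF"), "utf-16-be"),
    (bytes.fromhex("FFFE"), "utf-16-le"),
]


def guess_encoding(data: bytes, candidates=("ucs-bom", "utf-8", "latin-1")) -> str:
    """Try each candidate from left to right; return the first that fits."""
    for name in candidates:
        if name == "ucs-bom":
            # Pseudo-encoding: matches only if the data starts with a BOM.
            for bom, encoding in BOMS:
                if data.startswith(bom):
                    return encoding
            continue
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # the "error signal": move on to the next candidate
        return name
    raise ValueError("no candidate encoding matched")


utf8_bytes = "café".encode("utf-8")
print(guess_encoding(utf8_bytes))                        # utf-8 is tried before latin-1
print(guess_encoding(utf8_bytes, ("latin-1", "utf-8")))  # latin-1 first: it always "fits"
```

The second call shows the misordering problem: with latin1 in front, the UTF-8 file is accepted as latin1 and the later entry is never consulted.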

After setting the above settings, Vim should open correctly any Unicode file with BOM (like yours seems to be) and any UTF-8 file. 7-bit US-ASCII files will be seen as UTF-8 (which is compatible in the 0x00-0x7F range) and Latin1 files which include accented characters or other bytes in the range 0x80-0xFF will be opened as latin1.



Best regards,
Tony.
