Brian McKee wrote:
On 18-Sep-06, at 11:56 AM, David Morel wrote:
Brian McKee wrote:
file Localizable.strings
Localizable.strings: Big-endian UTF-16 Unicode C program character data
If I open that file in vim I get
??^@/[EMAIL PROTECTED]@ [EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@
[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL
PROTECTED]
but Text Edit displays it correctly.
Can vi handle this type of file? If so, how?
in vim, type :h multibyte
that should get you started :)
Eeeek - started right around the bend I think :-)
The biggest issue from my current point of view is that it studiously
ignores Mac OS...
Chris Eidhof suggested
set encoding=utf8
set fileencoding=utf8
which works if you set it before you open the file in question.
Interestingly =utf16 'works' too... or at least it shows plain ASCII
type lettering ok.
Between those ideas I've decided to leave things alone and just do a
:e ++enc=utf16
whenever I see lots of alternating @ signs and letters :-)
I think I'd prefer leaving my standard encoding at latin1 to match the
Linux boxes I'm often working on at the same time.
Am I right in understanding that Apple's TextEdit must be automatically
detecting UTF16 files and changing its base encoding to match?
And is there some way that vi could do the same?
Brian
Your example looks like UTF-16 (or UCS-2) text, i.e. Unicode encoded at
two bytes per character for most characters. Such text may contain
characters (Chinese, Russian, Hebrew, Greek, Arabic, whatever) which
cannot be represented in latin1. I suggest the following (in gvim):
if &termencoding == ""
let &termencoding = &encoding
endif
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,latin1
Here's an explanation:
'termencoding' defines how your keyboard encodes the data. The default
is empty, which means "fall back to 'encoding'". If you change
'encoding', you should keep 'termencoding' at the _old_ value of
'encoding', the one which was set according to your OS locale.
'encoding' defines how Vim represents the data in memory. For all
Unicode encodings, Vim actually uses UTF-8 internally, because other
Unicode encodings use null bytes within the data, and that is
incompatible with the way the C language encodes strings.
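To make the null bytes visible, here is a small Python sketch (Python is
only used for illustration, Vim does not work this way internally, and
the sample string is made up rather than taken from your file):

```python
# Illustrative only: the sample text is an assumption, not the real file.
sample = "/* A */"

utf16be = sample.encode("utf-16-be")  # two bytes per ASCII character
utf8 = sample.encode("utf-8")         # one byte per ASCII character

print(utf16be)
print(utf8)

# In UTF-16BE every ASCII character is preceded by a 0x00 byte, which a
# C-style string would treat as its terminator. UTF-8 has no such bytes
# for this text.
has_nul_utf16 = b"\x00" in utf16be
has_nul_utf8 = b"\x00" in utf8
print(has_nul_utf16, has_nul_utf8)
```

Vim displays a null byte as ^@, which is why a UTF-16 file read as an
8-bit encoding shows up as letters alternating with ^@.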
'fileencodings' (plural) defines which heuristics Vim will use to
"guess" the 'fileencoding' (singular) of an editfile when opening it.
"ucs-bom" means "check for a BOM at the start of the file". The BOM is
the codepoint U+FEFF ZERO WIDTH NO-BREAK SPACE (which is deprecated
except as an encoding marker). It looks like your file has one; each
Unicode encoding has a different disk representation for it (here in hex):
UTF-8: EF BB BF
UTF-16be: FE FF
UTF-16le: FF FE
UTF-32be: 00 00 FE FF
UTF-32le: FF FE 00 00
The encodings mentioned in 'fileencodings' are tested from left to
right. 'ucs-bom', if present, should be first; and since 8-bit encodings
never give an "error signal" (every byte is valid in an 8-bit encoding),
there should be at most one 8-bit encoding (such as latin1) and, if
present, it should come last.
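As a rough illustration of that left-to-right test, here is a Python
sketch. It is not Vim's actual code: guess_encoding is a hypothetical
helper that only mimics the order ucs-bom, utf-8, latin1 described
above.

```python
import codecs

# BOM signatures from the table above. The UTF-32 entries are tested
# before the UTF-16 ones because FF FE 00 00 also starts with the
# UTF-16le BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper mimicking fileencodings=ucs-bom,utf-8,latin1."""
    # Step 1, "ucs-bom": look for a byte order mark at the start.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2, "utf-8": accept only if the whole buffer decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Step 3, "latin1": every byte is valid, so this can never fail,
    # which is why an 8-bit encoding has to come last.
    return "latin-1"


print(guess_encoding(b"\xfe\xff\x00/\x00*"))           # utf-16-be
print(guess_encoding("caf\u00e9".encode("utf-8")))     # utf-8
print(guess_encoding("caf\u00e9".encode("latin-1")))   # latin-1
```

If latin1 were listed before utf-8, step 3 would accept every file and
the UTF-8 test would never be reached.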
With the above settings, Vim should correctly open any Unicode file
with a BOM (as yours seems to be) and any UTF-8 file. 7-bit US-ASCII
files will be seen as UTF-8 (which is compatible in the 0x00-0x7F
range), and Latin1 files which include accented characters or other
bytes in the range 0x80-0xFF will be opened as latin1, provided those
bytes do not happen to form valid UTF-8 sequences.
Best regards,
Tony.