From: Axel Mock
Sent: 6/19/2006 7:59:48 AM
To: [EMAIL PROTECTED]
Cc: activeperl@listserv.ActiveState.com
Subject: Re: auto-detecting file encoding
Hi,
I was just reading this thread concerning detecting/guessing Unicode while I
was debugging my little module that, among other file-related things, should
read in some file and convert it to internal UTF-8.
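For illustration, the "read a file and convert it to an internal Unicode string" step might look roughly like the sketch below. It is written in Python rather than Perl, and it is not the module discussed here: the function name `decode_bytes` and the choice of cp1252 as the fallback are assumptions made for the example.

```python
def decode_bytes(data: bytes, fallback: str = "cp1252") -> str:
    """Decode raw file bytes to a Unicode string (hypothetical helper)."""
    # A UTF-8 BOM is unambiguous: strip it and decode the rest as UTF-8.
    if data.startswith(b"\xef\xbb\xbf"):
        return data[3:].decode("utf-8")
    try:
        # Strict UTF-8 decoding fails on most non-UTF-8 input,
        # which makes it a usable first guess.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is a Windows single-byte code page.
        return data.decode(fallback, errors="replace")
```

Trying strict UTF-8 first works because legacy single-byte text with accented characters is very rarely valid UTF-8 by accident; the fallback encoding remains a guess and should be adjusted to the files at hand.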
Things I came across:
Encode::Guess was obviously written with non-UTF input in mind
<[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
To: activeperl@listserv.ActiveState.com
Subject: auto-detecting file encoding
On Jun 18, 2006, at 22:06, Jerry Yang wrote:
Hi,
The file in UTF-8 should have a BOM like this "EF BB BF"
Bytes          Encoding Form
00 00 FE FF    UTF-32, big-endian
FF FE 00 00    UTF-32, little-endian
FE FF          UTF-16, big-endian
FF FE          UTF-16, little-endian
EF BB BF       UTF-8
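The table above can be turned into a simple BOM check. The following is a minimal sketch in Python (not Perl); the byte sequences come from the table, while the name `detect_bom` and the return convention are made up for the example. The longer UTF-32 marks are tested before the UTF-16 ones, because FF FE 00 00 also begins with FF FE.

```python
# BOM byte sequences from the table above, longest first so that
# UTF-32 little-endian is not mistaken for UTF-16 little-endian.
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found.

    Hypothetical helper for illustration only.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A result of `(None, 0)` only means that no BOM is present, not that the file is not Unicode, so some other guess is still needed for such files.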
Should, but don't have t
Hello:
I need to process the text of thousands of files automatically, with simple
regexp substitutions. The problem I have is that, although all files are
plaintext, they have been written with a variety of programs in Windows, so
they employ diverse encodings. For example, some are in 'ut