Re: auto-detecting file encoding

2006-06-19 Thread
From: Axel Mock
Sent: 6/19/2006 7:59:48 AM
To: [EMAIL PROTECTED]
Cc: activeperl@listserv.ActiveState.com
Subject: Re: auto-detecting file encoding

> Hi,
>
> I was just reading this thread concerning detecting/guessing Unicode, while I
> was debugging my little module that, among

Re: auto-detecting file encoding

2006-06-19 Thread Axel Mock
Hi, I was just reading this thread concerning detecting/guessing Unicode while I was debugging my little module, which, among other file-related things, should read in a file and convert it to internal UTF-8. Things I came across: Encode::Guess was obviously written with non-UTF input in mind
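Encode::Guess is a Perl module, and the sketch below does not reproduce its API. It is a Python illustration of a different, commonly used heuristic for input without a BOM: try a strict UTF-8 decode first, and only if that fails fall back to a legacy single-byte code page. This works because UTF-8 multi-byte sequences follow a strict bit pattern that 8-bit text with accented letters rarely matches by accident. The function name `decode_guess` and the `cp1252` default are assumptions for illustration, not anything from the thread.

```python
def decode_guess(data: bytes, fallback: str = "cp1252"):
    """Decode BOM-less bytes: strict UTF-8 first, then a legacy fallback.

    Returns (text, encoding_used). The fallback code page is an assumption;
    substitute whatever legacy encoding your files actually use.
    """
    try:
        # Strict UTF-8 rejects most legacy 8-bit text containing non-ASCII bytes.
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # cp1252 leaves five byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D);
        # errors="replace" maps those to U+FFFD instead of raising again.
        return data.decode(fallback, errors="replace"), fallback
```

Pure ASCII is valid UTF-8 and reads identically in cp1252, so reporting "utf-8" for such files is harmless.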

Re: auto-detecting file encoding

2006-06-19 Thread DZ-Jay
<[EMAIL PROTECTED]>
Sent by:
To: activeperl@listserv.ActiveState.com [EMAIL PROTECTED]
cc: eState.com
Subject: auto-detecting file encoding

Re: auto-detecting file encoding

2006-06-19 Thread DZ-Jay
On Jun 18, 2006, at 22:06, Jerry Yang wrote:
> Hi,
> The file in UTF-8 should have a BOM like this: "EF BB BF"
>
> Bytes         Encoding Form
> 00 00 FE FF   UTF-32, big-endian
> FF FE 00 00   UTF-32, little-endian
> FE FF         UTF-16, big-endian
> FF FE         UTF-16, little-endian
> EF BB BF      UTF-8

Should, but don't have t
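The BOM table above translates directly into a prefix check. The Python sketch below is one way to do it; `sniff_bom` and `decode_with_bom` are hypothetical helper names. The one subtlety is ordering: FF FE is a prefix of FF FE 00 00, so the UTF-32 little-endian signature must be tested before the UTF-16 little-endian one. A missing BOM proves nothing, since the BOM is optional in UTF-8, so BOM-less files still need a content-based guess.

```python
import codecs

# Signatures from the table, ordered so that FF FE 00 00 (UTF-32 LE)
# is checked before its prefix FF FE (UTF-16 LE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (codec name, BOM length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes):
    """Decode using the encoding the BOM announces, with the BOM stripped; None if no BOM."""
    name, length = sniff_bom(data)
    if name is None:
        return None
    return data[length:].decode(name)
```

A UTF-16 LE file whose first character is U+0000 would be misread as UTF-32 LE, but that is rare in real text files.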

Re: auto-detecting file encoding

2006-06-19 Thread Torsten . Werner
Sent by:
To: activeperl@listserv.ActiveState.com [EMAIL PROTECTED]
cc:

Re: auto-detecting file encoding

2006-06-18 Thread Jerry Yang
Hi,
The file in UTF-8 should have a BOM like this: "EF BB BF"

Bytes         Encoding Form
00 00 FE FF   UTF-32, big-endian
FF FE 00 00   UTF-32, little-endian

auto-detecting file encoding

2006-06-16 Thread
Hello: I need to process the text of thousands of files automatically, with simple regexp substitutions. The problem I have is that, although all files are plaintext, they have been written with a variety of programs in Windows, so they employ diverse encodings. For example, some are in 'ut
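For the task described here (regexp substitutions across files in mixed encodings), one reasonable shape is to decode each file to text, run the substitution on characters, and re-encode with the same encoding so the file is written back the way it came in. The Python sketch below assumes the encoding and any BOM were already determined by a separate detection step; `substitute_in_bytes` and its parameters are hypothetical. Working on raw bytes breaks down for UTF-16: in UTF-16LE, `cat` is stored as `c\x00a\x00t\x00`, so a byte pattern for `cat` never matches.

```python
import re


def substitute_in_bytes(data: bytes, encoding: str, pattern: str, repl: str,
                        bom: bytes = b"") -> bytes:
    """Apply re.sub to the decoded text and return it re-encoded the same way.

    `encoding` and `bom` are assumed to come from an earlier detection step;
    `bom` is the raw BOM prefix (possibly empty), preserved byte-for-byte.
    """
    if not data.startswith(bom):
        raise ValueError("data does not start with the given BOM")
    text = data[len(bom):].decode(encoding)  # operate on characters, not bytes
    text = re.sub(pattern, repl, text)
    return bom + text.encode(encoding)       # write back in the same encoding
```

Re-encoding can still fail if a replacement introduces characters the legacy code page cannot represent (for example, a CJK character written into a cp1252 file); `str.encode` then raises `UnicodeEncodeError`.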