ID:               22108
Comment by:       gump at hotmail dot com
Reported By:      bugzilla at jellycan dot com
Status:           Open
Bug Type:         Feature/Change Request
Operating System: Any
PHP Version:      All (as of the current implementation)
Assigned To:      moriyoshi
New Comment:
> [8 Feb 4:24am CST] [EMAIL PROTECTED]
> PHP doesn't want UNICODE scripts, but just ASCII ones. Not
> a bug -> bogus.

Not bogus. PHP is embedded in HTML, and the surrounding document
determines the encoding. You can't just specify this problem out of
existence.


Previous Comments:
------------------------------------------------------------------------

[2003-05-05 03:40:23] tokiee at sayclub dot com

For those who are not familiar with UTF-8:

UTF-8 (UCS Transformation Format, 8-bit) is not different from ASCII
where ASCII text is concerned. It is compatible with ASCII: if you write
English text in UTF-8, every byte is identical to the ASCII byte (and
the UTF-8 BOM is optional).

This is not an exact explanation of UTF-8, but: UTF-8 extends ASCII to
support the full Unicode character set without disturbing the existing
byte values or ordering of the ASCII characters. So for plain English
text UTF-8 basically is ASCII, and you don't have to think of it as
some totally new freak.

In fact, when a modern Unicode-aware OS reads UTF-8, it has to convert
it to its internal Unicode representation, so UTF-8 is rather similar
to URL encoding.

In the single-byte world (ASCII and its 8-bit extensions), each byte
corresponds to one character, for at most 256 characters. In 16-bit
Unicode (UCS-2), two bytes correspond to one character, for at most
65,536 characters, and that is the totally new system you are thinking
of.

UTF-8 is the interesting one: a character can be one byte, two bytes,
or even three or four bytes. That sounds complicated, but the rule is
simple and you don't actually have to handle it yourself: the OS will
do it for you.

Even if you have software which does not understand UTF-8, that is
totally okay, because UTF-8 is ASCII-transparent. So it "can be used
with normal string comparison functions for sorting and such." (quoted
from the PHP.NET reference for utf8_encode())

------------------------------------------------------------------------

[2003-04-14 12:17:37] [EMAIL PROTECTED]

As a short-term workaround (yes, I know it's not a solution), can you
try using output buffering?
That should at least solve the problem of sneaking the headers in prior
to the BOM, even if it doesn't solve the underlying problem of
recognizing document encodings properly.

------------------------------------------------------------------------

[2003-04-06 00:53:04] tronxoe at hotpop dot com

The BOM is fine as long as the PHP file does not include another
Unicode file (using @include()).

Another problem: if a PHP file is saved as Unicode, sessions and
cookies cannot be used because of "headers already sent ...". I think
the first 3 bytes have already been sent in this case.

------------------------------------------------------------------------

[2003-02-08 10:57:30] [EMAIL PROTECTED]

reassigning

------------------------------------------------------------------------

[2003-02-08 06:10:51] [EMAIL PROTECTED]

OK, the UTF-8 BOM was new to me. If I find the time, I'll have a look at
it over the weekend. I think the solution would be somewhere in Zend's
multibyte support, since I fear that adding the BOM to mbstring alone
does not do the trick.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108

-- 
Edit this bug report at http://bugs.php.net/?id=22108&edit=1
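
For readers who want to check the byte-level claims in the comments
above, here is a minimal sketch in Python (not PHP, and not taken from
the report). It shows that ASCII text is byte-for-byte unchanged in
UTF-8, that other characters take two to four bytes, and that the UTF-8
BOM is the three bytes EF BB BF that end up in front of "<?php". The
helper names utf8_length, has_utf8_bom and strip_utf8_bom are
hypothetical and exist only for this illustration; how PHP itself
should treat the BOM is exactly what the report leaves open.

```python
import codecs

# The UTF-8 byte order mark: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def utf8_length(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file contents start with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the contents without a leading UTF-8 BOM, if one is present."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


if __name__ == "__main__":
    # ASCII text encodes to the same bytes in ASCII and in UTF-8.
    print("hello".encode("utf-8") == "hello".encode("ascii"))

    # One, two, three and four byte characters.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(repr(ch), utf8_length(ch))

    # A script saved with a BOM has three extra bytes before "<?php".
    # Those bytes count as output, which is why headers can no longer
    # be sent afterwards.
    script = UTF8_BOM + b"<?php echo 'hi'; ?>"
    print(has_utf8_bom(script), len(script) - len(strip_utf8_bom(script)))
```

Stripping the BOM before deployment is only one possible workaround,
alongside the output buffering suggested in the thread; it does not
address the underlying question of how encodings should be recognized.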