#22108 [Com]: php doesn't ignore the utf-8 BOM

gump at hotmail dot com Wed, 04 Jun 2003 05:14:15 -0700

 ID:               22108
 Comment by:       gump at hotmail dot com
 Reported By:      bugzilla at jellycan dot com
 Status:           Open
 Bug Type:         Feature/Change Request
 Operating System: Any
 PHP Version:      All (as of the current implementation)
 Assigned To:      moriyoshi
 New Comment:


> [8 Feb 4:24am CST] [EMAIL PROTECTED]

> PHP doesn't want UNICODE scripts, but just ASCII ones. Not 
> a bug -> bogus.

Not bogus.  

PHP is embedded in HTML, so the surrounding document determines the
encoding.  You can't just specify this problem out of existence.


Previous Comments:
------------------------------------------------------------------------

[2003-05-05 03:40:23] tokiee at sayclub dot com

For those who are not familiar with UTF-8:

UTF-8 (UCS Transformation Format 8) is not different from ASCII for
ASCII text; it is compatible with ASCII. If you write your text in
English and save it as UTF-8, every byte is the same as it would be in
ASCII. (And the UTF-8 BOM is optional.)
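
The ASCII compatibility described above can be checked directly. This is
a minimal sketch in Python (chosen only for illustration; the thread
itself is about PHP). The sample string and variable names are
assumptions, not anything from the report.

```python
import codecs

# Sample text is an assumption for illustration; any pure-ASCII string works.
text = "Hello, world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text the two encodings produce identical bytes.
same_bytes = ascii_bytes == utf8_bytes

# The "utf-8-sig" codec writes the optional BOM in front of the text.
with_bom = text.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)
bom_length = len(with_bom) - len(utf8_bytes)

print(same_bytes, has_bom, bom_length)
```

The only difference between the BOM and non-BOM forms is the three
leading bytes EF BB BF, which is exactly what PHP emits as output when
it does not skip them.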

This is not quite an exact explanation of UTF-8, but: UTF-8 extends
ASCII to support the full Unicode character set without disturbing the
existing ASCII byte values or their order. So for ASCII text, UTF-8
basically is ASCII, and you don't have to imagine it as some totally
new freak.

Actually, when a modern Unicode-aware OS reads UTF-8, the OS usually
needs to convert it to its internal Unicode representation. So UTF-8
is rather similar to URL encoding: a byte-level encoding of the real
characters.

In the ASCII world, each byte corresponds to one character: 128
characters in ASCII proper, or up to 256 in the 8-bit extended sets.

In fixed-width 16-bit Unicode (UCS-2), two bytes correspond to one
character, giving up to 65,536 code points. That one is a totally new
system, as you think.

UTF-8 is the interesting one: a character can be one byte, two bytes,
or even three or four bytes! That may sound complicated, but the rule
is simple, and you usually don't have to handle it yourself: the OS
will do it for you.
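
To make the variable width concrete, here is a small sketch in Python
(used only for illustration) that encodes one character from each
length class. The four sample characters are my own choice, not from
the report.

```python
# One sample character per UTF-8 length class (samples are illustrative):
# U+0041 LATIN CAPITAL LETTER A, U+00E9 e with acute accent,
# U+20AC EURO SIGN, U+1F600 GRINNING FACE.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```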

Even if you have software that does not understand UTF-8, it's
totally okay because UTF-8 is ASCII-transparent. So it "can be used with
normal string comparison functions for sorting and such." (quoted from
the PHP.NET reference for utf8_encode())
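
The sorting claim can be illustrated with a short sketch in Python (for
illustration only): sorting strings by their raw UTF-8 bytes gives the
same order as sorting them by Unicode code point. The word list is an
assumption made up for the example.

```python
# Illustrative word list mixing ASCII and non-ASCII initial characters.
words = ["zebra", "apple", "\u00e9clair", "\u20acuro", "Zulu"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# A byte-oriented comparison, as a UTF-8-unaware program would perform.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point == by_utf8_bytes)
```

Note that this is code point order, not a language-aware alphabetical
order; it only shows that byte-wise comparison does not scramble the
text.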

------------------------------------------------------------------------

[2003-04-14 12:17:37] [EMAIL PROTECTED]

As a short-term workaround (yes I know it's not a solution), can you
try using output buffering?  That should at least solve the problem of
sneaking the headers in prior to the BOM even if it doesn't solve the
underlying problem of recognizing document encodings properly.

------------------------------------------------------------------------

[2003-04-06 00:53:04] tronxoe at hotpop dot com

The BOM is still fine as long as the PHP file does not include another
Unicode file (by using @include()).

Another problem: if a PHP file is saved as Unicode, sessions and
cookies cannot be used because of "headers already sent ...". I think
the first 3 bytes (the BOM) have already been sent as output in this case.
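
The three bytes in question can be detected and removed before a file
is deployed. The following is a minimal sketch in Python (for
illustration only, not part of PHP). The helper name strip_utf8_bom and
the sample script contents are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical script contents as saved by an editor that writes a BOM.
saved_script = codecs.BOM_UTF8 + b"<?php session_start(); ?>"

# These bytes sit before the opening tag, so PHP treats them as output.
leading = saved_script[:3]

# After stripping, the file starts directly with the opening tag.
cleaned = strip_utf8_bom(saved_script)

print(leading.hex(), cleaned)
```

Because the BOM precedes the opening tag, it is ordinary page output as
far as PHP is concerned, which is consistent with the "headers already
sent" symptom reported here.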

------------------------------------------------------------------------

[2003-02-08 10:57:30] [EMAIL PROTECTED]

reassigning

------------------------------------------------------------------------

[2003-02-08 06:10:51] [EMAIL PROTECTED]

OK, the UTF-8 BOM was new to me.
If I find the time I'll have a look at it over the weekend.
I think the solution would be somewhere in Zend's multibyte support,
since I fear adding that BOM to mbstring alone does not do the trick.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/22108

