Edit report at http://bugs.php.net/bug.php?id=49638&edit=1

 ID:                 49638
 Updated by:         cataphr...@php.net
 Reported by:        olivier at oxeva dot fr
 Summary:            mbstring default behaviour is unexpected
-Status:             Open
+Status:             Feedback
 Type:               Feature/Change Request
-Package:            Feature/Change Request
+Package:            *General Issues
 Operating System:   Linux
 PHP Version:        5.3SVN-2009-09-23 (snap)
 Block user comment: N
 Private report:     N

 New Comment:

The test file is no longer online. Can you provide it?


Previous Comments:
------------------------------------------------------------------------
[2009-09-23 14:11:25] olivier at oxeva dot fr

Just to be clear:

When I said "file is converted to latin1" (bug description), I did not
mean the file is rewritten on disk. I'm saying the _output_ is in
latin1.

------------------------------------------------------------------------
[2009-09-23 09:43:04] olivier at oxeva dot fr

Description:
------------
The default behaviour of mbstring with respect to the encoding of PHP
source files is unexpected: if the source file is in UTF-8 (with BOM),
its output is converted to latin1 when mbstring.internal_encoding is
left at its default value.



Reproduce code:
---------------
All of the tests below were run on a file named "testmbstring.php":



http://www.ajeux.com/phptests/testmbstring.phps (md5sum:
e663d28964a20ec404e68226effc27d0)





1/ PHP 5.3.0 WITHOUT mbstring

'./configure'  '--enable-zend-multibyte'

(on a file without the var_dump())

Hexdump:

00000000  31 32 33 c3 a9 31 32 33  e2 82 ac 31 32 33        |123..123...123|
0000000e



=> Result is OK: the non-latin1 characters are encoded on 2 bytes
(c3 a9) and 3 bytes (e2 82 ac).





2/ PHP 5.3.0 WITH mbstring

'./configure'  '--enable-zend-multibyte' '--enable-mbstring'



  a) no specific parameters on .ini

php -i returned the following:

mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => no value => no value
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value



Hexdump:

00000000  73 74 72 69 6e 67 28 31  30 29 20 22 49 53 4f 2d  |string(10) "ISO-|
00000010  38 38 35 39 2d 31 22 0a  31 32 33 e9 31 32 33 3f  |8859-1".123.123?|
00000020  31 32 33                                          |123|
00000023





=> By default, internal_encoding is latin1 (ISO-8859-1). The output is
no longer UTF-8: the 2-byte character becomes the single byte e9, and
the 3-byte character is replaced by "?" (3f).
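
To make the difference between the two hexdumps easier to follow, here is
a small sketch that reproduces the same byte patterns. It is written in
Python purely as an illustration and is not PHP's conversion code. The
test string is an assumption reconstructed from the hexdump in test 1
(the original test file is not available), and the "?" replacement is
assumed to mirror what mbstring does for characters that latin1 cannot
represent.

```python
# Illustration only: reproduces the byte patterns seen in the hexdumps.
# The string below is an assumption reconstructed from the test 1 hexdump.
text = "123\u00e9123\u20ac123"  # "123é123€123"

# What test 1 and test 2b output: the original UTF-8 bytes.
utf8_bytes = text.encode("utf-8")

# What test 2a and test 2c output: latin1 bytes. Characters that latin1
# cannot represent (here the euro sign) are replaced by "?".
latin1_bytes = text.encode("latin-1", errors="replace")


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


print(hexdump(utf8_bytes))    # 31 32 33 c3 a9 31 32 33 e2 82 ac 31 32 33
print(hexdump(latin1_bytes))  # 31 32 33 e9 31 32 33 3f 31 32 33
```

The first line matches the 14 bytes (0x0e) of test 1, and the second
matches the 11 bytes that follow the var_dump() line in test 2a.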





   b) With mbstring.internal_encoding="UTF-8"

php -i returned the following:

mbstring.detect_order => no value => no value
mbstring.encoding_translation => Off => Off
mbstring.func_overload => 0 => 0
mbstring.http_input => pass => pass
mbstring.http_output => pass => pass
mbstring.http_output_conv_mimetypes => ^(text/|application/xhtml\+xml) => ^(text/|application/xhtml\+xml)
mbstring.internal_encoding => UTF-8 => UTF-8
mbstring.language => neutral => neutral
mbstring.script_encoding => no value => no value
mbstring.strict_detection => Off => Off
mbstring.substitute_character => no value => no value



Hexdump output:

00000000  73 74 72 69 6e 67 28 35  29 20 22 55 54 46 2d 38  |string(5) "UTF-8|
00000010  22 0a 31 32 33 c3 a9 31  32 33 e2 82 ac 31 32 33  |".123..123...123|
00000020



=> OK





   c) With var_dump(mb_internal_encoding('UTF-8')) at the top of the file
(and no changes in php.ini)

php -i: same output as in 2a.



00000000  62 6f 6f 6c 28 74 72 75  65 29 0a 73 74 72 69 6e  |bool(true).strin|
00000010  67 28 35 29 20 22 55 54  46 2d 38 22 0a 31 32 33  |g(5) "UTF-8".123|
00000020  e9 31 32 33 3f 31 32 33                           |.123?123|
00000028





=> In spite of the mb_internal_encoding() call, the output is not UTF-8:
it contains the same latin1 bytes as in test 2a.









Expected result:
----------------
MBString should not change the source file encoding if there is no
default internal_encoding specified by the user
(mbstring.internal_encoding => no value).



If this is expected, phpinfo() (php -i) should at least show the user
that the default internal_encoding is latin1 (ISO-8859-1).





Also, the mb_internal_encoding('UTF-8') function should take effect on
the current file (see test 2c). Users should not be forced to change
internal_encoding through php.ini (not all users have access to it).



Actual result:
--------------
Versions tested, all with the same behaviour:

- PHP 5.3.0
- PHP 5.3.0 snap 200909230830
- PHP 5.2.3
- PHP 5.2.11


------------------------------------------------------------------------


