ID: 44014
Comment by: d_kelsey at uk dot ibm dot com
Reported By: michael202 at gmx dot de
Status: No Feedback
Bug Type: mbstring related
Operating System: Win XP
PHP Version: 5.2.5
Assigned To: hirokawa
New Comment:
My understanding of UTF-16 is that the BOM is mandatory. In mbstring, I have found that if I pass a UTF-16 string to mb_convert_encoding() for conversion to, for example, UTF-8, it treats the BOM as UTF-16 data and converts it. mbstring also doesn't generate a BOM when converting to UTF-16, so, since I thought the BOM was mandatory, it isn't generating valid UTF-16 bytes. I see that mbstring effectively uses UTF-16BE when you specify UTF-16. If mbstring doesn't support the BOM, then UTF-16 cannot be handled properly. Should this at least be documented, with a recommendation to use UTF-16BE as the encoding so that you are explicit about what is supported?

Previous Comments:
------------------------------------------------------------------------

[2008-02-24 01:00:00] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

------------------------------------------------------------------------

[2008-02-16 12:17:13] [EMAIL PROTECTED]

The Unicode BOM is not supported by the encoding conversion functions in mbstring, and big endian is the default for UTF-16. Please specify 'UTF-16LE' if you need little-endian format.

Try:

<?php
$utf16 = chr(0).chr(0x4d).chr(0).chr(0x6f); // 'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');
echo($utf8 . "\n"); // -> Mo
?>

or:

<?php
$utf16 = chr(0x4d).chr(0).chr(0x6f).chr(0); // 'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
echo($utf8 . "\n"); // -> Mo
?>

------------------------------------------------------------------------

[2008-02-05 05:10:37] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.
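On the output side, the new comment notes that mbstring does not write a BOM when converting to UTF-16. A minimal sketch of a workaround, assuming you want a BOM in the output: convert with an explicit byte order ('UTF-16BE' or 'UTF-16LE', as the maintainer suggests) and prepend the matching BOM bytes yourself. The function name utf8_to_utf16_with_bom() is hypothetical and not part of mbstring.

```php
<?php
// Hypothetical helper (not part of mbstring): encode UTF-8 text as UTF-16
// and prepend an explicit byte order mark, because mb_convert_encoding()
// does not emit one. The byte order is always named explicitly.
function utf8_to_utf16_with_bom($utf8, $littleEndian = false)
{
    if ($littleEndian) {
        // FF FE marks little-endian data.
        return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
    }
    // FE FF marks big-endian data.
    return "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

echo bin2hex(utf8_to_utf16_with_bom('Mo')), "\n";       // feff004d006f
echo bin2hex(utf8_to_utf16_with_bom('Mo', true)), "\n"; // fffe4d006f00
```

Naming the byte order explicitly also follows the suggestion above to be explicit about what mbstring supports, instead of relying on the big-endian default of plain 'UTF-16'.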
------------------------------------------------------------------------

[2008-02-01 12:08:07] michael202 at gmx dot de

Description:
------------
mb_convert_encoding() 'destroys' the first character when converting from UTF-16 to UTF-8 (iconv works).

Reproduce code:
---------------
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0); // 'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');
echo($utf8 . "\n"); // -> ´++´¢ìo
$utf8 = iconv('UTF-16', 'UTF-8', $utf16);
echo($utf8 . "\n"); // -> Mo

Expected result:
----------------
mb: (BOM8)Mo
iconv: Mo

(BOM8) is a placeholder

Actual result:
--------------
mb: (BOM8)´¢ìo (copied from cmd shell)
iconv: Mo

(BOM8) is a placeholder

------------------------------------------------------------------------
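On the input side, given the maintainer's statement that mbstring's conversion functions do not interpret the BOM, one workaround for the reproduce case is to inspect the first two bytes yourself, strip any BOM, and pass the explicit byte order to mb_convert_encoding(). This is a sketch under that assumption; the function name utf16_to_utf8_bom_aware() is hypothetical, and the no-BOM fallback to UTF-16BE mirrors the big-endian default described in this report for PHP 5.2.5.

```php
<?php
// Hypothetical helper (not part of mbstring): decode UTF-16 that may begin
// with a byte order mark. The BOM is detected and removed here, and the
// matching explicit encoding name is passed to mb_convert_encoding().
function utf16_to_utf8_bom_aware($bytes)
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: drop it and decode the rest as UTF-16LE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: drop it and decode the rest as UTF-16BE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: assume big endian, matching mbstring's 'UTF-16' default.
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16BE');
}

// The reporter's input: little-endian BOM followed by 'Mo'.
$utf16 = chr(0xFF) . chr(0xFE) . chr(0x4d) . chr(0) . chr(0x6f) . chr(0);
echo utf16_to_utf8_bom_aware($utf16), "\n"; // Mo
```

Note that this sketch drops the BOM rather than converting it to a UTF-8 BOM, so its output is plain "Mo" (like the iconv line above) rather than the "(BOM8)Mo" the reporter expected from mb_convert_encoding().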