 ID:               35711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      matteo at beccati dot com
-Status:           Assigned
+Status:           Feedback
 Bug Type:         mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:      5.1.1
 Assigned To:      moriyoshi
 New Comment:

Please provide patches in unified diff format (like the first one),
and make them downloadable somewhere.


Previous Comments:
------------------------------------------------------------------------

[2005-12-17 10:13:11] matteo at beccati dot com

Improved patch to also work with mbstring.encoding_translation enabled.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -r1.7.2.1 mbfilter.c
443c443
< 			if (!filter->flag) {
---
> 			if (!filter->flag && !filter->status) {
447a448,458
> 
> 		if (encoding == mbfl_no_encoding_invalid) {
> 			n = identd->filter_list_size - 1;
> 			while (n >= 0) {
> 				filter = identd->filter_list[n];
> 				if (!filter->flag) {
> 					encoding = filter->encoding->no_encoding;
> 				}
> 				n--;
> 			}
> 		}
578c589
< 		if (!filter->flag) {
---
> 		if (!filter->flag && !filter->status) {
583a595,604
> 	if (!encoding) {
> 		for (i = 0; i < num; i++) {
> 			filter = &flist[i];
> 			if (!filter->flag) {
> 				encoding = filter->encoding;
> 				break;
> 			}
> 		}
> 	}
> 

------------------------------------------------------------------------

[2005-12-16 23:50:13] matteo at beccati dot com

I've made a patch which seems to fix the issue. It basically checks the
filter status during judgement. Status seems to be != 0 only while the
filter is in the middle of matching a multibyte character. I also added
a fallback to the old judgement routine, in case no matching encoding
is found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c	5 Nov 2005 04:49:57 -0000	1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c	16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@
 
 	for (i = 0; i < num; i++) {
 		filter = &flist[i];
-		if (!filter->flag) {
+		if (!filter->flag && !filter->status) {
 			encoding = filter->encoding;
 			break;
 		}
 	}
 
+	if (!encoding) {
+		for (i = 0; i < num; i++) {
+			filter = &flist[i];
+			if (!filter->flag) {
+				encoding = filter->encoding;
+				break;
+			}
+		}
+	}
+
 	/* cleanup */
 	/* dtors should be called in reverse order */
 	i = num; while (--i >= 0) {

------------------------------------------------------------------------

[2005-12-16 17:51:13] [EMAIL PROTECTED]

Moriyoshi, if ext/mbstring is not maintained anymore, please let us
know.

------------------------------------------------------------------------

[2005-12-16 17:18:27] matteo at beccati dot com

Description:
------------
I was evaluating the mbstring extension because of its ability to
filter and convert input parameters to the correct encoding.

During my tests I found that an ISO-8859-1 string which ends with an
accented character is wrongly detected as UTF-8, even though it ends
with an incomplete multibyte character (converting the string with
iconv raises a notice to that effect).

Also reproduced with PHP 4.3.11 on FreeBSD 4 and 5.0.2 on Win32.

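The behaviour described above can be illustrated outside PHP with a
small C sketch. This is not libmbfl code: toy_filter, the feed_*
helpers and identify() are hypothetical names, and the UTF-8 check is
deliberately simplified to lead-byte/continuation-byte counting. It
assumes, as the patch comment suggests, that status is non-zero only
while a multibyte character is incomplete, and it hard-codes the detect
order ASCII, UTF-8, ISO-8859-1.

```c
#include <stddef.h>

/* Hypothetical stand-in for an identify filter; NOT the libmbfl struct. */
struct toy_filter {
    const char *name;
    int flag;   /* non-zero once the input is proven invalid for this encoding */
    int status; /* non-zero while a multibyte character is still incomplete */
};

static void feed_ascii(struct toy_filter *f, unsigned char c)
{
    if (c >= 0x80) {
        f->flag = 1;
    }
}

/* Simplified UTF-8 check: only counts expected continuation bytes. */
static void feed_utf8(struct toy_filter *f, unsigned char c)
{
    if (f->status > 0) {
        if ((c & 0xC0) == 0x80) {
            f->status--;            /* one continuation byte consumed */
        } else {
            f->flag = 1;            /* sequence broken by a non-continuation byte */
            f->status = 0;
        }
    } else if (c < 0x80) {
        /* plain single-byte character, nothing to track */
    } else if ((c & 0xE0) == 0xC0) {
        f->status = 1;
    } else if ((c & 0xF0) == 0xE0) {
        f->status = 2;
    } else if ((c & 0xF8) == 0xF0) {
        f->status = 3;
    } else {
        f->flag = 1;                /* stray continuation or invalid lead byte */
    }
}

/* Every byte is accepted by the toy ISO-8859-1 filter. */
static void feed_latin1(struct toy_filter *f, unsigned char c)
{
    (void)f;
    (void)c;
}

/*
 * check_status == 0: old judgement (first filter with flag == 0).
 * check_status != 0: patched judgement (flag == 0 and status == 0 first,
 *                    then fall back to the old rule).
 */
static const char *identify(const char *s, size_t len, int check_status)
{
    struct toy_filter f[3] = {
        { "ASCII", 0, 0 }, { "UTF-8", 0, 0 }, { "ISO-8859-1", 0, 0 }
    };
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        feed_ascii(&f[0], c);
        feed_utf8(&f[1], c);
        feed_latin1(&f[2], c);
    }

    if (check_status) {
        for (k = 0; k < 3; k++) {
            if (!f[k].flag && !f[k].status) {
                return f[k].name;
            }
        }
    }
    /* old judgement, also used as the fallback */
    for (k = 0; k < 3; k++) {
        if (!f[k].flag) {
            return f[k].name;
        }
    }
    return NULL;
}
```

Under these assumptions, identify("Test: \xE0", 7, 0) yields UTF-8,
because the toy UTF-8 filter is left waiting for continuation bytes
without ever being flagged, while identify("Test: \xE0", 7, 1) yields
ISO-8859-1. Note that in C the second test string has to be written as
"Test: \xE0" "a", since a C hex escape is not limited to two digits.
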
Reproduce code:
---------------
<?php

error_reporting(E_ALL);

mb_detect_order('ASCII,UTF-8,ISO-8859-1');

// \xE0 is ISO-8859-1 small a grave char
test_bug("Test: \xE0");
test_bug("Test: \xE0a");

function test_bug($s)
{
	echo "Trying: ";
	var_dump($s);
	iconv('UTF8', 'UCS2', $s);
	echo "Detected encoding: ".mb_detect_encoding($s)."\n";
	echo "Converted string:";
	var_dump(mb_convert_encoding($s, 'UTF-8', 'ASCII,UTF-8,ISO-8859-1'));
	echo "\n";
}

?>

Expected result:
----------------
Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(8) "Test: Ã "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"

Actual result:
--------------
Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: UTF-8
Converted string:string(6) "Test: "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1