ID:               35711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      matteo at beccati dot com
-Status:           Assigned
+Status:           Feedback
 Bug Type:         mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:      5.1.1
 Assigned To:      moriyoshi
 New Comment:

Please provide any patches in unified diff format (like the first
one), and make them downloadable somewhere.


Previous Comments:
------------------------------------------------------------------------

[2005-12-17 10:13:11] matteo at beccati dot com

Improved patch to also work with mbstring.encoding_translation
enabled.


Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -r1.7.2.1 mbfilter.c
443c443
<                       if (!filter->flag) {
---
>                       if (!filter->flag && !filter->status) {
447a448,458
>
>               if (encoding == mbfl_no_encoding_invalid) {
>                       n = identd->filter_list_size - 1;
>                       while (n >= 0) {
>                               filter = identd->filter_list[n];
>                               if (!filter->flag) {
>                                       encoding = filter->encoding->no_encoding;
>                               }
>                               n--;
>                       }
>               }
578c589
<               if (!filter->flag) {
---
>               if (!filter->flag && !filter->status) {
583a595,604
>       if (!encoding) {
>               for (i = 0; i < num; i++) {
>                       filter = &flist[i];
>                       if (!filter->flag) {
>                               encoding = filter->encoding;
>                               break;
>                       }
>               }
>       }
>

------------------------------------------------------------------------

[2005-12-16 23:50:13] matteo at beccati dot com

I've made a patch which seems to fix the issue. It basically checks
the filter status during judgement. The status seems to be != 0 only
while the filter is in the middle of matching a multibyte character.
I also added a fallback to the old judgement routine, in case no
matching encoding is found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c        5 Nov 2005 04:49:57 -0000      1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c        16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@

        for (i = 0; i < num; i++) {
                filter = &flist[i];
-               if (!filter->flag) {
+               if (!filter->flag && !filter->status) {
                        encoding = filter->encoding;
                        break;
                }
        }

+       if (!encoding) {
+               for (i = 0; i < num; i++) {
+                       filter = &flist[i];
+                       if (!filter->flag) {
+                               encoding = filter->encoding;
+                               break;
+                       }
+               }
+       }
+
        /* cleanup */
        /* dtors should be called in reverse order */
        i = num; while (--i >= 0) {
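
The selection logic of the patch above can be illustrated in isolation.
What follows is only a sketch under assumptions, not libmbfl code: the
struct candidate type and the pick_encoding() helper are hypothetical,
and the meaning given to flag and status is the one described in the
comment (flag set when the input was rejected, status != 0 while a
multibyte character is still being matched).

```c
#include <stddef.h>

/* Hypothetical stand-in for an identify filter; the real structure
 * is defined in libmbfl and has more fields. */
struct candidate {
    const char *name; /* encoding name */
    int flag;         /* assumed: non-zero if the input was rejected */
    int status;       /* assumed: non-zero if a multibyte character
                         was still being matched at end of input */
};

/* Returns the name of the first acceptable candidate, or NULL. */
const char *pick_encoding(const struct candidate *list, size_t num)
{
    size_t i;

    /* First pass: not rejected and not left in mid-character. */
    for (i = 0; i < num; i++) {
        if (!list[i].flag && !list[i].status) {
            return list[i].name;
        }
    }

    /* Fallback: the old judgement, which ignores status. */
    for (i = 0; i < num; i++) {
        if (!list[i].flag) {
            return list[i].name;
        }
    }

    return NULL;
}
```

With candidates ASCII (flag set), UTF-8 (status set) and ISO-8859-1
(both clear), the first pass skips UTF-8 and returns ISO-8859-1. If
ISO-8859-1 is not in the list, the fallback still returns UTF-8, as the
old routine would.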

------------------------------------------------------------------------

[2005-12-16 17:51:13] [EMAIL PROTECTED]

Moriyoshi, if ext/mbstring is not maintained anymore, please let us
know.

------------------------------------------------------------------------

[2005-12-16 17:18:27] matteo at beccati dot com

Description:
------------
I was evaluating the mbstring extension because of its ability to
filter and convert input parameters to the correct encoding. During my
tests I found that an ISO-8859-1 string which ends with an
accented character is wrongly detected as UTF-8, even though it ends
with an incomplete multibyte character (using iconv to convert the
string raises a notice to that effect).

Also reproduced with PHP 4.3.11 on FreeBSD 4 and 5.0.2 on Win32.
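
For illustration, here is a minimal sketch of why a string ending in
\xE0 is an incomplete UTF-8 sequence: 0xE0 is the lead byte of a
three-byte character, so two continuation bytes are still expected when
the input ends. The helper utf8_pending_bytes() is hypothetical and is
not taken from mbstring or iconv. It is deliberately simplified and
does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Returns how many UTF-8 continuation bytes are still expected when
 * the input ends (0 means the string ends on a character boundary),
 * or -1 if a byte is seen that cannot appear at its position.
 * Hypothetical helper for illustration only. */
int utf8_pending_bytes(const unsigned char *s, size_t len)
{
    int pending = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = s[i];

        if (pending > 0) {
            /* Inside a character: only 10xxxxxx is allowed. */
            if ((c & 0xC0) != 0x80) {
                return -1;
            }
            pending--;
        } else if (c < 0x80) {
            /* Plain ASCII byte, nothing pending. */
        } else if ((c & 0xE0) == 0xC0) {
            pending = 1; /* 110xxxxx: two-byte character */
        } else if ((c & 0xF0) == 0xE0) {
            pending = 2; /* 1110xxxx: three-byte character */
        } else if ((c & 0xF8) == 0xF0) {
            pending = 3; /* 11110xxx: four-byte character */
        } else {
            return -1;   /* stray continuation or invalid lead byte */
        }
    }

    return pending;
}
```

For the 7-byte string "Test: \xE0" this returns 2 (input ends in the
middle of a character). For "Test: \xE0" followed by "a" it returns -1,
because "a" is not a continuation byte. Note that in C source the second
string has to be written as "Test: \xE0" "a", since "\xE0a" would be
read as a single hex escape.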


Reproduce code:
---------------
<?php

error_reporting(E_ALL);
mb_detect_order('ASCII,UTF-8,ISO-8859-1');

// \xE0 is ISO-8859-1 small a grave char
test_bug("Test: \xE0");
test_bug("Test: \xE0a");

function test_bug($s) {
    echo "Trying: ";
    var_dump($s);
    iconv('UTF8', 'UCS2', $s);
    echo "Detected encoding: ".mb_detect_encoding($s)."\n";
    echo "Converted string:";
    var_dump(mb_convert_encoding($s, 'UTF-8',
        'ASCII,UTF-8,ISO-8859-1'));
    echo "\n";
}

?>

Expected result:
----------------
Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(8) "Test: Ã "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"


Actual result:
--------------
Trying: string(7) "Test: à"

Notice: iconv(): Detected an incomplete multibyte character in input
string in test.php on line 13
Detected encoding: UTF-8
Converted string:string(6) "Test: "

Trying: string(8) "Test: àa"

Notice: iconv(): Detected an illegal character in input string in
/var/www/mbstring/test.php on line 13
Detected encoding: ISO-8859-1
Converted string:string(9) "Test: Ã a"



------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1
