ID:               35711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      matteo at beccati dot com
-Status:           Analyzed
+Status:           Wont fix
 Bug Type:         mbstring related
 Operating System: Debian GNU/Linux
 PHP Version:      5.1.1
 Assigned To:      hirokawa


Previous Comments:
------------------------------------------------------------------------

[2005-12-20 15:44:31] [EMAIL PROTECTED]

Please note that encoding detection is not always accurate.
In particular, a string that is too short can be detected
incorrectly.
In your case, this is not a bug but the specified behavior.
UTF-8 is a variable-length multibyte encoding;
a character in UTF-8 is from one to six bytes long.
Please look at ext/mbstring/libmbfl/filter/mbfilter_utf8.c, around line 249.
0xe8 is a valid first byte of a 3-byte sequence.
We cannot tell whether a lone 0xe8 is ISO-8859-1 or UTF-8,
because this byte is valid in both encodings.
In this case, the result is chosen
according to the order defined by mb_detect_order().
I suggest using a sufficiently long string
for reliable encoding detection.
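
The ambiguity described above can be illustrated with a minimal sketch.
This is not the libmbfl code: the names utf8_state, utf8_seq_len and
utf8_scan are hypothetical, and the sketch assumes only the 1- to 4-byte
forms of UTF-8 (the longer historical forms and the overlong/surrogate
checks are left out). It shows that a lone 0xe8 is not rejected as UTF-8;
it is merely an unfinished 3-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, used only in this illustration. */
enum utf8_state { UTF8_COMPLETE, UTF8_INCOMPLETE, UTF8_INVALID };

/* Length of the sequence announced by a lead byte, or 0 if the byte
 * cannot start a sequence. Assumption: 1- to 4-byte forms only. */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;               /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3; /* 0xE8 falls here */
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;                             /* continuation or unused byte */
}

/* Scan a buffer and report whether it is complete UTF-8, ends in the
 * middle of a multibyte character, or contains an invalid byte. */
static enum utf8_state utf8_scan(const unsigned char *s, size_t n)
{
    size_t i = 0;

    assert(s != NULL || n == 0);
    while (i < n) {
        int len = utf8_seq_len(s[i]);
        if (len == 0) return UTF8_INVALID;
        for (int k = 1; k < len; k++) {
            if (i + (size_t)k >= n) return UTF8_INCOMPLETE;
            if ((s[i + k] & 0xC0) != 0x80) return UTF8_INVALID;
        }
        i += (size_t)len;
    }
    return UTF8_COMPLETE;
}
```

With this sketch, the single byte 0xe8 scans as UTF8_INCOMPLETE, while
0xe8 followed by an ASCII letter scans as UTF8_INVALID. A longer string
therefore gives the detector a chance to rule UTF-8 out.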

------------------------------------------------------------------------

[2005-12-19 09:03:36] [EMAIL PROTECTED]

Rui, can you check this out please?

------------------------------------------------------------------------

[2005-12-19 09:00:50] matteo at beccati dot com

Oops, I just realized that I forgot the -u flag :)

Here is the downloadable patch:

http://beccati.com/download/mbstring-patch-20051219.txt

------------------------------------------------------------------------

[2005-12-19 08:48:47] [EMAIL PROTECTED]

Please provide patches in unified diff format (like the first
one), and make them downloadable somewhere.

------------------------------------------------------------------------

[2005-12-16 23:50:13] matteo at beccati dot com

I've made a patch which seems to fix the issue. It basically checks the
filter status during judgement. Status seems to be != 0 only while the
filter is in the middle of matching a multibyte character. I also added
a fallback to the old judgement routine, in case no matching encoding
is found.

Index: ext/mbstring/libmbfl/mbfl/mbfilter.c
===================================================================
RCS file: /repository/php-src/ext/mbstring/libmbfl/mbfl/mbfilter.c,v
retrieving revision 1.7.2.1
diff -u -r1.7.2.1 mbfilter.c
--- ext/mbstring/libmbfl/mbfl/mbfilter.c        5 Nov 2005 04:49:57 -0000      1.7.2.1
+++ ext/mbstring/libmbfl/mbfl/mbfilter.c        16 Dec 2005 22:46:26 -0000
@@ -575,12 +575,22 @@

        for (i = 0; i < num; i++) {
                filter = &flist[i];
-               if (!filter->flag) {
+               if (!filter->flag && !filter->status) {
                        encoding = filter->encoding;
                        break;
                }
        }

+       if (!encoding) {
+               for (i = 0; i < num; i++) {
+                       filter = &flist[i];
+                       if (!filter->flag) {
+                               encoding = filter->encoding;
+                               break;
+                       }
+               }
+       }
+
        /* cleanup */
        /* dtors should be called in reverse order */
        i = num; while (--i >= 0) {
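
For readers who prefer it outside diff form, the selection logic the
patch introduces can be sketched as a standalone function. This is only
an illustration: struct ident_filter and pick_encoding are hypothetical
names, the encoding is simplified to a string, and the meaning given to
status (non-zero while inside a multibyte character) is the assumption
stated in the comment above, not something confirmed by libmbfl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an identify filter; it carries only the
 * fields the patch reads. */
struct ident_filter {
    const char *encoding; /* candidate encoding name (simplified) */
    int flag;             /* non-zero: input rejected for this encoding */
    int status;           /* assumed non-zero: stopped mid-character */
};

/* Return the first candidate that was neither rejected nor left in the
 * middle of a character; if there is none, fall back to the old rule
 * and return the first candidate that was not rejected. */
static const char *pick_encoding(const struct ident_filter *flist, size_t num)
{
    size_t i;

    assert(flist != NULL || num == 0);

    /* First pass: stricter test added by the patch. */
    for (i = 0; i < num; i++) {
        if (!flist[i].flag && !flist[i].status)
            return flist[i].encoding;
    }

    /* Fallback: original judgement, ignoring status. */
    for (i = 0; i < num; i++) {
        if (!flist[i].flag)
            return flist[i].encoding;
    }

    return NULL; /* every candidate was rejected */
}
```

With candidates ordered UTF-8 then ISO-8859-1, a lone 0xe8 would leave
the UTF-8 entry with status != 0, so the first pass skips it and picks
ISO-8859-1. If UTF-8 were the only candidate, the fallback would still
return it.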

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/35711

-- 
Edit this bug report at http://bugs.php.net/?id=35711&edit=1
