ID:               33898
User updated by:  feldgendler at mail dot ru
Reported By:      feldgendler at mail dot ru
-Status:          Bogus
+Status:          Open
Bug Type:         Filesystem function related
Operating System: Debian GNU/Linux i686
PHP Version:      5.0.4
New Comment:

A message in that bug says "Sorry, but this is not supported yet. You'll
have to wait for PHP that supports unicode."

What do you mean? Doesn't PHP 5.0.4, with all its multi-byte
capabilities, support Unicode?

I've searched the bug database and found that there were similar bugs
(30105, 30014, 28981) that are currently in "No feedback", not "Bogus"
state. Why is this bug Bogus?

And last, what's wrong with my proposed modification? Doesn't it fix the
bug?


Previous Comments:
------------------------------------------------------------------------

[2005-07-28 11:31:26] [EMAIL PROTECTED]

See bug #33260.

------------------------------------------------------------------------

[2005-07-28 11:15:46] feldgendler at mail dot ru

I've explored the source code of php_basename() function, and here is
what I found:

In case of a multi-byte character (inc_len > 1) that immediately follows
a slash, state is not changed to 1 because that code is skipped.

The following code:

if (state == 0) {
	comp = c;
	state = 1;
}

...needs to be inserted to the point marked below:

while (cnt > 0) {
	inc_len = (*c == '\0' ? 1: php_mblen(c, cnt));
	switch (inc_len) {
		case -2:
		case -1:
			inc_len = 1;
			php_mblen(NULL, 0);
			break;
		case 0:
			goto quit_loop;
		case 1:
#if defined(PHP_WIN32) || defined(NETWARE)
			if (*c == '/' || *c == '\\') {
#else
			if (*c == '/') {
#endif
				if (state == 1) {
					state = 0;
					cend = c;
				}
			} else {
				if (state == 0) {
					comp = c;
					state = 1;
				}
			}
		default:
-- HERE IT GOES -->
			break;
	}
	c += inc_len;
	cnt -= inc_len;
}

Can I expect that this bug will be fixed in CVS?

------------------------------------------------------------------------

[2005-07-28 10:59:54] feldgendler at mail dot ru

Description:
------------
The source code in my testcase is in UTF-8 encoding itself. The quoted
string contains Cyrillic letters.

If I save the source code in KOI8-R (single-byte) Cyrillic encoding, and
change the second argument to setlocale() to "ru_RU.KOI8-R", the
observed result is what I expect.
This shows that the bug only occurs on multi-byte characters, because in
KOI8-R all characters are single-byte.

Relevant PHP configuration options:
--enable-mbstring=all
(--enable-zend-multibyte was not specified)

Relevant environment variables:
LANG=en_US.UTF-8
(LC_* are not set)

Reproduce code:
---------------
<?php
setlocale(LC_CTYPE, "en_US.UTF-8");
echo basename("english/русский");
?>

Expected result:
----------------
русский

Actual result:
--------------
english


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=33898&edit=1
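
The behaviour described in the report can be reproduced outside PHP with a small standalone sketch in C. This is not the PHP source: utf8_len() is a hypothetical stand-in for php_mblen() that assumes the input is UTF-8 (so no locale setup is needed), and scan_basename() with its apply_fix flag is a name invented here to compare the loop with and without the proposed insertion. Suffix handling and the Windows/NetWare backslash case are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for php_mblen(): byte length of the UTF-8
 * sequence starting at s (at most n bytes available), or -1 if the
 * bytes do not form a valid sequence. Assumes UTF-8 input. */
static int utf8_len(const char *s, size_t n)
{
    unsigned char b;
    int len, i;

    if (n == 0) return 0;
    b = (unsigned char)s[0];
    if (b < 0x80) len = 1;
    else if ((b & 0xE0) == 0xC0) len = 2;
    else if ((b & 0xF0) == 0xE0) len = 3;
    else if ((b & 0xF8) == 0xF0) len = 4;
    else return -1;

    if ((size_t)len > n) return -1;
    for (i = 1; i < len; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) return -1;
    }
    return len;
}

/* Simplified version of the scanning loop quoted in the report.
 * state == 0: between components; state == 1: inside a component.
 * With apply_fix == 0, multi-byte characters never start a component,
 * which is the reported behaviour. With apply_fix != 0, the proposed
 * "if (state == 0) { comp = c; state = 1; }" also runs for them. */
static void scan_basename(const char *s, int apply_fix,
                          char *out, size_t outsz)
{
    const char *c = s, *comp = s, *cend = s;
    size_t cnt = strlen(s);
    size_t len;
    int state = 0;

    assert(out != NULL && outsz > 0);

    while (cnt > 0) {
        int inc_len = utf8_len(c, cnt);
        if (inc_len < 1) {
            inc_len = 1;            /* invalid byte: step over it */
        } else if (inc_len == 1) {
            if (*c == '/') {
                if (state == 1) {   /* component ends at this slash */
                    state = 0;
                    cend = c;
                }
            } else if (state == 0) {
                comp = c;           /* single-byte char starts a component */
                state = 1;
            }
        } else if (apply_fix && state == 0) {
            comp = c;               /* proposed fix: multi-byte char starts one too */
            state = 1;
        }
        c += inc_len;
        cnt -= (size_t)inc_len;
    }
    if (state == 1) {
        cend = c;                   /* last component runs to end of string */
    }

    len = (size_t)(cend - comp);
    if (len >= outsz) len = outsz - 1;
    memcpy(out, comp, len);
    out[len] = '\0';
}
```

Called on the UTF-8 string "english/русский", scan_basename() with apply_fix set to 0 leaves "english" in the output buffer, matching the actual result above; with apply_fix set to 1 it leaves "русский", matching the expected result.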