From:             strata_ranger at hotmail dot com
Operating system: *
PHP version:      5.2.10
PHP Bug Type:     PCRE related
Bug description:  PREG_BAD_UTF8_ERROR should emit E_NOTICE

Description:
------------
This is not a PHP bug, but a suggestion that would help with troubleshooting PCRE calls in one's own PHP scripts.

When the /u modifier is used in PCRE and the subject string contains an invalid UTF-8 sequence, the call fails with a PREG_BAD_UTF8_ERROR (which can be retrieved using preg_last_error()). This is expected behavior for PCRE, but PHP should also emit an E_NOTICE to the user, because the failure could indicate an error in their script (the definition of an E_NOTICE).

Specifically, when preg_replace() is used in an assignment context (i.e. $subject = preg_replace($foo, $bar, $subject)), a PREG_BAD_UTF8_ERROR causes the subject string to be "erased" (re-assigned NULL) if the script author didn't take the time to ensure that the subject string was valid UTF-8 before calling preg_replace().

Even though it's the fault of the script author, the preg_* functions should still at least emit an E_NOTICE about bad UTF-8; it's a pain to hunt through one's proverbial 'miles of code' to figure out why a variable suddenly 'disappeared', without a file name or line number to start troubleshooting from.

Workarounds available in the meantime are:

// As of PHP 5.3
// (unless the replacement yields an empty string or the string '0')
$string = preg_replace(..., $string) ?: $string;

// Other workaround (any PHP version)
$string = is_string($repl = preg_replace(..., $string)) ? $repl : $string;

Reproduce code:
---------------
---
From manual page: reference.pcre.pattern.modifiers
---
error_reporting(-1); // Emit all errors
$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)

// Causes a PREG_BAD_UTF8_ERROR and sets $subject to NULL.
// And we didn't make a copy of the original $subject. Oops!
$subject = preg_replace('//u', '', $subject);
var_dump($subject); // NULL
var_dump(preg_last_error());
---

Actual result:
--------------
preg_replace() returns NULL; checking preg_last_error() verifies a PREG_BAD_UTF8_ERROR.
No errors, warnings, or notices of any kind were generated. We did, however, immediately assign the preg_replace() back to $subject, so $subject is now NULL and has lost whatever data it originally contained. Even though this was obviously our fault, an E_NOTICE would have told us about it.

-- 
Edit bug report at http://bugs.php.net/?id=49339&edit=1
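Until something like this exists in PHP itself, the requested behavior can be approximated in userland. The following is only a minimal sketch: checked_preg_replace() is a hypothetical helper name, not part of PHP, and it assumes that returning the untouched subject is an acceptable fallback when the call fails. It uses debug_backtrace() so that the notice names the calling file and line, which is the information the report says is missing.

```php
<?php
// Hypothetical helper (not part of PHP): wraps preg_replace() so that a
// failed call raises a notice and the caller's data is not lost.
function checked_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);

    // preg_replace() returns NULL when an error occurred.
    if ($result === null) {
        $code = preg_last_error();
        $what = ($code === PREG_BAD_UTF8_ERROR)
            ? 'PREG_BAD_UTF8_ERROR'
            : 'PCRE error code ' . $code;

        // Frame 0 of the backtrace holds the file and line of the call
        // to this helper, i.e. where the troubleshooting should start.
        $trace = debug_backtrace();
        $where = $trace[0]['file'] . ':' . $trace[0]['line'];

        trigger_error("preg_replace() failed ($what) at $where", E_USER_NOTICE);

        return $subject; // Assumption: fall back to the untouched input.
    }

    return $result;
}

$subject = "fa\xa0ade"; // Valid in ISO-8859-1 (but not UTF-8!)
$subject = checked_preg_replace('//u', '', $subject);
var_dump($subject === "fa\xa0ade"); // the original bytes are still there
```

Unlike the ?: workaround, this version does not misfire when the replacement legitimately yields an empty string or '0', because it tests for NULL explicitly. Whether falling back to the original string is the right choice depends on the script; returning NULL after the notice would preserve the existing return contract instead.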