Edit report at http://bugs.php.net/bug.php?id=52971&edit=1

 ID:                 52971
 User updated by:    marc dot bennewitz at giata dot de
 Reported by:        marc dot bennewitz at giata dot de
 Summary:            PCRE-Meta-Characters not working with utf-8
 Status:             Closed
 Type:               Bug
 Package:            PCRE related
 Operating System:   Linux
 PHP Version:        5.3.3
 Assigned To:        felipe
 Block user comment: N

 New Comment:

now it works fine :)


Previous Comments:
[2010-10-03 18:02:19] fel...@php.net

This bug has been fixed in SVN.

Snapshots of the sources are packaged every three hours; this change
will be in the next snapshot.
Thank you for the report, and for helping us make PHP better.

The latest version of PCRE added a flag, PCRE_UCP. As the PCRE
documentation states:

"In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
characters, even in UTF-8 mode. However, this can be changed by setting
the PCRE_UCP option."
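As an analogy outside PHP (JavaScript's RegExp engine, not PCRE): JavaScript behaves like PCRE's default here. Even with the `u` (Unicode) flag, `\w` matches only ASCII `[A-Za-z0-9_]`. The sketch contrasts `\w+` with an explicit Unicode-property class `[\p{L}\p{N}_]+`, which is roughly the set PCRE_UCP makes `\w` match (letters, numbers and underscore). The helper `wordRuns` and the sample word are hypothetical, chosen only for illustration.

```javascript
// Analogy in JavaScript (not PHP/PCRE): even with the "u" flag,
// \w is ASCII-only, like PCRE's default without PCRE_UCP.
const asciiWord = /\w+/gu;

// Explicit Unicode classes: letters (\p{L}), numbers (\p{N}) and "_",
// roughly the set that PCRE_UCP makes \w match.
const unicodeWord = /[\p{L}\p{N}_]+/gu;

// Hypothetical helper: return all matches of a global regex in `text`
// (String.prototype.match returns null when nothing matches).
function wordRuns(re, text) {
  return text.match(re) ?? [];
}

const sample = "Straße";
console.log(wordRuns(asciiWord, sample));   // [ 'Stra', 'e' ]
console.log(wordRuns(unicodeWord, sample)); // [ 'Straße' ]
```

With the ASCII-only class, the non-ASCII letter "ß" splits the word in two. The property-based class keeps it whole.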

With the flag set, we got:

array(1) {
  array(1) {
    array(2) {
      string(6) "Wasser"

array(1) {
  array(1) {
    array(2) {
      string(7) " Wasser"

[2010-10-03 18:01:40] fel...@php.net

Automatic comment from SVN on behalf of felipe
Revision: http://svn.php.net/viewvc/?view=revision&revision=303963
Log: - Fixed bug #52971 (PCRE-Meta-Characters not working with utf-8)
#   In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
#       characters, even in UTF-8 mode. However, this can be changed by setting
#       the PCRE_UCP option.

[2010-10-03 11:02:15] cataphr...@php.net

I'm reopening this, as there is indeed a different behavior on Windows
that I can't quite explain yet.

[2010-10-03 10:21:34] marc dot bennewitz at giata dot de

There are some problems with it:

1. On Windows it works as expected.

2. Unicode properties offer no equivalent of the word boundary \b, which
is defined in terms of \w and \W.

3. With the "u" modifier, PHP knows that the subject is UTF-8.

4. The manual page http://php.net/manual/regexp.reference.escape.php has
no note about this UTF-8 incompatibility.

php.exe -i

iconv support => enabled
iconv implementation => "libiconv"
iconv library version => 1.11

Directive => Local Value => Master Value
iconv.input_encoding => ISO-8859-1 => ISO-8859-1
iconv.internal_encoding => ISO-8859-1 => ISO-8859-1
iconv.output_encoding => ISO-8859-1 => ISO-8859-1

PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.02 2010-03-19

Directive => Local Value => Master Value
pcre.backtrack_limit => 100000 => 100000
pcre.recursion_limit => 100000 => 100000


[2010-10-02 20:26:05] cataphr...@php.net

This is by design; it's the way \b and \w are defined in PCRE.

You'll have to use another strategy, such as lookbehind assertions and
Unicode character properties.
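The workaround suggested above (lookbehind plus Unicode character properties) can be sketched, again as an analogy in JavaScript rather than PHP. JavaScript's `\b` shares PCRE's default ASCII-only definition, so the same false positive appears. The sketch replaces `\b` with a negative lookbehind and a negative lookahead over `[\p{L}\p{N}_]`. The search word "wasser" and the sample strings are hypothetical. Applying the idea in a PHP `/.../u` pattern assumes a PCRE build with Unicode property support.

```javascript
// Analogy in JavaScript (not PHP/PCRE): \b is defined via the ASCII-only \w,
// so a non-ASCII letter such as "ß" is treated as a word boundary.
const naive = /\bwasser\b/u;

// Emulated Unicode-aware word boundary: the match must not be preceded
// or followed by a letter (\p{L}), a number (\p{N}) or an underscore.
const WORD = "[\\p{L}\\p{N}_]";
const unicodeAware = new RegExp(`(?<!${WORD})wasser(?!${WORD})`, "u");

for (const text of ["das wasser", "süßwasser"]) {
  console.log(text, naive.test(text), unicodeAware.test(text));
}
// das wasser true true
// süßwasser true false
```

The naive pattern wrongly finds "wasser" as a separate word inside the compound "süßwasser", because "ß" does not match the ASCII-only `\w`. The lookaround version rejects that match.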


The remaining comments for this report are too long to include here.
To view them, please see the bug report online.
