Which PHP version do you use? And from which repository?

AFAIR we ship PHP 5.1 in RHEL5, which shouldn't be sufficient for statusnet.

So you must be using PHP from another repo - Remi? EPEL? If so, this
might be worth raising with the repo manager/PHP maintainer for that
repo.

Jan

----- Original Message -----
From: [email protected] 
<[email protected]>
To: [email protected] <[email protected]>
Sent: Sun Mar 21 04:29:40 2010
Subject: [StatusNet-dev] PCRE without unicode support on RHEL/CentOS5 
breaks hashtags

Hi all,

After several hours of tearing my hair out, I've discovered something
interesting...

*The regex which matches hashtags won't work on standard RHEL/CentOS5,
because PCRE is compiled without unicode support.*

Background
--------------

I was trying to work out why my fresh 0.9.0 install on CentOS5.4 didn't
seem to detect my hash tags. I'd post notices with plenty of hash tags,
and none of them would be detected. My notice_tags table was empty.

I tracked the problem down to this code in Classes/Notice.php:

     /**
      * Extract #hashtags from this notice's content and save them to
      * the database.
      */
     function saveTags()
     {
         /* extract all #hastags */
         $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/',
             strtolower($this->content), $match);
         if (!$count) {
             return true;
         }

The problem is that the Unicode property escapes "\pL" (any letter) and
"\pN" (any number) never match, yet no error is reported.

When building a simpler regex using these characters, I got an error:

     PHP Warning:  preg_match_all(): Compilation failed: support for \P, \p, and \X has not been compiled
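
A quick way to check whether a given PHP build is affected is to probe
a tiny pattern that uses a \p escape. This is only a rough sketch: the
function name is made up for illustration and is not part of StatusNet,
and it assumes that preg_match() returns false when the pattern fails
to compile, as in the warning above.

```php
<?php
// Hypothetical helper, not part of StatusNet: reports whether the PCRE
// library behind PHP's preg_* functions accepts \p property escapes.
function pcre_has_unicode_properties()
{
    // preg_match() returns false if the pattern cannot be compiled;
    // the @ operator hides the warning that accompanies the failure.
    // On a build with property support, \pL matches the letter "a"
    // and preg_match() returns 1.
    return @preg_match('/\pL/', 'a') === 1;
}

var_dump(pcre_has_unicode_properties());
```

On an affected stock RHEL/CentOS 5 build this would be expected to
print bool(false); after rebuilding PCRE with unicode support it should
print bool(true).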

I discovered this page
(http://gaarai.com/2009/01/31/unicode-support-on-centos-52-with-php-and-pcre/)
which details how to rebuild PCRE with unicode support, and after doing
so, my hash tags are working perfectly.

At this point, it doesn't look like this will be fixed upstream until
the next major (6) release of RHEL/CentOS
(https://bugzilla.redhat.com/show_bug.cgi?id=457064)

Solution
----------
In the interim, should this code be augmented with a non-UTF-8 pattern
match, so that at least standard ASCII hashtags will work?

         /* extract all #hastags */
         $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/',
             strtolower($this->content), $match);
         if (!$count) {
             $count_without_utf8 =
                 preg_match_all('/(?:^|\s)#([a-z0-9_\-\.]{1,64})/',
                     strtolower($this->content), $match);
             if (!$count_without_utf8) {
                 return true;
             }
         }
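
To make the idea easier to try outside of StatusNet, here is the same
fallback as a standalone sketch. The function name and the optional
$patterns argument are invented for illustration; the real saveTags()
is a method on Notice and also writes to the database. Like the snippet
above, it treats both "pattern failed" and "no matches" as a reason to
try the next pattern.

```php
<?php
// Hypothetical standalone helper, not part of StatusNet. Tries each
// pattern in order and returns the tags from the first one that matches.
function extract_hashtags($content, $patterns = null)
{
    if ($patterns === null) {
        $patterns = array(
            // Original pattern: needs PCRE built with unicode support.
            '/(?:^|\s)#([\pL\pN_\-\.]{1,64})/',
            // ASCII-only fallback for builds without that support.
            '/(?:^|\s)#([a-z0-9_\-\.]{1,64})/',
        );
    }

    $text = strtolower($content);
    foreach ($patterns as $pattern) {
        // @ hides the "Compilation failed" warning on affected builds.
        $count = @preg_match_all($pattern, $text, $match);
        if (!$count) {
            continue; // false (pattern failed) or 0 (no tags): try the next one
        }
        return $match[1]; // captured tag names, without the leading '#'
    }
    return array();
}

print_r(extract_hashtags('Testing #php on #centos5'));
```

Passing a deliberately broken first pattern, e.g.
extract_hashtags('a #tag', array('/(/', '/(?:^|\s)#([a-z0-9_\-\.]{1,64})/')),
simulates the failing case on a machine where PCRE does have unicode
support.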

Comments? :)

D



_______________________________________________
StatusNet-dev mailing list
[email protected]
http://lists.status.net/mailman/listinfo/statusnet-dev