Which PHP version do you use? And from which repository? AFAIR we ship PHP 5.1 in RHEL5, which shouldn't be sufficient for StatusNet.
Thus you must be using PHP from another repo. Remi? EPEL? This might be worth discussing with that repo's manager or PHP maintainer.

Jan

----- Original Message -----
From: [email protected] <[email protected]>
To: [email protected] <[email protected]>
Sent: Sun Mar 21 04:29:40 2010
Subject: [StatusNet-dev] PCRE without unicode support on RHEL/CentOS5 breaks hashtags

Hi all,

After several hours of tearing my hair out, I've discovered something interesting...

*The regex which matches hashtags won't work on standard RHEL/CentOS5, because PCRE is compiled without unicode support.*

Background
--------------

I was trying to work out why my fresh 0.9.0 install on CentOS5.4 didn't seem to detect my hash tags. I'd post notices with plenty of hash tags, and none of them would be detected. My notice_tags table was empty.

I tracked the problem down to this code in Classes/Notice.php:

    /**
     * Extract #hashtags from this notice's content and save them to the database.
     */
    function saveTags()
    {
        /* extract all #hastags */
        $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/', strtolower($this->content), $match);
        if (!$count) {
            return true;
        }

The problem is that the UTF-8 / unicode regex escapes "\pL" and "\pN" are not matched, but no error is thrown. When building a simpler regex using these escapes, I got an error:

    PHP Warning: preg_match_all(): Compilation failed: support for \P, \p, and \X has not been compiled

I discovered this page (http://gaarai.com/2009/01/31/unicode-support-on-centos-52-with-php-and-pcre/) which details how to rebuild PCRE with unicode support, and after doing so, my hash tags are working perfectly.

At this point, it doesn't look like this will be fixed upstream until the next major (6) release of RHEL/CentOS (https://bugzilla.redhat.com/show_bug.cgi?id=457064)

Solution
----------

In the interim, should this code be augmented with a non-UTF-8 pattern match, so that at least standard ASCII hashtags will work?

    /* extract all #hastags */
    $count = preg_match_all('/(?:^|\s)#([\pL\pN_\-\.]{1,64})/', strtolower($this->content), $match);
    if (!$count) {
        $count_without_utf8 = preg_match_all('/(?:^|\s)#([a-z0-9_\-\.]{1,64})/', strtolower($this->content), $match);
        if (!$count_without_utf8) {
            return true;
        }
    }

Comments? :)

D
_______________________________________________
StatusNet-dev mailing list
[email protected]
http://lists.status.net/mailman/listinfo/statusnet-dev
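
A possible variation on the fallback proposed above: rather than running the ASCII pattern every time the Unicode one finds nothing, the code could probe PCRE for \p support and pick the pattern up front. This is only a rough sketch. The helper names pcre_has_unicode_properties() and extract_hashtags() are made up for illustration and are not part of StatusNet, and it assumes a failed pattern compilation makes preg_match() return false, as the warning quoted in the thread suggests.

```php
<?php
// Hypothetical helper (not in StatusNet): reports whether this PCRE build
// can compile Unicode property escapes such as \pL and \pN.
function pcre_has_unicode_properties()
{
    // "@" hides the "support for \P, \p, and \X has not been compiled"
    // warning; preg_match() returns false when the pattern fails to compile.
    return @preg_match('/\pL/', 'a') === 1;
}

// Hypothetical helper: returns the hashtags found in $content, using the
// ASCII-only pattern when Unicode properties are unavailable.
function extract_hashtags($content)
{
    $pattern = pcre_has_unicode_properties()
        ? '/(?:^|\s)#([\pL\pN_\-\.]{1,64})/'
        : '/(?:^|\s)#([a-z0-9_\-\.]{1,64})/';

    $count = preg_match_all($pattern, strtolower($content), $match);
    if (!$count) {
        // Covers both "no matches" (0) and "regex error" (false).
        return array();
    }
    return $match[1];
}
```

With this approach a notice that simply has no hashtags is scanned once rather than twice, and non-ASCII tags still work on systems where PCRE was built with Unicode support.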
