*I added debug output to the source like this:*
  if (exists $rule->{text}) {
    next unless $info->{anchor_text};
    my($op,$patt,$neg) = @{$rule->{text}};
    my $match;
    for my $text (@{ $info->{anchor_text} }) {
      if ( ($op eq '=~' && $text =~ $patt) || ($op eq '!~' && $text !~ $patt) ) {
        dbg("uri: Match found: text:%s matches the pattern:%s with operator:%s", $text, $patt, $op);
        $match = $text;
        last;
      } else {
        dbg("uri: Not match: text:%s not matches the pattern:%s with operator:%s", $text, $patt, $op);
      }
    }
    if ( $neg ) {
      next if defined $match;
      dbg("uri: text negative matched: %s /%s/", $op,$patt);
    } else {
      next unless defined $match;
      dbg("uri: text matched: '%s' %s /%s/", $match,$op,$patt);
    }
  }
*and got this debug output:*
dbg: uri: Not match:
text:\x{E0}\x{B8}\x{95}\x{E0}\x{B9}\x{88}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{AD}\x{E0}\x{B8}\x{B2}\x{E0}\x{B8}\x{A2}\x{E0}\x{B8}\x{B8}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B1}\x{E0}\x{B8}\x{99}\x{E0}\x{B8}\x{97}\x{E0}\x{B8}\x{B5}
not matches the
pattern:(?^aa:\\\\x\\{E0\\}\\\\x\\{B8\\}\\\\x\\{97\\}\\\\x\\{E0\\}\\\\x\\{B8\\}\\\\x\\{B1\\}\\\\x\\{E0\\}\\\\x\\{B8\\}\\\\x\\{99\\}\\\\x\\{E0\\}\\\\x\\{B8\\}\\\\x\\{97\\}\\\\x\\{E0\\}\\\\x\\{B8\\}\\\\x\\{B5\\})
with operator:=~
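A minimal Perl sketch of what the log above appears to show. It assumes, without having checked the plugin internals, that the anchor text reaches the rule as raw UTF-8 octets and that the `\x{E0}\x{B8}...` in the log is only how dbg prints those bytes. The sample string and the variable names (`$octets`, `$printed`, `$literal`, `$bytes`) are illustrative, not taken from the plugin. It compares the pattern as compiled in the log (literal backslashes and braces) with one written using single-backslash `\xHH` escapes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# ASSUMPTION: the anchor text arrives as raw UTF-8 octets.
# Decoded Thai text from the log (U+0E15 ... U+0E35):
my $chars = "\x{0E15}\x{0E48}\x{0E2D}\x{0E2D}\x{0E32}\x{0E22}"
          . "\x{0E38}\x{0E17}\x{0E31}\x{0E19}\x{0E17}\x{0E35}";
my $octets = encode('UTF-8', $chars);    # bytes E0 B8 95 E0 B9 88 ...

# How the debug log renders those octets: a literal "\x{HH}" per byte.
my $printed = join '', map { sprintf '\\x{%02X}', ord($_) } split //, $octets;

# Pattern as it appears compiled in the log: "\\" is a literal backslash
# and the braces are literal, so it matches the *printed* form only.
my $literal = qr/\\x\{E0\}\\x\{B8\}\\x\{97\} \\x\{E0\}\\x\{B8\}\\x\{B1\}
                 \\x\{E0\}\\x\{B8\}\\x\{99\} \\x\{E0\}\\x\{B8\}\\x\{97\}
                 \\x\{E0\}\\x\{B8\}\\x\{B5\}/x;

# Single-backslash escapes: each \xHH matches one octet.
my $bytes = qr/\xE0\xB8\x97 \xE0\xB8\xB1 \xE0\xB8\x99
               \xE0\xB8\x97 \xE0\xB8\xB5/x;

printf "literal vs octets:  %s\n", ($octets  =~ $literal) ? 'match' : 'no match';
printf "literal vs printed: %s\n", ($printed =~ $literal) ? 'match' : 'no match';
printf "bytes   vs octets:  %s\n", ($octets  =~ $bytes)   ? 'match' : 'no match';
```

Run as-is, the three lines print `no match`, `match` and `match`. If the octet assumption holds for the plugin, the double backslashes in the rule, rather than the plugin's matching loop, would explain the miss; this sketch has not been checked against SpamAssassin itself.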
On Sun, Feb 2, 2025 at 1:57 PM John Hardin <[email protected]> wrote:
> On Sun, 2 Feb 2025, Jimmy wrote:
>
> > Hello,
> >
> > I am experiencing difficulties creating a rule to match UTF-8 anchor text
> > using the plugin, and I suspect there might be a bug related to UTF-8
> > matching.
> >
> > For example, I attempted to use the following rule:
> >
> > uri_detail UNICODE_LINK_TEXT text =~ /\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B1}\\x{E0}\\x{B8}\\x{99}\\x{E0}\\x{B8}\\x{97}\\x{E0}\\x{B8}\\x{B5}/
>
> ...do you also need to escape the curlies?
>
> /\\x\{E0\}\\x\{B8\} etc...
>
>
> --
> John Hardin KA7OHZ http://www.impsec.org/~jhardin/