regex help - only one value returned

2020-12-02 Thread Gary Stainburn
I have an array of regular expressions that I apply to text returned from tesseract. Each match that I get is then stored for future processing. However, I'm struggling with one regex. The problem is that: 1) with brackets round the titles it returns two matches. 2) without brackets, it retu

Re: regex help - only one value returned

2020-12-02 Thread Vlado Keselj
Well, it seems that the first one is what you want; you just need to use $1 and ignore $2. You do need parentheses in '(mr|mrs|miss|dr|prof|sir)', but if you do not want the title captured in $2, you can use a non-capturing group: '(?:mr|mrs|miss|dr|prof|sir)'. For example: print "match3='$1' '$2'\n" if (
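A minimal sketch of the difference between the capturing and non-capturing forms, for anyone reading the archive. The sample text in $T and the sub names with_capture and without_capture are made up for illustration; the real tesseract output is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample text; the real tesseract output is not shown in the thread.
my $T = "Invoice 1234\nMr John Smith\n12 High Street\n";

# Inner capturing group: returns ($1, $2) = (whole line, title alone).
sub with_capture {
    my ($text) = @_;
    return $text =~ /^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi ? ($1, $2) : ();
}

# Inner non-capturing group (?:...): only $1 is set, so only $1 is returned.
sub without_capture {
    my ($text) = @_;
    return $text =~ /^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi ? ($1) : ();
}

my @two = with_capture($T);
my @one = without_capture($T);

print "with capture:    '$two[0]' '$two[1]'\n";   # 'Mr John Smith' 'Mr'
print "without capture: '$one[0]'\n";             # 'Mr John Smith'
```

Either form gives the full line in $1; the (?:...) version simply avoids producing a second capture that has to be ignored.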

Re: regex help - only one value returned

2020-12-02 Thread Gary Stainburn
On 02/12/2020 13:56, Vlado Keselj wrote: Well, it seems that the first one is what you want, but you just need to use $1 and ignore $2. You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not want for them to be captured in $2, you can use: '(?:mr|mrs|miss|dr|prof|sir)'. For ex

Re: regex help - only one value returned

2020-12-02 Thread Jim Gibson
In your original example: print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi); print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi); the interior parentheses in example one terminate the alternation, so the last alternative is 'sir'. In example two
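The effect of the interior parentheses on the alternation can be checked with a short script. This is only a sketch: the input lines and the sub names match_ungrouped and match_grouped are hypothetical, not taken from the thread. It relies on the standard rule that '|' has the lowest precedence in a regex, so without inner grouping the alternation covers everything up to the enclosing parenthesis.

```perl
use strict;
use warnings;

# Pattern from example two: with no inner parentheses the alternation spans
# the whole group, so its branches are 'mr', 'mrs', 'miss', 'dr', 'prof'
# and 'sir .{5,}?'.
sub match_ungrouped {
    my ($text) = @_;
    return $text =~ /^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi ? $1 : undef;
}

# Grouping the titles (here non-capturing) limits the alternation to the
# titles alone, so ' .{5,}?' applies after every title.
sub match_grouped {
    my ($text) = @_;
    return $text =~ /^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi ? $1 : undef;
}

# Hypothetical input lines, not taken from the thread.
for my $line ("Mr John Smith\n", "Sir Alan Brown\n") {
    my $u = match_ungrouped($line) // 'no match';
    my $g = match_grouped($line)   // 'no match';
    print "ungrouped: $u | grouped: $g\n";
}
# ungrouped: no match | grouped: Mr John Smith
# ungrouped: Sir Alan Brown | grouped: Sir Alan Brown
```

With these inputs the ungrouped pattern only matches the line starting with "Sir", because 'sir' is the only branch followed by ' .{5,}?'; the other branches must be followed directly by the newline.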