Re: question on REGEXP in Ruta

Peter Klügl Mon, 08 Feb 2016 01:49:17 -0800

Hi,

capturing groups are not supported by the REGEXP condition since it is
essentially just a boolean function and cannot transfer its internal
information to an action which creates annotations. However, there are
many other ways to solve it.


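There may be a problem with your regexp: the alternation applies to the
whole expression, so ".*no|No (.*)" is read as either ".*no" or
"No (.*)". I changed it to ".*(?:no|No) (.*)" in the following.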
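
The difference between the two patterns can be checked outside Ruta. The
sketch below uses Python's re module purely as a stand-in (Ruta runs on
Java regexes, but alternation and non-capturing groups behave the same way
for these two patterns). The helper name group1 is made up for this
illustration, and it assumes that the whole line has to match.

```python
import re

ORIGINAL = r".*no|No (.*)"      # parsed as (.*no) | (No (.*))
FIXED = r".*(?:no|No) (.*)"     # alternation limited to no/No

def group1(pattern, line):
    """Return capturing group 1 if the whole line matches, else None."""
    m = re.fullmatch(pattern, line)
    return m.group(1) if m else None

lines = [
    "No other concurrent anticancer therapy",
    "There is no other concurrent therapy",
]
for line in lines:
    # Left: original pattern, right: fixed pattern.
    print(repr(group1(ORIGINAL, line)), "|", repr(group1(FIXED, line)))
```

With the original pattern, only a line that starts with "No " can fill
group 1; the fixed pattern also captures the text after a lowercase "no"
in the middle of a line.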

You can, for example, use the simple regexp rule and restrict its
matching context to each line:

... with a BLOCK:
BLOCK(eachLine) Line{}{
    ".*(?:no|No) (.*)" -> Rule1NoPattern, 1=Group1;
}

... with an inlined rule:
Line->{".*(?:no|No) (.*)" -> Rule1NoPattern, 1=Group1;};
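
What both variants do can be approximated in a few lines of Python: apply
the pattern to each line separately and record the offsets of the whole
match and of group 1. This is only a sketch of the idea, not of how Ruta
is implemented; the function name annotate_lines and the tuple layout are
assumptions made for the example.

```python
import re

PATTERN = re.compile(r".*(?:no|No) (.*)")

def annotate_lines(text):
    """Return (type, begin, end) tuples with offsets into text."""
    annotations = []
    offset = 0
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")  # match the line without its line break
        m = PATTERN.fullmatch(body)
        if m:
            # Whole match -> Rule1NoPattern, capturing group 1 -> Group1.
            annotations.append(("Rule1NoPattern", offset + m.start(0), offset + m.end(0)))
            annotations.append(("Group1", offset + m.start(1), offset + m.end(1)))
        offset += len(line)
    return annotations

text = "Not pregnant or nursing\nNo other concurrent anticancer therapy\n"
for name, begin, end in annotate_lines(text):
    print(name, repr(text[begin:end]))
```

Restricting the match to one line at a time is what keeps ".*" from
depending on where the line breaks fall in the surrounding document.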

Some additional comments:

You should mention the type Line in the EXEC action for reindexing, if
you want to use these annotations in the following rules:
Document{-> EXEC(PlainTextAnnotator, {Line})};
For your rules, it does not make a difference, but other conditions
like PARTOF will not work correctly without it.

From my experience, I'd recommend working directly with annotations
instead of regexes for detecting the target of a negation. Then you can
refactor the rules more easily, e.g., if you have a rule like
Line->{PrefixNegationInd #{-> Group1};}; you can replace the wildcard
with something better in the future, like ChunkNP. (I just wanted to
mention it. I know that your example was probably just an example to
describe the problem with Ruta.)
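
The annotation-based idea can be sketched in Python as well, reading the
wildcard as "everything after the indicator up to the end of the line".
This is a loose approximation for illustration only, not Ruta's matching
algorithm; the function name negation_targets and the use of plain
(begin, end) offset pairs are assumptions.

```python
def negation_targets(text, lines, indicators):
    """For each indicator span inside a line span, return the rest of that line.

    lines and indicators are lists of (begin, end) offsets into text.
    """
    targets = []
    for line_begin, line_end in lines:
        for ind_begin, ind_end in indicators:
            if line_begin <= ind_begin and ind_end <= line_end:
                begin = ind_end
                # Skip whitespace between the indicator and its target.
                while begin < line_end and text[begin].isspace():
                    begin += 1
                if begin < line_end:
                    targets.append((begin, line_end))
    return targets

text = "No other concurrent anticancer therapy"
lines = [(0, len(text))]
indicators = [(0, 2)]  # the span of "No", found by some earlier step
for begin, end in negation_targets(text, lines, indicators):
    print(repr(text[begin:end]))
```

Because the target is computed from spans rather than from a regex, the
"rest of the line" step could later be swapped for a more precise one
without touching how the indicators are found.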

Best,

Peter

On 08.02.2016 at 00:37, Bonnie MacKellar wrote:
> Hi,
>
> I am trying to write RUTA rules using regular expressions and capturing
> groups. I want the matches to be line by line. I can do this using the
> following script
>
> ENGINE utils.PlainTextAnnotator;
> TYPESYSTEM utils.PlainTextTypeSystem;
> Document{-> RETAINTYPE(BREAK)};
> Document{-> EXEC(PlainTextAnnotator)};
> DECLARE Rule1NoPattern, Group1, Group2;
> Line{REGEXP(".*no|No (.*)") -> Rule1NoPattern};
>
> Given this text
> Not pregnant or nursing
> Fertile patients must use effective contraception (hormonal contraception
> or intra-uterine device [IUD])
> No concurrent participation in another clinical trial that would preclude
> the interventions or outcome assessment of this clinical trial
> No other concurrent anticancer therapy
>
> it correctly matches the last two lines and annotates them with
> Rule1NoPattern.
> The problem is, I want to use the capturing group information as well. I
> can do this using the simple regular expression syntax
> ".*no|No (.*)\n|S" -> Rule1NoPattern, 1=Group1;
>
> if I just give it one line, say
> No other concurrent anticancer therapy
>
> it will correctly annotate the entire line with Rule1NoPattern, and "other
> concurrent anticancer therapy" will be annotated with Group1.
> Is there a way, using the first rule variant
> Line{REGEXP(".*no|No (.*)") -> Rule1NoPattern};
>
> to annotate the text in capturing group?
>
> I have tried all kinds of syntax, but none of it seems to be correct
>
> thanks,
> Bonnie MacKellar
>
