Hi Team,
I am trying to extract the text between two annotations that can be on
different lines and can have other annotations in between them.
I have tried several approaches, but none of them worked correctly.

Sample Input :

Seller Name  FirstAvenue Mortgage, Contact Name John
             TN 12230              Contact Title Supervisor


Code :

TYPESYSTEM utils.PlainTextTypeSystem;
ENGINE utils.PlainTextAnnotator;

DECLARE Keyword (STRING label);
DECLARE Entry(Keyword keyword);

DECLARE Keyword SellerNameKeyword, SellerNameContextBlocker, ContactNameKeyword;

EXEC(PlainTextAnnotator, {Line,Paragraph});

ADDRETAINTYPE(WS);
Line{->TRIM(WS)};
Paragraph{->TRIM(WS)};
REMOVERETAINTYPE(WS);

"Seller Name" -> SellerNameKeyword ( "label" = "Seller Name");
"Contact Title" -> SellerNameContextBlocker("label" = "Seller Name Context Blocker");
"Contact Name" -> ContactNameKeyword("label"= "Contact Name");

DECLARE Entity (STRING label, STRING value);
DECLARE Entity ContactName, SellerName;

BLOCK(line1) Line{CONTAINS(ContactNameKeyword)} {
    ContactNameKeyword c:#{-PARTOF(ContactName)
        -> CREATE(ContactName, "label" = "Contact Name", "value" = c.ct)};
}

SellerNameKeyword
    c:#{-PARTOF(ContactNameKeyword), -PARTOF(SellerNameContextBlocker), -PARTOF(ContactName)
        -> CREATE(SellerName, "label" = "Seller Name", "value" = c.ct)}
    SellerNameContextBlocker;


Output : FirstAvenue Mortgage, Contact Name John TN 12230

Expected Output : FirstAvenue Mortgage, TN 12230
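
To make the expected result concrete, here is a minimal Python sketch (plain Python, not Ruta, and not a proposed solution) of the extraction I am after: take the text between "Seller Name" and "Contact Title", drop the "Contact Name" keyword together with its value, and join what remains. The function name and the regular expression are illustrative assumptions; treating the contact name's value as the rest of its line mirrors the BLOCK rule above.

```python
import re

# Sample input from this post, with the original line break preserved.
SAMPLE = (
    "Seller Name  FirstAvenue Mortgage, Contact Name John\n"
    "             TN 12230              Contact Title Supervisor\n"
)


def seller_name_value(text):
    """Return the text between "Seller Name" and "Contact Title",
    with any "Contact Name" entry removed, or None if a keyword is missing."""
    start = text.find("Seller Name")
    if start == -1:
        return None
    start += len("Seller Name")

    end = text.find("Contact Title", start)
    if end == -1:
        return None

    span = text[start:end]
    # Assumption: the contact name's value runs to the end of its line.
    span = re.sub(r"Contact Name[^\n]*", "", span)
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(span.split())


print(seller_name_value(SAMPLE))  # prints: FirstAvenue Mortgage, TN 12230
```

The Ruta rules should ideally produce a SellerName annotation whose "value" feature equals this string.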

Why does -PARTOF not work with #? I also tried ANY+, but that did not work either.
I think ANY matches each possible token, while # matches everything. Could you
explain how the usage of # and ANY differs?
I have already posted this question on Stack Overflow:
https://stackoverflow.com/questions/58830986/get-text-between-two-annotated-tags-in-ruta

Please suggest how this could be achieved, or a better approach.

Thanks and Regards,
Shashank
