Re: RUTA: Copy features into new annotation

Erik Fäßler Sun, 10 Jan 2021 23:14:00 -0800

Hello Peter,

thank you again for putting so much thought into it.
I am a bit embarrassed to say that I already had the solution in my script when 
I just opened Eclipse again. I think I just didn’t really try it because I 
didn’t expect it to work.
This works now, thank you!


To make my case easier to understand, here are some details:
My type system is indeed the JCoRe TS.
And I am not working with Person annotations but with Organism mentions, but I 
wanted to keep things simple. Organism mentions are extended from 
ConceptMentions:
https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-semantics-mention-types.xml#L125

Those have the “resourceEntryList” feature which is an FSArray of ResourceEntry 
instances:
https://github.com/JULIELab/jcore-base/blob/90b4e56d80d774add1997db29f4eb27d6b394309/jcore-types/src/main/resources/de/julielab/jcore/types/jcore-basic-types.xml#L44

The ResourceEntry, finally, has a feature named “entryId”.

The entryIds are set in a separate annotator (the JCoRe Linnaeus annotator). My 
goal is to connect multiple mentions of Organisms (“mouse and human”) into a 
single expression for a downstream annotator that checks the Organism 
mentions directly in front of gene mentions. In the example “mouse and 
human”, however, it would always detect “human” but disregard “mouse”. So I 
thought I would create new annotations to “merge” the originals.
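
For illustration, the collection rule from the quoted example could be transferred to this type layout roughly as follows. This is only a sketch: the types are mocked with DECLARE instead of being imported from the JCoRe type system, the mock annotations stand in for the output of the real annotator, and OrganismEnumeration with its entryIds feature is a hypothetical name that does not exist in JCoRe.

```ruta
PACKAGE uima.ruta;

// Mock types standing in for the JCoRe types; a real script would import
// them from the type system instead of declaring them here.
DECLARE CC, EnumCC;
DECLARE ResourceEntry (STRING entryId);
DECLARE Organism (FSArray resourceEntryList);
// Hypothetical type and feature, not part of the JCoRe type system.
DECLARE OrganismEnumeration (StringArray entryIds);

// Mock annotations; each Organism gets one ResourceEntry with a made-up id.
"mouse" -> Organism;
"human" -> Organism;
"and" -> CC;
INT counter = 1;
o:Organism{-> re:CREATE(ResourceEntry, "entryId" = "id_" + (counter)),
    counter = counter + 1, o.resourceEntryList = re};

(COMMA? @CC){-> EnumCC};

// Span the whole enumeration, e.g. "mouse and human".
(Organism (COMMA Organism)* EnumCC Organism){-> OrganismEnumeration};

// Collect the entryIds of all covered Organism mentions using an extra list.
STRINGLIST ids;
oe:OrganismEnumeration{-> oe.entryIds = ids}
    <-{o:Organism{-> ADD(ids, o.resourceEntryList.entryId)};};
```

The last rule has the same shape as the p.ids.personId rule from the quoted example, with only the type and feature names exchanged.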

Is this how you would do it? Alternatively, I could also have merged the two 
existing Organism annotations. I would even prefer that. But I would not know 
how to organize this so that, in the end, instead of two single Organism 
annotations with two resourceEntries there would be only one Organism 
annotation with both resourceEntries.

So actually, there is one step missing now: I need to replace merged Organism 
entries with the covering OrganismEnumeration (Person and PersonEnumeration in 
my example).
Is there a better way to do this in RUTA? I have to admit that I have not yet 
fully grasped the syntax. I would not have been able to come up with the 
construction

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids,p.ids.personId)};};

myself, so this enumeration-annotation merging might actually be easy and 
I just don’t see it.
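
The missing replacement step might be sketched as follows, reusing the mock Person and PersonEnumeration types from the examples quoted below. It assumes that the covered Person annotations may be dropped once their IDs have been copied, and that the downstream annotator can be pointed at PersonEnumeration instead of Person; neither assumption is confirmed in this thread. The block name RemoveCoveredPersons is made up for the example.

```ruta
PACKAGE uima.ruta;

// mock types, as in the quoted example
DECLARE CC, EnumCC;
DECLARE Person (STRING id);
DECLARE PersonEnumeration (StringArray personIds);

// mock annotations
"Trump" -> Person ("id" = "1");
"Biden" -> Person ("id" = "2");
"and" -> CC;
(COMMA? @CC){-> EnumCC};

// identify enum span
(Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};

// collect ids of all covered Persons using an extra list
STRINGLIST ids;
pe:PersonEnumeration{-> pe.personIds = ids}
    <-{p:Person{-> ADD(ids, p.id)};};

// Replacement step: within each PersonEnumeration, remove the covered
// Person annotations so that only the enumeration remains.
BLOCK(RemoveCoveredPersons) PersonEnumeration{} {
    Person{-> UNMARK(Person)};
}
```

The BLOCK restricts the inner rule to the span of each PersonEnumeration, so Person annotations outside of an enumeration are left untouched.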

Thank you so much!

Erik

> On 10. Jan 2021, at 16:21, Peter Klügl <[email protected]> wrote:
> 
> Hi,
> 
> 
> Am 07.01.2021 um 14:55 schrieb Erik Fäßler:
>> Hi Peter and thank you once again for your excellent support of your 
>> excellent RUTA software!
> 
> 
> You are welcome :-)
> 
> 
>> 
>> Your second example was very much what I needed. Thank you so far!
>> I have one last bump in the road:
>> 
>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>> uima.cas.String. So, one Person annotation might have multiple IDs per the 
>> type system.
>> The ID type has a feature “entryId”.
>> In my particular case I actually have only one entry in the id array. Still, 
>> I need to access this entry somehow.
>> Is that at all possible in RUTA? I would need something like
>> 
>> 
>> // collect ids of all covered Persons using an extra list
>> STRINGLIST ids;
>> pe:PersonEnumeration{-> pe.personIds = ids}
>>    <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>> 
>> This does not seem to be covered by the FeatureExpression grammar in RUTA. 
>> Is there a work around? Otherwise I will have to solve it some other way.
> 
> 
> There are actually "indexed" expressions like Person.ids[0], but it's not
> yet an "official" and stable feature. However, I think it's not even
> necessary.
> 
> 
> Is your typesystem available somewhere? JCoRe?
> 
> Is this a solution for you?
> 
> 
> PACKAGE uima.ruta;
> 
> // mock types
> DECLARE CC, EnumCC;
> DECLARE Person (FSArray ids);
> DECLARE PersonId (String personId);
> DECLARE PersonEnumeration (StringArray personIds);
> 
> // mock annotations
> "Trump" -> Person;
> "Biden" -> Person;
> "and" -> CC;
> INT counter = 1;
> p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
> counter = counter +1, p.ids = pid};
> 
> (COMMA? @CC){-> EnumCC};
> 
> // identify enum span
> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
> 
> // collect ids of all covered Persons using an extra list
> STRINGLIST ids;
> pe:PersonEnumeration{-> pe.personIds = ids}
>     <-{p:Person{-> ADD(ids,p.ids.personId)};};
> 
> 
> Best,
> 
> 
> Peter
> 
> 
> 
>> 
>> Many thanks,
>> 
>> Erik
>> 
>>> On 7. Jan 2021, at 10:47, Peter Klügl <[email protected]> wrote:
>>> 
>>> Hi Erik,
>>> 
>>> 
>>> it depends on how you want to represent the information of the ids of
>>> the covered Person annotations. You somehow need to represent the values
>>> in the PersonEnumeration annotation. I assume that the ID feature of
>>> Person is uima.cas.String? PersonEnumeration could either use one String
>>> Feature, a StringArray feature or a FSArray feature (pointing to the
>>> Person annotations which provide the IDs).
>>> 
>>> Here are two examples:
>>> 
>>> 
>>> PACKAGE uima.ruta;
>>> 
>>> // mock types
>>> DECLARE CC, EnumCC;
>>> DECLARE Person (STRING id);
>>> DECLARE PersonEnumeration (FSArray persons);
>>> 
>>> // mock annotations
>>> "Trump" -> Person ("id" = "1");
>>> "Biden" -> Person ("id" = "2");
>>> "and" -> CC;
>>> 
>>> COMMA? @CC{-> EnumCC};
>>> 
>>> // identify enum span
>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>> 
>>> // collect all covered Persons
>>> pe:PersonEnumeration{-> pe.persons = Person};
>>> 
>>> ########################
>>> 
>>> ########################
>>> 
>>> PACKAGE uima.ruta;
>>> 
>>> // mock types
>>> DECLARE CC, EnumCC;
>>> DECLARE Person (STRING id);
>>> DECLARE PersonEnumeration (StringArray personIds);
>>> 
>>> // mock annotations
>>> "Trump" -> Person ("id" = "1");
>>> "Biden" -> Person ("id" = "2");
>>> "and" -> CC;
>>> 
>>> COMMA? @CC{-> EnumCC};
>>> 
>>> // identify enum span
>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>> 
>>> // collect ids of all covered Persons using an extra list
>>> STRINGLIST ids;
>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>    <-{p:Person{-> ADD(ids,p.id)};};
>>> 
>>> 
>>> 
>>> 
>>> Best,
>>> 
>>> 
>>> Peter
>>> 
>>> 
>>> Am 06.01.2021 um 08:29 schrieb Erik Fäßler:
>>>> Hello everyone (and a happy new year :-)),
>>>> 
>>>> I have been working on the following issue: Whenever there is a conjunction 
>>>> of two entities in the text (e.g. [...] Biden and Trump ran for president [...]), 
>>>> I create a new annotation spanning both entities and the conjunction 
>>>> ([Biden and Trump]_coordination). I can do this fine.
>>>> However, my entities - Biden and Trump - also have the ID feature. The new 
>>>> annotation should receive both IDs from the Biden and Trump annotations. 
>>>> But I couldn’t manage to do this.
>>>> 
>>>> I have rules like this:
>>>> 
>>>> (Person
>>>>   ("," Person)*
>>>>   ","? PennBioIEPOSTag.value=="CC"
>>>>   Person
>>>> ) {->MARK(PersonEnumeration)};
>>>> 
>>>> So an enumeration of Persons is covered with a new annotation of type 
>>>> “PersonEnumeration”. And now “PersonEnumeration” should receive all the ID 
>>>> features from the covered Person annotations. How can I do this?
>>>> 
>>>> Best,
>>>> 
>>>> Erik
>>> -- 
>>> Dr. Peter Klügl
>>> Head of Text Mining/Machine Learning
>>> 
>>> Averbis GmbH
>>> Salzstr. 15
>>> 79098 Freiburg
>>> Germany
>>> 
>>> Fon: +49 761 708 394 0
>>> Fax: +49 761 708 394 10
>>> Email: [email protected]
>>> Web: https://averbis.com
>>> 
>>> Headquarters: Freiburg im Breisgau
>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>> 
>> 
