Re: RUTA: Copy features into new annotation

Erik Fäßler Wed, 13 Jan 2021 03:04:47 -0800

> 
> :-)
> 
> I was looking at the Person definition there, but didn't find matching
> features.


Oh, sorry, I did not express myself clearly enough: in my real use case I 
don’t have Person annotations but Organism annotations, which are derived from 
ConceptMentions. And ConceptMentions have the resourceEntryList feature.
I apologize for the confusion. For the sake of simplicity I made up the 
Person example in my initial e-mail, and now it bit me in the a** ;-)
> 
> 
> In general, I find it better to create additional annotations for
> complex structures instead of merging the information into an existing
> annotation, simply for maintainability reasons. It's easier to
> inspect unintended behavior several months later that way ...

Great, I am with you here, feels like I did it the recommended way.
> 
> 
>> 
>> So actually, there is one step missing now: I need to replace merged 
>> Organism entries with the covering OrganismEnumeration (Person and 
>> PersonEnumeration in my example).
> 
> 
> I am not sure what the input/output behavior should be. Don't you have
> two separate annotations, and isn't the enum the merge of their semantics?

You’re right. And I think I will leave it this way. I’m overcomplicating 
things.
> 
> 
> Labels and inlined rules are the two best language features I added in
> Ruta, really useful. Let me know if you want to learn more about them
> and if there is information missing in the documentation.
> 

No, it’s all great. It’s just not that trivial and, honestly, while I had a 
look at the base syntax, I got quite far by cherry-picking what I needed from 
the documentation. I did not study the syntax in great detail because I could 
always make it work by just trying. That’s my bad. But this time I didn’t know 
where to start, so I asked. And you helped me a lot, thank you so much.
RUTA is a great tool. My only trouble is the regular exceptions in the 
Eclipse Workbench, but I got used to them, and I have probably combined 
mismatched versions of RUTA and Eclipse or something.

Thank you!

Erik

> 
> 
> Best,
> 
> 
> Peter
> 
> 
> 
>> 
>> construction so this enumeration-annotation-merging might actually be easy 
>> and I just don’t see it.
>> 
>> Thank you so much!
>> 
>> Erik
>> 
>>> On 10. Jan 2021, at 16:21, Peter Klügl <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> On 07.01.2021 at 14:55, Erik Fäßler wrote:
>>>> Hi Peter and thank you once again for your excellent support of your 
>>>> excellent RUTA software!
>>> 
>>> You are welcome :-)
>>> 
>>> 
>>>> Your second example was very much what I needed. Thank you so far!
>>>> I have one last bump in the road:
>>>> 
>>>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>>>> uima.cas.String. So, one Person annotation might have multiple IDs per the 
>>>> type system.
>>>> The ID type has a feature “entryId”.
>>>> In my particular case I actually have only one entry in the id array. 
>>>> Still, I need to access this entry somehow.
>>>> Is that at all possible in RUTA? I would need something like
>>>> 
>>>> 
>>>> // collect ids of all covered Persons using an extra list
>>>> STRINGLIST ids;
>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>   <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>>>> 
>>>> This does not seem to be covered by the FeatureExpression grammar in RUTA. 
>>>> Is there a workaround? Otherwise I will have to solve it some other way.
>>> 
>>> There are actually "indexed" expressions like Person.ids[0], but this is not
>>> yet an "official" and stable feature. However, I think it's not even
>>> necessary.
>>> 
>>> 
>>> Is your typesystem available somewhere? JCoRe?
>>> 
>>> Is this a solution for you?
>>> 
>>> 
>>> PACKAGE uima.ruta;
>>> 
>>> // mock types
>>> DECLARE CC, EnumCC;
>>> DECLARE Person (FSArray ids);
>>> DECLARE PersonId (String personId);
>>> DECLARE PersonEnumeration (StringArray personIds);
>>> 
>>> // mock annotations
>>> "Trump" -> Person;
>>> "Biden" -> Person;
>>> "and" -> CC;
>>> INT counter = 1;
>>> p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
>>> counter = counter +1, p.ids = pid};
>>> 
>>> (COMMA? @CC){-> EnumCC};
>>> 
>>> // identify enum span
>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>> 
>>> // collect ids of all covered Persons using an extra list
>>> STRINGLIST ids;
>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>    <-{p:Person{-> ADD(ids,p.ids.personId)};};
>>> 
>>> 
>>> Best,
>>> 
>>> 
>>> Peter
>>> 
>>> 
>>> 
>>>> Many thanks,
>>>> 
>>>> Erik
>>>> 
>>>>> On 7. Jan 2021, at 10:47, Peter Klügl <[email protected]> wrote:
>>>>> 
>>>>> Hi Erik,
>>>>> 
>>>>> 
>>>>> it depends on how you want to represent the information of the ids of
>>>>> the covered Person annotations. You somehow need to represent the values
>>>>> in the PersonEnumeration annotation. I assume that the ID feature of
>>>>> Person is uima.cas.String? PersonEnumeration could either use one String
>>>>> Feature, a StringArray feature or an FSArray feature (pointing to the
>>>>> Person annotations which provide the IDs).
>>>>> 
>>>>> Here are two examples:
>>>>> 
>>>>> 
>>>>> PACKAGE uima.ruta;
>>>>> 
>>>>> // mock types
>>>>> DECLARE CC, EnumCC;
>>>>> DECLARE Person (STRING id);
>>>>> DECLARE PersonEnumeration (FSArray persons);
>>>>> 
>>>>> // mock annotations
>>>>> "Trump" -> Person ("id" = "1");
>>>>> "Biden" -> Person ("id" = "2");
>>>>> "and" -> CC;
>>>>> 
>>>>> COMMA? @CC{-> EnumCC};
>>>>> 
>>>>> // identify enum span
>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>> 
>>>>> // collect all covered Persons
>>>>> pe:PersonEnumeration{-> pe.persons = Person};
>>>>> 
>>>>> ########################
>>>>> 
>>>>> ########################
>>>>> 
>>>>> PACKAGE uima.ruta;
>>>>> 
>>>>> // mock types
>>>>> DECLARE CC, EnumCC;
>>>>> DECLARE Person (STRING id);
>>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>> 
>>>>> // mock annotations
>>>>> "Trump" -> Person ("id" = "1");
>>>>> "Biden" -> Person ("id" = "2");
>>>>> "and" -> CC;
>>>>> 
>>>>> COMMA? @CC{-> EnumCC};
>>>>> 
>>>>> // identify enum span
>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>> 
>>>>> // collect ids of all covered Persons using an extra list
>>>>> STRINGLIST ids;
>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>   <-{p:Person{-> ADD(ids,p.id)};};
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> 
>>>>> 
>>>>> Peter
>>>>> 
>>>>> 
>>>>> On 06.01.2021 at 08:29, Erik Fäßler wrote:
>>>>>> Hello everyone (and a happy new year :-)),
>>>>>> 
>>>>>> I have been working on the following issue: Whenever there is a 
>>>>>> conjunction of two entities in the text (e.g. [...]Biden and Trump ran for 
>>>>>> president […]) I create a new annotation spanning both entities and the 
>>>>>> conjunction ([Biden and Trump]_coordination). I can do this fine.
>>>>>> However, my entities - Biden and Trump - also have an ID feature. The 
>>>>>> new annotation should receive both IDs from the Biden and Trump 
>>>>>> annotations. But I couldn’t manage to do this.
>>>>>> 
>>>>>> I have rules like this:
>>>>>> 
>>>>>> (Person (
>>>>>>  "," (Person)
>>>>>>   ","? PennBioIEPOSTag.value=="CC"
>>>>>> Person
>>>>>> ) {->MARK(PersonEnumeration)};
>>>>>> 
>>>>>> So an enumeration of Persons is covered by a new annotation of type 
>>>>>> “PersonEnumeration”. And now “PersonEnumeration” should receive all the 
>>>>>> ID features from the covered Person annotations. How can I do this?
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Erik
>>>>> -- 
>>>>> Dr. Peter Klügl
>>>>> Head of Text Mining/Machine Learning
>>>>> 
>>>>> Averbis GmbH
>>>>> Salzstr. 15
>>>>> 79098 Freiburg
>>>>> Germany
>>>>> 
>>>>> Fon: +49 761 708 394 0
>>>>> Fax: +49 761 708 394 10
>>>>> Email: [email protected]
>>>>> Web: https://averbis.com
>>>>> 
>>>>> Headquarters: Freiburg im Breisgau
>>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>> 
>> 
