Re: RUTA: Copy features into new annotation

Peter Klügl Thu, 14 Jan 2021 00:46:03 -0800

Hi,

On 13.01.2021 at 12:04, Erik Fäßler wrote:
>> :-)
>>
>> I was looking at the Person definition there, but didn't find matching
>> features.
> Oh, sorry, I did not express myself clearly enough: In my real use case I 
> don’t have Person annotations but Organism annotations, which are derived from 
> ConceptMentions. And ConceptMentions have the resourceEntryList feature.
> I apologize for the confusion. For the sake of simplicity I made up the 
> Person example in my initial e-mail, and now it bit me in the a** ;-)



Ah no, all fine. When I prepared the first example rules, I wondered
about the range type of the id feature. I assumed you were using the
JCoRe type systems, as your question indicated a non-trivial real-world
use case. I had a quick look (1 min) to see whether I could identify the
range of the ids of Person annotations in these type systems, but
failed... so I simply used String as the range :-)



>>
>> In general, I find it better to create additional annotations for
>> complex structures instead of merging the information into an existing
>> annotation, simply for maintainability reasons. It's easier to
>> inspect unintended behavior several months later that way ...
> Great, I am with you here, feels like I did it the recommended way.
>>
>>> So actually, there is one step missing now: I need to replace merged 
>>> Organism entries with the covering OrganismEnumeration (Person and 
>>> PersonEnumeration in my example).
>>
>> I am not sure what the input/output behavior should be. Don't you have
>> two separate annotations, and isn't the enum the merge of the semantics?
> You’re right. And I think I will leave it this way. I’m making this too 
> complicated.
>>
>> Labels and inlined rules are the two best language features I added in
>> Ruta, really useful. Let me know if you want to learn more about them
>> and if there is information missing in the documentation.
>>
> No, it’s all great. It’s just not that trivial and, honestly, while I had a 
> look at the base syntax, I got quite far by cherry-picking what I needed from 
> the documentation. I did not study the syntax in great detail because I could 
> always make it work by just doing it. That’s my bad. But this time I didn’t 
> know where to start, so I asked. And you helped me a lot, thank you so much.
> RUTA is a great tool. I only have trouble with regular exceptions in the 
> Eclipse Workbench, but I got used to them, and I have probably combined the 
> wrong versions of RUTA and Eclipse or something.


There have been several reports of problems lately that were caused by
the different Java versions in use.



Best,


Peter



>
> Thank you!
>
> Erik
>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>>> construction so this enumeration-annotation-merging might actually be easy 
>>> and I just don’t see it.
>>>
>>> Thank you so much!
>>>
>>> Erik
>>>
>>>> On 10. Jan 2021, at 16:21, Peter Klügl <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> On 07.01.2021 at 14:55, Erik Fäßler wrote:
>>>>> Hi Peter and thank you once again for your excellent support of your 
>>>>> excellent RUTA software!
>>>> You are welcome :-)
>>>>
>>>>
>>>>> Your second example was very much what I needed. Thank you so far!
>>>>> I have one last bump in the road:
>>>>>
>>>>> My Person#id feature is an FSArray with ID annotations instead of a plain 
>>>>> uima.cas.String. So, one Person annotation might have multiple IDs per 
>>>>> the type system.
>>>>> The ID type has a feature “entryId”.
>>>>> In my particular case I actually have only one entry in the id array. 
>>>>> Still, I need to access this entry somehow.
>>>>> Is that at all possible in RUTA? I would need something like
>>>>>
>>>>>
>>>>> // collect ids of all covered Persons using an extra list
>>>>> STRINGLIST ids;
>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>   <-{p:Person{-> ADD(ids,p.id[0].entryId)};};
>>>>>
>>>>> This does not seem to be covered by the FeatureExpression grammar in 
>>>>> RUTA. Is there a workaround? Otherwise I will have to solve it some 
>>>>> other way.
>>>> There are actually "indexed" expressions like Person.ids[0], but this is
>>>> not yet an "official" and stable feature. However, I think it's not even
>>>> necessary.
>>>>
>>>>
>>>> Is your typesystem available somewhere? JCoRe?
>>>>
>>>> Is this a solution for you?
>>>>
>>>>
>>>> PACKAGE uima.ruta;
>>>>
>>>> // mock types
>>>> DECLARE CC, EnumCC;
>>>> DECLARE Person (FSArray ids);
>>>> DECLARE PersonId (String personId);
>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>
>>>> // mock annotations
>>>> "Trump" -> Person;
>>>> "Biden" -> Person;
>>>> "and" -> CC;
>>>> INT counter = 1;
>>>> p:Person{-> pid:CREATE(PersonId, "personId" = "id_" + (counter)),
>>>> counter = counter +1, p.ids = pid};
>>>>
>>>> (COMMA? @CC){-> EnumCC};
>>>>
>>>> // identify enum span
>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>
>>>> // collect ids of all covered Persons using an extra list
>>>> STRINGLIST ids;
>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>    <-{p:Person{-> ADD(ids,p.ids.personId)};};
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>>> Many thanks,
>>>>>
>>>>> Erik
>>>>>
>>>>>> On 7. Jan 2021, at 10:47, Peter Klügl <[email protected]> wrote:
>>>>>>
>>>>>> Hi Erik,
>>>>>>
>>>>>>
>>>>>> it depends on how you want to represent the information of the ids of
>>>>>> the covered Person annotations. You somehow need to represent the values
>>>>>> in the PersonEnumeration annotation. I assume that the ID feature of
>>>>>> Person is uima.cas.String? PersonEnumeration could either use one String
>>>>>> Feature, a StringArray feature or a FSArray feature (pointing to the
>>>>>> Person annotations which provide the IDs).
>>>>>>
>>>>>> Here are two examples:
>>>>>>
>>>>>>
>>>>>> PACKAGE uima.ruta;
>>>>>>
>>>>>> // mock types
>>>>>> DECLARE CC, EnumCC;
>>>>>> DECLARE Person (STRING id);
>>>>>> DECLARE PersonEnumeration (FSArray persons);
>>>>>>
>>>>>> // mock annotations
>>>>>> "Trump" -> Person ("id" = "1");
>>>>>> "Biden" -> Person ("id" = "2");
>>>>>> "and" -> CC;
>>>>>>
>>>>>> COMMA? @CC{-> EnumCC};
>>>>>>
>>>>>> // identify enum span
>>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>>>
>>>>>> // collect all covered Persons
>>>>>> pe:PersonEnumeration{-> pe.persons = Person};
>>>>>>
>>>>>> ########################
>>>>>>
>>>>>> ########################
>>>>>>
>>>>>> PACKAGE uima.ruta;
>>>>>>
>>>>>> // mock types
>>>>>> DECLARE CC, EnumCC;
>>>>>> DECLARE Person (STRING id);
>>>>>> DECLARE PersonEnumeration (StringArray personIds);
>>>>>>
>>>>>> // mock annotations
>>>>>> "Trump" -> Person ("id" = "1");
>>>>>> "Biden" -> Person ("id" = "2");
>>>>>> "and" -> CC;
>>>>>>
>>>>>> COMMA? @CC{-> EnumCC};
>>>>>>
>>>>>> // identify enum span
>>>>>> (Person (COMMA Person)* EnumCC Person){-> PersonEnumeration};
>>>>>>
>>>>>> // collect ids of all covered Persons using an extra list
>>>>>> STRINGLIST ids;
>>>>>> pe:PersonEnumeration{-> pe.personIds = ids}
>>>>>>   <-{p:Person{-> ADD(ids,p.id)};};
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> On 06.01.2021 at 08:29, Erik Fäßler wrote:
>>>>>>> Hello everyone (and a happy new year :-)),
>>>>>>>
>>>>>>> I have been working on the following issue: Whenever there is a 
>>>>>>> conjunction of two entities in the text (e.g. [...] Biden and Trump ran for 
>>>>>>> president [...]), I create a new annotation spanning both entities and the 
>>>>>>> conjunction ([Biden and Trump]_coordination). I can do this fine.
>>>>>>> However, my entities - Biden and Trump - also have the ID feature. The 
>>>>>>> new annotation should receive both IDs from the Biden and Trump 
>>>>>>> annotations. But I couldn’t manage to do this.
>>>>>>>
>>>>>>> I have rules like this:
>>>>>>>
>>>>>>> (Person (
>>>>>>>  "," (Person)
>>>>>>>   ","? PennBioIEPOSTag.value=="CC"
>>>>>>> Person
>>>>>>> ) {->MARK(PersonEnumeration)};
>>>>>>>
>>>>>>> So an enumeration of Persons is covered with a new annotation of type 
>>>>>>> “PersonEnumeration”. And now “PersonEnumeration” should receive all the 
>>>>>>> ID features from the covered Person annotations. How can I do this?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Erik
>>>>>> -- 
>>>>>> Dr. Peter Klügl
>>>>>> Head of Text Mining/Machine Learning
>>>>>>
>>>>>> Averbis GmbH
>>>>>> Salzstr. 15
>>>>>> 79098 Freiburg
>>>>>> Germany
>>>>>>
>>>>>> Fon: +49 761 708 394 0
>>>>>> Fax: +49 761 708 394 10
>>>>>> Email: [email protected]
>>>>>> Web: https://averbis.com
>>>>>>
>>>>>> Headquarters: Freiburg im Breisgau
>>>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>>>
>
