Re: Question about covering annotations in Ruta match semantics

Peter Klügl Tue, 22 Oct 2019 00:11:41 -0700

Hi,

On 21.10.2019 at 21:46, Mario Juric wrote:
> Thanks Peter,
>
> No problem with the delay. I was on vacation myself, and sometimes it is just 
> necessary to pull the plug :)
>
> I am just happy that you take the time to answer my questions, and I think 
> your answers help make sense of this. I now have some ideas that I can 
> experiment with to see what works, but it’s possible to use RutaBasic when 
> optional spaces are included in the rules, although it gets more awkward. I 
> would still prefer to avoid this, and having a type-based rule-logic feature 
> would make sense in our case. Shall I create a feature request for this?



Yes, please create a ticket. Even specifying what should be done helps,
especially including more use cases than my own...


Best,


Peter


>
> I wouldn’t expect you to do this any time soon, but let me know if there is 
> something I could help out with when the time comes.
>
> Cheers,
> Mario
>
>> On 18 Oct 2019, at 10:10 , Peter Klügl <[email protected]> wrote:
>>
>> Hi,
>>
>>
>> sorry for the delayed reply.
>>
>>
>> comments below...
>>
>>
>> On 09.10.2019 at 22:19, Mario Juric wrote:
>>> Hi Peter,
>>>
>>> Thanks a lot for the answer.
>>>
>>> I am still trying to wrap my head around this, and I understand the issues 
>>> at play when dealing with a generic rule engine, since I am looking at an 
>>> isolated case only. I was just thinking that in my particular case the 
>>> covering annotation starts before the matching ‘Dog Cat’, so why would its 
>>> ending right before Cat prevent the rule from firing? It doesn’t follow 
>>> Dog, and a rule like “Dog Covering {->MARK(CHASE)}” therefore wouldn’t 
>>> match either, but I understand now that the presence of anything else in 
>>> the area between the two rule elements is enough for the 
>>> match to fail. However, as you describe, the presence of SPACE annotations 
>>> and a rule like Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching 
>>> despite the presence of the covering annotation.
>>
>> The main thing here is probably the requirement that the logic for
>> applying the visibility concept should always be symmetric, meaning it
>> should be the same regardless if the rule matches from left to right or
>> from right to left (or inside out).
>>
>> In your example, the rule matches from left to right (I assume), so the
>> behavior that the last space is not skipped is not intuitive at all.
>> However, if the rule matched from right to left for some reason,
>> e.g., because of dynamic anchoring or a manual anchor, then the
>> inference would detect a starting Covering annotation as the next
>> possible position, which is not invisible (since there is nothing at all
>> invisible). So there would actually be something that could be matched,
>> but it is not the correct type (Dog).
>>
>> I do not know if this explanation makes sense... it's easier with a
>> whiteboard ;-)
>>
>>
>>
>>> Have you ever described the implementation of the matching in some paper or 
>>> similar? I would be interested to have a look at it, but maybe it’s better 
>>> just to have a go at the code? I would certainly prefer reading a high 
>>> level abstract specification first though :)
>>
>> The last paper is the NLE journal article, which contains some high
>> level description of the algorithm. However, this is some really
>> specific functionality for a specific scenario. So, if I write a new
>> paper, it will most likely not cover this.
>>
>>
>>> Generally I cannot just trim the annotations in the real application, since 
>>> some of these whitespaces are included in the marking for various reasons. 
>>> I therefore played around with type filtering, since I was hoping that the 
>>> type filter would allow me to match the rules while ignoring any presence 
>>> of filtered types. I was again surprised to find out that filtering the 
>>> Covering type while retaining Cat and Dog would in this case just prevent 
>>> anything from being matched, because it seems to make all those text parts 
>>> invisible where the filtered types appear, no matter if they cover any 
>>> retained annotation types. So this didn’t seem to solve my problem either, 
>>> although I could of course try to mark those areas I otherwise would 
>>> consider trimming and include those in the rules like a space or filter on 
>>> them, which I guess is what you suggested. It suddenly just becomes 
>>> somewhat awkward though, and it may just be more clear to use RutaBasic 
>>> with the rules instead.
>>
>> Yes, the visibility concept in Ruta is not type-based but type
>> coverage-based (and I think that's really cool)
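The difference between type-based and coverage-based visibility can be illustrated with a small Python toy model. This is only a sketch, not Ruta's implementation: the function names are made up for illustration, and the overlap rule is an assumption that mirrors the observation above that filtering Covering hides every text part the filtered annotation covers.

```python
# Annotations from the example in this thread: (type, begin, end), end exclusive.
ANNS = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]

def overlaps(a, b):
    """True if the two half-open spans share at least one text position."""
    return a[1] < b[2] and b[1] < a[2]

def visible_coverage_based(anns, filtered):
    """Assumption: anything overlapping an annotation of a filtered type is hidden."""
    hidden = [a for a in anns if a[0] in filtered]
    return [a for a in anns if not any(overlaps(a, h) for h in hidden)]

def visible_type_based(anns, filtered):
    """Hypothetical alternative: only annotations of the filtered type itself are hidden."""
    return [a for a in anns if a[0] not in filtered]

coverage = visible_coverage_based(ANNS, {"Covering"})
type_based = visible_type_based(ANNS, {"Covering"})
print(coverage)
print(type_based)
```

In this model the coverage-based variant leaves only the last Cat visible, so neither rule has anything to match, while the type-based variant keeps Cat, Dog and Cat available.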
>>
>> It is possible to extend the functionality to additionally support
>> type-based logic, but I do not know when this would be ready.
>>
>> I would not recommend using RutaBasic in the rules (I actually do not
>> know right now whether it would work), but if you do, then you should
>> probably deactivate the "empty is invisible" option.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>> Cheers,
>>> Mario
>>>
>>>> On 9 Oct 2019, at 09:35 , Peter Klügl <[email protected]> wrote:
>>>>
>>>> Hi Mario,
>>>>
>>>>
>>>> I need to take a closer look as this is not the usual scenario :-)
>>>>
>>>>
>>>> However, without testing, I would assume that the second rule does not
>>>> match because the space between dog and cat is not "empty".
>>>>
>>>>
>>>> Normally, you have a complete partitioning provided by the seeding which
>>>> causes the RutaBasic annotations. If there are only a few annotations,
>>>> then there needs to be a decision if a text position is visible or not
>>>> (as you have no SPACE, BREAK and MARKUP annotation). You would expect
>>>> that the space between the annotations is ignored, but there is actually
>>>> no reason why Ruta should do that, as there is no information at all
>>>> that it should be ignored (... generic system, you might want to write
>>>> rules for whitespaces...). In order to avoid this problem in such
>>>> situations there is the option to define empty RutaBasics as invisible.
>>>> Those are text positions where no annotation begins or ends (and that are
>>>> not covered by annotations), AFAIR, and where sequential matching could not
>>>> match at all anyway. Thus, the first space is ignored, but not the second,
>>>> because the Covering annotation ends there.
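The "empty is invisible" reasoning can be replayed with a small Python toy model. It is a sketch under stated assumptions, not Ruta's actual matcher: the text is split at every annotation boundary (a stand-in for RutaBasic), and a segment is treated as empty, and therefore skippable, only if no annotation begins at its start or ends at its end. The names `Ann`, `basics`, `is_empty` and `matches` are hypothetical.

```python
from typing import NamedTuple

class Ann(NamedTuple):
    kind: str
    begin: int
    end: int  # exclusive, as in Cat[0,3[

def basics(anns, text_len):
    """Split [0, text_len) at every annotation boundary (toy stand-in for RutaBasic)."""
    cuts = sorted({0, text_len, *(a.begin for a in anns), *(a.end for a in anns)})
    return list(zip(cuts, cuts[1:]))

def is_empty(seg, anns):
    """Assumption: empty means no annotation begins at the segment start or ends at its end."""
    b, e = seg
    return not any(a.begin == b or a.end == e for a in anns)

def matches(first, second, anns, text_len):
    """Spans where `first` is directly followed by `second`, skipping only empty segments."""
    segs = basics(anns, text_len)
    found = []
    for a in (x for x in anns if x.kind == first):
        pos = a.end
        for seg in segs:  # segments are sorted, so consecutive empty ones chain
            if seg[0] == pos and is_empty(seg, anns):
                pos = seg[1]
        for c in anns:
            if c.kind == second and c.begin == pos:
                found.append((a.begin, c.end))
    return found

# 'cat dog cat' with the annotations from the original question
anns = [Ann("Cat", 0, 3), Ann("Dog", 4, 7), Ann("Cat", 8, 11), Ann("Covering", 0, 8)]
print(matches("Cat", "Dog", anns, 11))  # first rule
print(matches("Dog", "Cat", anns, 11))  # second rule
```

With Covering[0,8[ the segment [7,8[ is not empty in this model because Covering ends there, so only the first rule finds a match. Changing the bounds to [0,7[ makes that segment empty and lets the second rule match as well, which agrees with the behaviour reported in the original question.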
>>>>
>>>>
>>>> Does that make sense?
>>>>
>>>>
>>>> I think there are many options for how your rules can become more robust, but
>>>> that depends on your complete system/pipeline. Is it an option to trim
>>>> annotations in order to avoid whitespaces at the beginning or ending? Is
>>>> it easy to identify these positions? You could create an annotation
>>>> there and filter that type.
>>>>
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>>
>>>> On 07.10.2019 at 10:21, Mario Juric wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I have a script that is executed without any seeders for performance 
>>>>> reasons, and we don’t need the seeded annotations in that case. I have an 
>>>>> issue involving annotation elements that partially cover the rule 
>>>>> elements of interest, and I do not have a simple solution for it, so I 
>>>>> have a question about the match semantics. Let me explain it using a 
>>>>> simple example and the text ‘cat dog cat’.
>>>>>
>>>>> Assume the following 4 annotation types and 2 rule statements:
>>>>>
>>>>> DECLARE Covering;
>>>>> DECLARE Cat;
>>>>> DECLARE Dog;
>>>>> DECLARE CHASE;
>>>>> Cat Dog { -> MARK(CHASE)};
>>>>> Dog Cat { -> MARK(CHASE)};
>>>>>
>>>>> Assume prior to script execution the following annotations with 
>>>>> beginnings and endings:
>>>>>
>>>>> Cat[0,3[
>>>>> Dog[4,7[
>>>>> Cat[8,11[
>>>>> Covering[0,8[
>>>>>
>>>>> The Covering annotation is an example of the disturbing element that I 
>>>>> observed, which has nothing or little to do with what I am trying to 
>>>>> match. It just happens to be there for a reason unrelated to these rules, 
>>>>> but it causes the second rule not to match when I expected it. Only the 
>>>>> first rule fires, but the second will also fire when I change Covering 
>>>>> bounds to [0,7[ though.
>>>>>
>>>>> The order in which elements are matched seems very different from how 
>>>>> they are usually selected from the CAS index, where you would get 
>>>>> ‘Covering Cat Dog Cat’, and with this order you would intuitively expect 
>>>>> both rules to match. This would probably be overly simplified though, 
>>>>> since I would not be able to match adjacent covering annotations this 
>>>>> way, so I believe matching is somehow based on edge detection. Still, I 
>>>>> have difficulty understanding why that extra covering space makes a 
>>>>> difference.
>>>>>
>>>>> I was hoping you could provide me with some details, and I would also like 
>>>>> to know what possible workaround options I have. I was considering playing 
>>>>> around with type filtering, but it would require a bit of adding/removing 
>>>>> types to be filtered during the script, so it didn’t seem like the simplest 
>>>>> solution. Ensuring that covering always aligns with the end of a token is 
>>>>> another possibility in this particular case, but I still need to add 
>>>>> general robustness to the Ruta script against these scenarios. Any 
>>>>> feedback is much appreciated, thanks :)
>>>>>
>>>>> Cheers,
>>>>> Mario
>>>>>
>>>> -- 
>>>> Dr. Peter Klügl
>>>> R&D Text Mining/Machine Learning
>>>>
>>>> Averbis GmbH
>>>> Salzstr. 15
>>>> 79098 Freiburg
>>>> Germany
>>>>
>>>> Fon: +49 761 708 394 0
>>>> Fax: +49 761 708 394 10
>>>> Email: [email protected]
>>>> Web: https://averbis.com
>>>>
>>>> Headquarters: Freiburg im Breisgau
>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>>>
>