Thanks Peter,

No problem with the delay. I was on vacation myself, and sometimes it is just 
necessary to pull the plug :)

I am just happy that you take the time to answer my questions, and I think your
answers help me make sense of this. I now have some ideas that I can experiment
with to see what works. It’s possible to use RutaBasic when optional spaces
are included in the rules, although it gets more awkward. I would still prefer
to avoid this, and a type-based rule-logic feature would make sense in
our case. Shall I create a feature request for this?

I wouldn’t expect you to do this any time soon, but let me know if there is 
something I could help out with when the time comes.

Cheers,
Mario

> On 18 Oct 2019, at 10:10 , Peter Klügl <[email protected]> wrote:
> 
> Hi,
> 
> 
> sorry for the delayed reply.
> 
> 
> comments below...
> 
> 
> Am 09.10.2019 um 22:19 schrieb Mario Juric:
>> Hi Peter,
>> 
>> Thanks a lot for the answer.
>> 
>> I am still trying to wrap my head around this, and I understand the issues
>> at play when dealing with a generic rule engine, since I am looking at an
>> isolated case only. I was just thinking that in my particular case the
>> Covering annotation starts before the matched 'Dog Cat', so why would its
>> ending right before Cat prevent the rule from firing? Covering doesn’t follow
>> Dog, so a rule like “Dog Covering {->MARK(CHASE)}” would not match either,
>> but I understand now that the presence of something else in the area
>> between the two rule elements is enough for the match to fail. However, as
>> you describe, with SPACE annotations present, a rule like
>> Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching despite the
>> presence of the Covering annotation.
> 
> 
> The main thing here is probably the requirement that the logic for
> applying the visibility concept should always be symmetric, meaning it
> should be the same regardless of whether the rule matches from left to
> right or from right to left (or inside out).
> 
> In your example, the rule matches from left to right (I assume), so the
> behavior that the last space is not skipped is not intuitive at all.
> However, if the rule matched from right to left for some reason,
> e.g., because of dynamic anchoring or a manual anchor, then the
> inference would detect a starting Covering annotation as the next
> possible position, which is not invisible (since there is nothing at all
> invisible). So there would actually be something that could be matched,
> but it is not the correct type (Dog).
> 
> I do not know if this explanation makes sense... it's easier with a
> whiteboard ;-)
> 
> 
> 
>> Have you ever described the implementation of the matching in some paper or 
>> similar? I would be interested to have a look at it, but maybe it’s better 
>> just to have a go at the code? I would certainly prefer reading a high level 
>> abstract specification first though :)
> 
> 
> The last paper is the NLE journal article, which contains some high
> level description of the algorithm. However, this is some really
> specific functionality for a specific scenario. So, if I write a new
> paper, it will most likely not cover this.
> 
> 
>> 
>> Generally I cannot just trim the annotations in the real application, since 
>> some of these whitespaces are included in the marking for various reasons. I 
>> therefore played around with type filtering, since I was hoping that the 
>> type filter would allow me to match the rules while ignoring any presence of 
>> filtered types. I was again surprised to find out that filtering the 
>> Covering type while retaining Cat and Dog would in this case just prevent 
>> anything from being matched, because it seems to make all those text parts 
>> invisible where the filtered types appear, no matter if they cover any 
>> retained annotation types. So this didn’t seem to solve my problem either, 
>> although I could of course try to mark those areas I otherwise would 
>> consider trimming and include those in the rules like a space or filter on 
>> them, which I guess is what you suggested. It suddenly just becomes somewhat 
>> awkward though, and it may just be more clear to use RutaBasic with the 
>> rules instead.
> 
> 
> Yes, the visibility concept in Ruta is not type-based but type
> coverage-based (and I think that's really cool).
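The difference between coverage-based and type-based visibility can be illustrated with a small, hypothetical Python model of the example in this thread. It is not Ruta's implementation: the tuple layout, the function names, and the rule that an annotation is hidden when a filtered annotation fully covers it are assumptions chosen to reproduce what Mario observed when filtering Covering. The type-based variant only models the feature discussed here; according to this thread, it is not available in Ruta.

```python
# Hypothetical model, not Ruta's implementation; names are illustrative only.
from typing import List, Set, Tuple

Annotation = Tuple[str, int, int]  # (type name, begin, end), end exclusive


def visible_coverage_based(annotations: List[Annotation], filtered: Set[str]) -> List[Annotation]:
    """Coverage-based: text covered by a filtered type is hidden, so any other
    annotation lying entirely inside that text is hidden too (assumption)."""
    hidden = [(b, e) for t, b, e in annotations if t in filtered]
    return [
        (t, b, e)
        for t, b, e in annotations
        if t not in filtered and not any(hb <= b and e <= he for hb, he in hidden)
    ]


def visible_type_based(annotations: List[Annotation], filtered: Set[str]) -> List[Annotation]:
    """Hypothetical type-based variant: only the filtered annotations disappear."""
    return [(t, b, e) for t, b, e in annotations if t not in filtered]


if __name__ == "__main__":
    anns = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]
    print(visible_coverage_based(anns, {"Covering"}))  # only the last Cat survives
    print(visible_type_based(anns, {"Covering"}))      # Cat, Dog and Cat survive
```

In this model, filtering Covering leaves no Cat/Dog pair to match under the coverage-based rule, while the type-based variant keeps all three animals visible.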
> 
> It is possible to extend the functionality to additionally support
> type-based logic, but I do not know when this would be ready.
> 
> I would not recommend using RutaBasic in the rules (I actually do not
> know right now if it would work), but if you do, then you should
> probably deactivate the "empty is invisible" option.
> 
> 
> Best,
> 
> 
> Peter
> 
> 
>> 
>> Cheers,
>> Mario
>> 
>>> On 9 Oct 2019, at 09:35 , Peter Klügl <[email protected]> wrote:
>>> 
>>> Hi Mario,
>>> 
>>> 
>>> I need to take a closer look as this is not the usual scenario :-)
>>> 
>>> 
>>> However, without testing, I would assume that the second rule does not
>>> match because the space between dog and cat is not "empty".
>>> 
>>> 
>>> Normally, the seeding provides a complete partitioning of the text, which
>>> is reflected in the RutaBasic annotations. If there are only a few
>>> annotations, then there needs to be a decision whether a text position is
>>> visible or not (as you have no SPACE, BREAK, or MARKUP annotations). You
>>> would expect that the space between the annotations is ignored, but there
>>> is actually no reason why Ruta should do that, as there is no information
>>> at all that it should be ignored (... generic system, you might want to
>>> write rules for whitespaces...). In order to avoid this problem in such
>>> situations, there is the option to define empty RutaBasics as invisible.
>>> These are text positions where no annotation begins or ends (and that are
>>> not covered by annotations), AFAIR, and where sequential matching could
>>> not match at all anyway. Thus, the first space is ignored, but not the
>>> second, because the Covering annotation ends there.
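A minimal, hypothetical Python model of this explanation may help. It splits the text at every annotation boundary (a stand-in for the RutaBasic partitioning), treats a piece as empty, and therefore invisible, when no annotation begins at its start or ends at its end, and lets a two-element rule match only if every piece between the two elements is empty. This is a sketch, not Ruta's algorithm, and all names are made up. In particular, it leaves out the "not covered by annotations" condition. Applying that condition here would make the first space, which Covering also covers, visible and block 'Cat Dog', and that contradicts what Mario observed.

```python
# Hypothetical, simplified model of the "empty is invisible" option.
# It is NOT Ruta's implementation; all names are illustrative.
from typing import List, Tuple

Annotation = Tuple[str, int, int]  # (type name, begin, end), end exclusive
Span = Tuple[int, int]


def basics(annotations: List[Annotation], doc_len: int) -> List[Span]:
    """Split [0, doc_len) at every annotation boundary (stand-in for RutaBasics)."""
    cuts = {0, doc_len}
    for _, begin, end in annotations:
        cuts.update((begin, end))
    points = sorted(cuts)
    return list(zip(points, points[1:]))


def is_empty(piece: Span, annotations: List[Annotation]) -> bool:
    """Assumption: a piece is empty if no annotation begins at its start
    and no annotation ends at its end."""
    start, stop = piece
    return not any(b == start or e == stop for _, b, e in annotations)


def match_pair(first: str, second: str, annotations: List[Annotation], doc_len: int) -> List[Span]:
    """Spans where `first` is followed by `second` with only empty pieces in between."""
    pieces = basics(annotations, doc_len)
    found = []
    for t1, b1, e1 in annotations:
        if t1 != first:
            continue
        for t2, b2, e2 in annotations:
            if t2 != second or b2 < e1:
                continue
            gap = [p for p in pieces if e1 <= p[0] and p[1] <= b2]
            if all(is_empty(p, annotations) for p in gap):
                found.append((b1, e2))
    return found


if __name__ == "__main__":
    text = "cat dog cat"
    base = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11)]
    for covering_end in (8, 7):
        anns = base + [("Covering", 0, covering_end)]
        print(covering_end,
              match_pair("Cat", "Dog", anns, len(text)),
              match_pair("Dog", "Cat", anns, len(text)))
```

With Covering[0,8[ only 'Cat Dog' matches, because Covering ends at the piece [7,8[ and so that piece is not empty. With Covering[0,7[ both rules match, which is the behavior described in the original question.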
>>> 
>>> 
>>> Does that make sense?
>>> 
>>> 
>>> I think there are many options for making your rules more robust, but
>>> that depends on your complete system/pipeline. Is it an option to trim
>>> annotations in order to avoid whitespaces at the beginning or end? Is
>>> it easy to identify these positions? You could create an annotation
>>> there and filter that type.
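Where trimming is acceptable, the offset arithmetic is simple. The following Python sketch uses hypothetical function names and works on plain character offsets, not through any Ruta or UIMA API. It shrinks an annotation so that it neither starts nor ends with whitespace. Alternatively, it returns the edge whitespace spans, which are the positions where a separate annotation could be created and then filtered.

```python
# Hypothetical helpers on plain offsets; not a Ruta or UIMA API.
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end), end exclusive


def trim_whitespace(text: str, begin: int, end: int) -> Span:
    """Shrink [begin, end) so that it neither starts nor ends with whitespace."""
    while begin < end and text[begin].isspace():
        begin += 1
    while end > begin and text[end - 1].isspace():
        end -= 1
    return begin, end


def edge_whitespace(text: str, begin: int, end: int) -> List[Span]:
    """Return the leading and trailing whitespace spans inside [begin, end)."""
    inner_begin, inner_end = trim_whitespace(text, begin, end)
    spans = []
    if inner_begin > begin:
        spans.append((begin, inner_begin))
    if end > inner_end:
        spans.append((inner_end, end))
    return spans


if __name__ == "__main__":
    text = "cat dog cat"
    print(trim_whitespace(text, 0, 8))  # Covering[0,8[ becomes (0, 7)
    print(edge_whitespace(text, 0, 8))  # [(7, 8)]: the trailing space
```

Trimming changes the annotation itself, so it only fits pipelines where the whitespace is not needed in the marked span. The second helper leaves the annotation alone and only locates the spaces.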
>>> 
>>> 
>>> 
>>> Best,
>>> 
>>> 
>>> Peter
>>> 
>>> 
>>> 
>>> Am 07.10.2019 um 10:21 schrieb Mario Juric:
>>>> Hi Peter,
>>>> 
>>>> I have a script that is executed without any seeders for performance 
>>>> reasons, and we don’t need the seeded annotations in that case. I have an 
>>>> issue involving annotation elements that partially cover the rule elements 
>>>> of interest, and I do not have a simple solution for it, so I have a 
>>>> question about the match semantics. Let me explain it using a simple 
>>>> example and the text ‘cat dog cat’.
>>>> 
>>>> Assume the following 4 annotation types and 2 rule statements:
>>>> 
>>>> DECLARE Covering;
>>>> DECLARE Cat;
>>>> DECLARE Dog;
>>>> DECLARE CHASE;
>>>> Cat Dog { -> MARK(CHASE)};
>>>> Dog Cat { -> MARK(CHASE)};
>>>> 
>>>> Assume that the following annotations, with begin and end offsets, exist
>>>> prior to script execution:
>>>> 
>>>> Cat[0,3[
>>>> Dog[4,7[
>>>> Cat[8,11[
>>>> Covering[0,8[
>>>> 
>>>> The Covering annotation is an example of the disturbing element that I
>>>> observed, which has little or nothing to do with what I am trying to
>>>> match. It just happens to be there for a reason unrelated to these rules,
>>>> but it prevents the second rule from matching where I expected it to. Only
>>>> the first rule fires, but the second one also fires when I change the
>>>> Covering bounds to [0,7[.
>>>> 
>>>> The order in which elements are matched seems very different from how they
>>>> are usually selected from the CAS index, where you would get 'Covering Cat
>>>> Dog Cat', and with this order you would intuitively expect both rules to
>>>> match. This would probably be overly simplified though, since I would not
>>>> be able to match adjacent covering annotations this way, so I believe
>>>> matching is somehow based on edge detection. Still, I have difficulty
>>>> understanding why that extra covered space makes a difference.
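The order 'Covering Cat Dog Cat' is the one UIMA's default annotation index produces. That index sorts annotations by begin offset ascending, then by end offset descending, with type priorities as a further tie-breaker, which this sketch ignores. The hypothetical Python helper below reproduces that order for the example. The order alone carries no information about the whitespace between annotations, so it only illustrates the index and does not model how Ruta matches.

```python
# Hypothetical helper mimicking the default UIMA annotation index order.
from typing import List, Tuple

Annotation = Tuple[str, int, int]  # (type name, begin, end), end exclusive


def index_order(annotations: List[Annotation]) -> List[Annotation]:
    """Sort by begin ascending, then end descending (type priorities ignored)."""
    return sorted(annotations, key=lambda a: (a[1], -a[2]))


if __name__ == "__main__":
    anns = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]
    print([t for t, _, _ in index_order(anns)])  # ['Covering', 'Cat', 'Dog', 'Cat']
```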
>>>> 
>>>> I was hoping you could provide me with some details, and I would also like
>>>> to know what workaround options I have. I was considering playing
>>>> around with type filtering, but it would require a bit of adding/removing
>>>> types to be filtered during the script, so it didn’t seem like the simplest
>>>> solution. Ensuring that Covering always aligns with the end of a token is
>>>> another possibility in this particular case, but I still need to make the
>>>> Ruta script generally robust against these scenarios. Any feedback is much
>>>> appreciated, thanks :)
>>>> 
>>>> Cheers,
>>>> Mario
>>>> 
>>> -- 
>>> Dr. Peter Klügl
>>> R&D Text Mining/Machine Learning
>>> 
>>> Averbis GmbH
>>> Salzstr. 15
>>> 79098 Freiburg
>>> Germany
>>> 
>>> Fon: +49 761 708 394 0
>>> Fax: +49 761 708 394 10
>>> Email: [email protected]
>>> Web: https://averbis.com
>>> 
>>> Headquarters: Freiburg im Breisgau
>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>>> 
>> 
