Hi,

On 21.10.2019 at 21:46, Mario Juric wrote:
> Thanks Peter,
>
> No problem with the delay. I was on vacation myself, and sometimes it is just
> necessary to pull the plug :)
>
> I am just happy that you take the time to answer my questions, and I think
> your answers help make sense of this. I now have some ideas that I can
> experiment with to see what works, but it’s possible to use RutaBasic when
> optional spaces are included in the rules, although it gets more awkward. I
> would still prefer to avoid this, and having a type-based rule-logic feature
> would make sense in our case. Shall I create a feature request for this?
Yes, please create a ticket. Even specifying what should be done helps,
especially including more use cases than my own...

Best,

Peter

>
> I wouldn’t expect you to do this any time soon, but let me know if there is
> something I could help out with when the time comes.
>
> Cheers,
> Mario
>
>> On 18 Oct 2019, at 10:10 , Peter Klügl <peter.klu...@averbis.com> wrote:
>>
>> Hi,
>>
>> sorry for the delayed reply.
>>
>> comments below...
>>
>> On 09.10.2019 at 22:19, Mario Juric wrote:
>>> Hi Peter,
>>>
>>> Thanks a lot for the answer.
>>>
>>> I am still trying to wrap my head around this, and I understand the issues
>>> at play when dealing with a generic rule engine, since I am looking at an
>>> isolated case only. I was just thinking that in my particular case the
>>> covering annotation starts before matching ‘Dog Cat’, so why would its
>>> ending right before Cat prevent the rule from firing? It doesn’t follow
>>> Dog, and a rule like “Dog Covering {->MARK(CHASE)}” wouldn’t therefore be
>>> matched either, but I understand now that the mere presence of something
>>> else in the area between the two rule elements is enough for the match to
>>> fail. However, as you describe, given the presence of SPACE annotations, a
>>> rule like Dog SPACE Cat { -> MARK(CHASE)} would succeed in matching
>>> despite the presence of the covering annotation.
>>
>> The main thing here is probably the requirement that the logic for
>> applying the visibility concept should always be symmetric, meaning it
>> should be the same regardless of whether the rule matches from left to
>> right or from right to left (or inside out).
>>
>> In your example, the rule matches from left to right (I assume), so the
>> behavior that the last space is not skipped is not intuitive at all.
>> However, if the rule matched from right to left for some reason,
>> e.g., because of dynamic anchoring or a manual anchor, then the
>> inference would detect a starting Covering annotation as the next
>> possible position, which is not invisible (since there is nothing at all
>> invisible). So there would actually be something that could be matched,
>> but it is not the correct type (Dog).
>>
>> I do not know if this explanation makes sense... it's easier with a
>> whiteboard ;-)
>>
>>> Have you ever described the implementation of the matching in some paper or
>>> similar? I would be interested to have a look at it, but maybe it’s better
>>> just to have a go at the code? I would certainly prefer reading a
>>> high-level abstract specification first though :)
>>
>> The last paper is the NLE journal article, which contains some high-level
>> description of the algorithm. However, this is some really
>> specific functionality for a specific scenario. So, if I write a new
>> paper, it will most likely not cover this.
>>
>>> Generally I cannot just trim the annotations in the real application, since
>>> some of these whitespaces are included in the marking for various reasons.
>>> I therefore played around with type filtering, since I was hoping that the
>>> type filter would allow me to match the rules while ignoring any presence
>>> of filtered types. I was again surprised to find out that filtering the
>>> Covering type while retaining Cat and Dog would in this case just prevent
>>> anything from being matched, because it seems to make all those text parts
>>> invisible where the filtered types appear, no matter if they cover any
>>> retained annotation types. So this didn’t seem to solve my problem either,
>>> although I could of course try to mark those areas I otherwise would
>>> consider trimming and include those in the rules like a space or filter on
>>> them, which I guess is what you suggested.
>>> It suddenly just becomes somewhat awkward though, and it may just be
>>> clearer to use RutaBasic with the rules instead.
>>
>> Yes, the visibility concept in Ruta is not type-based but type
>> coverage-based (and I think that's really cool)
>>
>> It is possible to extend the functionality to additionally support
>> type-based logic, but I do not know when this would be ready.
>>
>> I would not recommend using RutaBasic in the rules (I actually do not
>> know right now if it would work), but if you do, then you should
>> probably deactivate the "empty is invisible" option.
>>
>> Best,
>>
>> Peter
>>
>>> Cheers,
>>> Mario
>>>
>>>> On 9 Oct 2019, at 09:35 , Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>
>>>> Hi Mario,
>>>>
>>>> I need to take a closer look as this is not the usual scenario :-)
>>>>
>>>> However, without testing, I would assume that the second rule does not
>>>> match because the space between dog and cat is not "empty".
>>>>
>>>> Normally, you have a complete partitioning provided by the seeding, which
>>>> produces the RutaBasic annotations. If there are only a few annotations,
>>>> then there needs to be a decision if a text position is visible or not
>>>> (as you have no SPACE, BREAK and MARKUP annotations). You would expect
>>>> that the space between the annotations is ignored, but there is actually
>>>> no reason why Ruta should do that, as there is no information at all
>>>> that it should be ignored (... generic system, you might want to write
>>>> rules for whitespaces...). In order to avoid this problem in such
>>>> situations there is the option to define empty RutaBasics as invisible.
>>>> Those are text positions where no annotation begins or ends (and which
>>>> are not covered by annotations), AFAIR, and where sequential matching
>>>> could not match at all anyway. Thus, the first space is ignored, but not
>>>> the second, because the Covering annotation ends there.
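The "empty RutaBasic is invisible" behaviour described above can be illustrated with a small toy model. This is only a sketch in Python, not Ruta's actual implementation: the helper names `basics`, `is_empty` and `matches` are hypothetical, and the model assumes that a basic span counts as empty when no annotation begins at its start or ends at its end. The offsets are the ones from the 'cat dog cat' example in this thread.

```python
# Toy model (assumption, not Ruta code): annotations are (type, begin, end)
# tuples with an exclusive end offset.

def basics(anns):
    """Split the text into minimal spans at every annotation boundary."""
    bounds = sorted({b for _, b, _ in anns} | {e for _, _, e in anns})
    return list(zip(bounds, bounds[1:]))

def is_empty(span, anns):
    """Assumed rule: a span is empty if nothing begins at its start
    and nothing ends at its end."""
    begin, end = span
    return not any(b == begin or e == end for _, b, e in anns)

def matches(first, second, anns):
    """Find (first, second) pairs where `second` begins at the next
    visible position after `first`, skipping empty (invisible) spans."""
    spans = basics(anns)  # sorted by begin offset
    found = []
    for t1, b1, e1 in anns:
        if t1 != first:
            continue
        pos = e1
        for span in spans:
            if span[0] == pos and is_empty(span, anns):
                pos = span[1]  # invisible gap, move past it
        for t2, b2, e2 in anns:
            if t2 == second and b2 == pos:
                found.append(((t1, b1, e1), (t2, b2, e2)))
    return found

anns = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]
print(matches("Cat", "Dog", anns))  # one match: gap [3,4[ is empty
print(matches("Dog", "Cat", anns))  # no match: Covering ends at 8
```

In this model, changing Covering to [0,7[ leaves the gap [7,8[ empty, so the Dog Cat rule matches as well, which mirrors the behaviour reported in the thread.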
>>>>
>>>> Does that make sense?
>>>>
>>>> I think there are many options for making your rules more robust, but
>>>> that depends on your complete system/pipeline. Is it an option to trim
>>>> annotations in order to avoid whitespaces at the beginning or ending? Is
>>>> it easy to identify these positions? You could create an annotation
>>>> there and filter the type.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 07.10.2019 at 10:21, Mario Juric wrote:
>>>>> Hi Peter,
>>>>>
>>>>> I have a script that is executed without any seeders for performance
>>>>> reasons, and we don’t need the seeded annotations in that case. I have an
>>>>> issue involving annotation elements that partially cover the rule
>>>>> elements of interest, and I do not have a simple solution for it, so I
>>>>> have a question about the match semantics. Let me explain it using a
>>>>> simple example and the text ‘cat dog cat’.
>>>>>
>>>>> Assume the following 4 annotation types and 2 rule statements:
>>>>>
>>>>> DECLARE Covering;
>>>>> DECLARE Cat;
>>>>> DECLARE Dog;
>>>>> DECLARE CHASE;
>>>>> Cat Dog { -> MARK(CHASE)};
>>>>> Dog Cat { -> MARK(CHASE)};
>>>>>
>>>>> Assume prior to script execution the following annotations with
>>>>> beginnings and endings:
>>>>>
>>>>> Cat[0,3[
>>>>> Dog[4,7[
>>>>> Cat[8,11[
>>>>> Covering[0,8[
>>>>>
>>>>> The Covering annotation is an example of the disturbing element that I
>>>>> observed, which has nothing or little to do with what I am trying to
>>>>> match. It just happens to be there for a reason unrelated to these rules,
>>>>> but it causes the second rule not to match when I expected it. Only the
>>>>> first rule fires, but the second will also fire when I change Covering
>>>>> bounds to [0,7[ though.
>>>>>
>>>>> The order in which elements are matched seems very different from how
>>>>> they are usually selected from the CAS index, where you would get
>>>>> ‘Covering Cat Dog Cat’, and with this order you would intuitively expect
>>>>> both rules to match. This would probably be overly simplified though,
>>>>> since I would not be able to match adjacent covering annotations this
>>>>> way, so I believe matching is somehow based on edge detection. Still, I
>>>>> have difficulty understanding why that extra covering space makes a
>>>>> difference.
>>>>>
>>>>> I was hoping you could provide me with some details, and I would also
>>>>> like to know what possible workaround options I have. I was considering
>>>>> playing around with type filtering, but it would require a bit of
>>>>> adding/removing types to be filtered during the script, so it didn’t seem
>>>>> like the simplest solution. Ensuring that covering always aligns with the
>>>>> end of a token is another possibility in this particular case, but I
>>>>> still need to add general robustness to the Ruta script against these
>>>>> scenarios. Any feedback is much appreciated, thanks :)
>>>>>
>>>>> Cheers,
>>>>> Mario
>>>>>
>>>> --
>>>> Dr. Peter Klügl
>>>> R&D Text Mining/Machine Learning
>>>>
>>>> Averbis GmbH
>>>> Salzstr. 15
>>>> 79098 Freiburg
>>>> Germany
>>>>
>>>> Fon: +49 761 708 394 0
>>>> Fax: +49 761 708 394 10
>>>> Email: peter.klu...@averbis.com
>>>> Web: https://averbis.com
>>>>
>>>> Headquarters: Freiburg im Breisgau
>>>> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
>>>> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
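The difference between coverage-based and type-based filtering discussed in this thread can also be sketched in a few lines of Python. This is a toy model under stated assumptions, not Ruta's implementation: `coverage_based` and `type_based` are hypothetical helper names, the coverage-based variant assumes that every text position covered by a filtered annotation becomes invisible, and the type-based variant is only one possible reading of the proposed feature.

```python
# Toy model (assumption, not Ruta code): annotations are (type, begin, end)
# tuples with an exclusive end offset.

def coverage_based(anns, filtered):
    """Hide every annotation that overlaps text covered by a filtered type."""
    hidden = [(b, e) for t, b, e in anns if t in filtered]

    def overlaps(b, e):
        return any(b < he and hb < e for hb, he in hidden)

    return [a for a in anns
            if a[0] not in filtered and not overlaps(a[1], a[2])]

def type_based(anns, filtered):
    """Hypothetical alternative: hide only annotations of the filtered types."""
    return [a for a in anns if a[0] not in filtered]

anns = [("Cat", 0, 3), ("Dog", 4, 7), ("Cat", 8, 11), ("Covering", 0, 8)]

# Coverage-based: Covering[0,8[ hides the first Cat and the Dog as well.
print(coverage_based(anns, {"Covering"}))

# Type-based: only Covering itself is ignored; Cat, Dog, Cat stay visible.
print(type_based(anns, {"Covering"}))
```

Under these assumptions, filtering Covering in the coverage-based model leaves only the last Cat visible, so neither rule can match, which is consistent with the observation reported earlier in the thread.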