Hi, Peter!

I got that. I restricted MARKFAST to segments. It works nearly perfectly.

How does MARKFAST match things? Using

Document{-> MARKFAST(MyType, {"a", "b", "a b"})};

on "a b" yields "a b" and "b" but not "a". I would like to have "a" as well. Can this be done?

By the way: I love Ruta.apply(). That is exactly what I needed.

Thanks,
Armin

-----Original Message-----
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Monday, 30 June 2014 12:51
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST

Hi,

On 30.06.2014 11:32, armin.weg...@bka.bund.de wrote:
> Hello!
>
> On which annotation type does MARKFAST work?

It is applied to the annotations on which the rule element of the action matched.

Document{-> MARKFAST(...)};

... causes a dictionary lookup on the complete document.

Sentence{CONTAINS(...) -> MARKFAST(...)};

... causes a separate dictionary lookup on each of the matched sentences (e.g., no inter-sentence annotations).

> Can I restrict MARKFAST to a single annotation Type, say my own token type?

No, but there is an issue that includes this functionality.

UIMA-3775: Fast multi token dictionary matching on feature values

The idea is to apply the dictionary lookup to sequences of feature values (e.g., lemmas). If the feature represents the covered text, then this would also support your use case. The issue is not top priority right now, but if you want, I can try to include it in the next release (August).

> It would be nice to restrict a ruta script to a set of annotations by
> giving that set of annotations explicitly, like
>
> Document{-> INPUT(Token, Organization, Location)};

UIMA Ruta follows a different strategy, e.g., compared to JAPE and its input specification. The availability and visibility of annotations is not type-based but coverage-based. This enables the easy specification of complex patterns, but also complicates things sometimes. If one type is set to invisible (FILTERTYPE), then all annotations of this type and all covered annotations of other types are invisible.
The MARKFAST action operates on the RutaStream, and thus its lookup is sensitive to the filtering settings. For example, the lookup ignores whitespace, breaks and markup with the default settings. By extending the set of filtered types, you can also change the behavior of the dictionary lookup. However, mind that annotations covered by one of the filtered types are also not accessible to the dictionary.

> All other annotations should be ignored. Is there a way to do this in Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Yes, but it depends on the actual occurrences of types in your document. The easiest way is to filter the types of the annotations that cover the positions that should be skipped. It's not easy to give a generic solution for this.

An example: Your tokenizer creates annotations for words and numbers, but not for punctuation marks, and you want to apply the dictionary lookup only to sequences of token annotations, skipping punctuation marks.

Document{-> FILTERTYPE(PM)};
Document{-> MARKFAST(...)};

There are plans to extend and modify the concept of accessibility and visibility in UIMA Ruta sometime (>= 3.0.0). Any wishes and opinions are welcome :-)

Best,
Peter

> Cheers,
> Armin
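
For reference, the filtering example from Peter's reply can be written out as one small script. This is only a sketch under assumptions: MyType and the inline string list are placeholders borrowed from the question at the top of this message, and PM is assumed to be the punctuation type created by Ruta's default seeding. It does not change how overlapping entries such as "a" and "a b" are matched.

```
// Sketch only: MyType and the dictionary entries are placeholders.
DECLARE MyType;

// Hide punctuation marks. Everything covered by a PM annotation
// becomes invisible to the rules and to the dictionary lookup.
Document{-> FILTERTYPE(PM)};

// Run the dictionary lookup once over the whole document,
// using the filter settings established by the rule before it.
Document{-> MARKFAST(MyType, {"a", "b", "a b"})};
```

To keep the lookup from crossing sentence boundaries, the second rule could instead be attached to a sentence type, as in the Sentence{... -> MARKFAST(...)} example from the reply, provided such a type exists in the type system.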