Re: Ruta - MARKFAST

Armin.Wegner Mon, 30 Jun 2014 05:59:37 -0700

Hi, Peter!

I got that. I restricted MARKFAST to segments. It works nearly perfectly.
How does MARKFAST match things? Using


Document{-> MARKFAST(MyType, {"a", "b", "a b"})};

on

a b

yields

"a b" and "b" but not "a".

I would like to have "a" as well. Can this be done?

By the way: I love Ruta.apply(). That is exactly what I needed.

Thanks,
Armin
 

-----Original Message-----
From: Peter Klügl [mailto:pklu...@uni-wuerzburg.de]
Sent: Monday, 30 June 2014 12:51
To: user@uima.apache.org
Subject: Re: Ruta - MARKFAST

Hi,

On 30.06.2014 11:32, armin.weg...@bka.bund.de wrote:
> Hello!
>
> On which annotation type does MARKFAST work?

It is applied to the annotations matched by the rule element that contains 
the action.

Document{-> MARKFAST(...)};
... causes a dictionary lookup on the complete document.

Sentence{CONTAINS(...) -> MARKFAST(...)}; ... causes a separate dictionary 
lookup on each of the matched sentences (e.g., no inter-sentence annotations).
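
To make the difference in scope concrete, here is a toy model in Python (not 
Ruta code, and not how MARKFAST is implemented). The function lookup() and the 
sample data are made up for illustration. It reports every occurrence of an 
entry inside one window of tokens and does not try to reproduce how MARKFAST 
chooses among overlapping entries.

```python
def lookup(tokens, entries):
    """Return sorted (start, end) token spans in `tokens` that equal an entry.

    Toy stand-in for a dictionary lookup: every occurrence is reported.
    """
    hits = []
    for entry in entries:
        words = entry.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                hits.append((i, i + len(words)))
    return sorted(hits)


# Two "sentences"; the entry "york city" straddles the sentence boundary.
sentences = [["i", "like", "new", "york"], ["city", "life", "is", "fast"]]
entries = ["new york", "york city"]

# Lookup on the complete document: a single window with all tokens.
document = [tok for sent in sentences for tok in sent]
doc_hits = lookup(document, entries)

# Lookup per sentence: one window per sentence, with the spans shifted
# back to document positions so both results are comparable.
sent_hits = []
offset = 0
for sent in sentences:
    sent_hits += [(b + offset, e + offset) for b, e in lookup(sent, entries)]
    offset += len(sent)
```

In this model the document-wide lookup finds both "new york" and "york city", 
while the per-sentence lookup finds only "new york", because no window ever 
contains both "york" and "city".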


> Can I restrict MARKFAST to a single annotation Type, say my own token type?

No, but there is an issue that includes this functionality.

UIMA-3775: Fast multi token dictionary matching on feature values

The idea is to apply the dictionary lookup to sequences of feature values 
(e.g., lemmas). If the feature represents the covered text, then this would 
also support your use case. The issue is not a top priority right now, but if 
you want, I can try to include it in the next release (August).

> It would be nice to restrict a Ruta script to a set of annotations by 
> giving that set of annotations explicitly, like
>
> Document{-> INPUT(Token, Organization, Location)};

UIMA Ruta follows a different strategy, e.g., compared to JAPE and its input 
specification. The availability and visibility of annotations is not type-based 
but coverage-based. This enables the easy specification of complex patterns, 
but it sometimes also complicates things. If one type is set to invisible 
(FILTERTYPE), then all annotations of this type and all covered annotations of 
other types are invisible.

The MARKFAST action operates on the RutaStream, so its lookup is sensitive to 
the filter settings. For example, with the default settings the lookup ignores 
whitespace, line breaks, and markup. By extending the set of filtered types, 
you can also change the behavior of the dictionary lookup. However, keep in 
mind that annotations covered by one of the filtered types are then also 
inaccessible to the dictionary lookup.
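
The coverage-based rule can be illustrated with a small toy model in Python. 
This is only a sketch of the rule as stated above, not Ruta's actual 
implementation; the Annotation class, the visible() function, and the type 
names Word, Number, and Bracketed are assumptions made up for the example.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    type_name: str
    begin: int
    end: int


def visible(annotations, filtered_types):
    """Keep annotations that are neither filtered nor covered by a filtered one."""
    blockers = [a for a in annotations if a.type_name in filtered_types]
    return [
        a for a in annotations
        if not any(b.begin <= a.begin and a.end <= b.end for b in blockers)
    ]


# Offsets refer to the text: "see (Smith 2014) here"
annotations = [
    Annotation("Word", 0, 3),         # see
    Annotation("Bracketed", 4, 16),   # (Smith 2014)
    Annotation("Word", 5, 10),        # Smith
    Annotation("Number", 11, 15),     # 2014
    Annotation("Word", 17, 21),       # here
]

# Filtering only the Bracketed type also hides the Word and Number inside it.
remaining = visible(annotations, {"Bracketed"})
```

Here only the two Word annotations for "see" and "here" remain, although 
neither Word nor Number was filtered. This is the side effect to keep in mind 
when extending the filter before a dictionary lookup.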

>
> All other annotations should be ignored. Is there a way to do this in
> Ruta? Can this be done with FILTERTYPE and RETAINTYPE? How?

Yes, but it depends on the actual occurrences of types in your document.
The easiest way is to filter the types of the annotations that cover the 
positions that should be skipped. It's not easy to give a generic solution for 
this.

An example:
Your tokenizer creates annotations for words and numbers, but not for 
punctuation marks, and you want to apply the dictionary lookup only for 
sequences of token annotations skipping punctuation marks.

Document{-> FILTERTYPE(PM)};
Document{-> MARKFAST(...)};


There are plans to extend and modify the concept of accessibility and 
visibility in UIMA Ruta sometime (>= 3.0.0). Any wishes and opinions are 
welcome :-)



Best,

Peter


>
>
> Cheers,
> Armin
>
