On 17.12.2013 18:14, Alexandre Patry wrote:
On 2013-12-17 12:10, Peter Klügl wrote:
On 17.12.2013 18:00, Alexandre Patry wrote:
On 2013-12-17 11:56, Peter Klügl wrote:
Hi,

some of the rules behave as expected. This may be a bit counterintuitive,
but I do not see a way to improve it. I will fix the rest in the next
few days.

An example:

(SPECIAL ALL* SPECIAL) {-> MARK(TMP_GenericAllSTAR)};

ALL is a parent type of SPECIAL and * is a greedy quantifier. Therefore,
ALL matches all annotations, including the SPECIAL annotations, until
the end of the document. Then there is no SPECIAL annotation left to
match, and the rule fails.
Using a reluctant quantifier should work as expected for this specific
case:

(SPECIAL ALL*? SPECIAL) {-> MARK(TMP_GenericAllSTAR)};
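
To make the difference concrete, here is a small Python toy model of the two quantifiers. It is only a sketch of the behaviour described above, not Ruta's actual matcher: the token list, `is_all`, `match_greedy`, and `match_reluctant` are hypothetical names invented for this illustration, and the model assumes that the greedy quantifier does not give tokens back once it has consumed them.

```python
def is_all(token):
    # ALL is a parent type of everything, so it accepts any token,
    # including SPECIAL ones.
    return True


def match_greedy(tokens, start):
    """Toy model of (SPECIAL ALL* SPECIAL) with a greedy ALL*."""
    if tokens[start] != "SPECIAL":
        return None
    i = start + 1
    # Greedy: consume as long as ALL matches, which is until the end.
    while i < len(tokens) and is_all(tokens[i]):
        i += 1
    # Nothing is left for the closing SPECIAL, so the rule fails.
    if i < len(tokens) and tokens[i] == "SPECIAL":
        return (start, i)
    return None


def match_reluctant(tokens, start):
    """Toy model of (SPECIAL ALL*? SPECIAL) with a reluctant ALL*?."""
    if tokens[start] != "SPECIAL":
        return None
    i = start + 1
    # Reluctant: before consuming a token, check whether the next
    # rule element (the closing SPECIAL) can already match.
    while i < len(tokens):
        if tokens[i] == "SPECIAL":
            return (start, i)
        i += 1
    return None


tokens = ["SPECIAL", "W", "W", "SPECIAL", "W"]
print(match_greedy(tokens, 0))     # None
print(match_reluctant(tokens, 0))  # (0, 3)
```

In this model the greedy variant runs past the second SPECIAL and ends with nothing to match, while the reluctant variant stops at the first position where the closing SPECIAL fits.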


Just another comment that has nothing to do with the problem :-)

The rule is of course somewhat "slow".

I would rather rewrite it as:

(SPECIAL # SPECIAL) {-> MARK(TMP_GenericAllSTAR)};

Here, the wildcard searches for the next SPECIAL annotation in the index
and does not have to match on each token until the next SPECIAL annotation.
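
The cost difference can be sketched in Python as follows. This is a rough analogy under stated assumptions, not a description of how the annotation index is implemented: the sorted list `special_positions` stands in for the index, and `next_special_by_scan` and `next_special_by_index` are hypothetical helpers made up for this example.

```python
from bisect import bisect_right


def next_special_by_scan(tokens, start):
    """Like ALL*?: test every token until the next SPECIAL is found."""
    steps = 0
    for i in range(start + 1, len(tokens)):
        steps += 1
        if tokens[i] == "SPECIAL":
            return i, steps
    return None, steps


def next_special_by_index(special_positions, start):
    """Like #: jump to the next SPECIAL using a sorted position list."""
    k = bisect_right(special_positions, start)
    return special_positions[k] if k < len(special_positions) else None


tokens = ["SPECIAL"] + ["W"] * 1000 + ["SPECIAL"]
# Stand-in for the index: positions of all SPECIAL tokens, in order.
special_positions = [i for i, t in enumerate(tokens) if t == "SPECIAL"]

print(next_special_by_scan(tokens, 0))              # (1001, 1001)
print(next_special_by_index(special_positions, 0))  # 1001
```

Both approaches find the same position, but the scan touches every token in between, whereas the lookup only consults the list of SPECIAL positions.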
Nice trick, thanks for sharing!

Is there a cookbook somewhere where all these tricks are stored?


Nope, but I have been thinking for some time about adding another chapter to the documentation for such stuff, e.g., how to easily include DKPro components in Ruta scripts or how to apply Ruta scripts for transformation-based part-of-speech tagging.

However, no time...

Best,

Peter

