
I tried to understand the rules that you suggested and have a few questions 
(see below).
What we have (successfully) implemented so far is a set of rules that change 
the value of the stored string, in order to produce some kind of expression 
that is evaluated subsequently:
a) replace numbers: "eins" becomes "(1)", "zwei|zwan" becomes "(2)"...
b) replace factors: "zig" becomes "*(10)", "hundert" becomes "*(100)"..., and 
remove "und"
c) other Ruta rules then evaluate the expression step by step, in a chain

a) "(3)millionen(2)tausend(4)hundert(1)und(2)zig"
b) "(3)*(1000000)(2)*(1000)(4)(100)(1)(20)"
c) "(3)*(1000000)(2)*(1000)(400)(21)" => "(3)*(1000000)(2)*(1000)(421)" => 
"(3000000)(2000)(421)" => "(3000000)(2421)" => "(3002421)"

However, we use replaceAll(string, pattern, pattern) in all these 
transformations and fear that it might not be the optimal solution for UIMA. 
Do you have any suggestions?
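
For comparison, here is a minimal sketch of the same arithmetic done on tokens 
instead of on rewritten strings. It is written in Python rather than Ruta, only 
to make the steps concrete. The word lists and the function names (tokenize, 
parse_number) are hypothetical and cover just the fragments used in the example 
above; forms such as "zehn", "elf" or "dreißig" are not handled.

```python
import re

# Hypothetical, deliberately tiny word lists; real tables need many more entries.
NUMBERS = {"eins": 1, "ein": 1, "zwei": 2, "zwan": 2, "drei": 3, "vier": 4}
FACTORS = {"millionen": 1_000_000, "tausend": 1_000, "hundert": 100, "zig": 10}

# Longest alternatives first, so that "eins" is tried before "ein".
_VOCAB = sorted([*NUMBERS, *FACTORS, "und"], key=len, reverse=True)
_FRAGMENT = re.compile("|".join(re.escape(w) for w in _VOCAB))


def tokenize(word):
    """Split an agglutinated number word into known fragments, left to right."""
    tokens, pos = [], 0
    while pos < len(word):
        match = _FRAGMENT.match(word, pos)
        if match is None:
            raise ValueError(f"unknown fragment at {word[pos:]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse_number(word):
    """Evaluate the fragments without building an intermediate expression string."""
    total, parts = 0, []  # parts: summands of the current group below 1000
    for token in tokenize(word.lower()):
        if token == "und":
            continue  # "und" only joins units and tens
        if token in NUMBERS:
            parts.append(NUMBERS[token])
            continue
        factor = FACTORS[token]
        if factor >= 1000:
            # Close the current group: (sum of parts) * factor; a bare "tausend" counts as 1000.
            total += (sum(parts) or 1) * factor
            parts = []
        elif parts:
            parts[-1] *= factor  # "zig"/"hundert" scale only the preceding number
        else:
            parts.append(factor)  # bare "hundert"
    return total + sum(parts)
```

With these tables, parse_number("dreimillionenzweitausendvierhunderteinundzwanzig") 
goes through the same groups as the example above: 3 * 1000000, 2 * 1000, and 
400 + 1 + 20.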

Here are the questions for your rules:
> Before you can apply the dictionaries, you need to split the RutaBasics  
> using some conjunction words in order to map the subword segments.
How exactly can I do that? I know there is SPLIT(), but that can only split an 
annotation on the basis of another annotation lying inside it, or do I 
misunderstand it?
Because if I could split words then German agglutinated numbers would be no 
problem (since we have a working solution for English).

Is there a special reason why you use 3 for 'thousand' when you use it with 
POW(10, x)? Intuitively I would just use 1000.
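
To spell out the arithmetic behind this question: storing the exponent and 
storing the literal factor give the same product. The small Python sketch below 
only illustrates that equivalence. Apart from "tausend" = 3, which comes from 
the quoted word table, the table entries and the function name are assumptions 
made for illustration.

```python
# Hypothetical multiplier table in the "word;exponent" style of the word table.
# Only "tausend": 3 is taken from the quoted message; the rest is assumed.
MULTIPLIER_EXPONENTS = {"hundert": 2, "tausend": 3, "million": 6}


def apply_multiplier(value, word):
    """Scale value by 10 raised to the exponent stored for the multiplier word."""
    return value * 10 ** MULTIPLIER_EXPONENTS[word]
```

For instance, apply_multiplier(2, "tausend") computes 2 * 10 ** 3, which is the 
same as 2 * 1000.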

In your "combination with multipliers like 3 million"-rule (Rule 1), you shift 
the annotation to span over (1,4), should it not be (1,3)?

In Rule 1, is num{IS(NumericValue) -> SHIFT(NumericValue,1,4)} just a 
different way of writing num:NumericValue{-> SHIFT(NumericValue,1,4)}?

What exactly is the function of the NEAR() in your Rule 1? Is it there to match 
only "3", "3-Million" and "3-Million" but not "3-"?

I tried to play Rule 1 through in my head with "zweitausendeins" and 
This works well for the first example:

(num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
//value = 2

  (Multiplicator{-> num.value = (2 * (POW(10,3)))}
//value = 2000
    add2:NumericValue?{-> num.value = (2000 + 1), UNMARK(add2)}));
//value = 2001

But fails for the second:

(num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
//value = 3

  (Multiplicator{-> num.value = (3 * (POW(10,6)))}
//value = 3000000
    add2:NumericValue?{-> num.value = (3000000 + 2), UNMARK(add2)})
//value = 3000002, after 1st iteration

  (Multiplicator{-> num.value = (3000002 * (POW(10,3)))}
//value = 3000002000
    add2:NumericValue?{-> num.value = (3000002000+ 1), UNMARK(add2)}));
//value = 3000002001
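
The two traces can be replayed outside Ruta. The Python sketch below is only my 
reading of the traces, not a statement about how the Ruta rule is actually 
evaluated. accumulate_naive applies every multiplier to the whole running 
value, as in the traces above. accumulate_grouped is a hypothetical alternative 
that applies each multiplier only to the number directly before it; it assumes 
that multipliers appear in descending order and that each one is preceded by at 
most one number. Both function names and the (kind, value) input format are 
made up for illustration.

```python
def accumulate_naive(parts):
    """Each multiplier scales the entire value accumulated so far."""
    value = 0
    for kind, n in parts:
        if kind == "num":
            value += n
        else:  # "mul": n is the exponent, as in POW(10, n)
            value *= 10 ** n
    return value


def accumulate_grouped(parts):
    """Each multiplier scales only the number that directly precedes it."""
    total, group = 0, 0
    for kind, n in parts:
        if kind == "num":
            group += n
        else:
            total += (group or 1) * 10 ** n  # a bare multiplier counts as 1 * 10^n
            group = 0
    return total + group


# First example: 2, 10^3, 1
FIRST = [("num", 2), ("mul", 3), ("num", 1)]
# Second example: 3, 10^6, 2, 10^3, 1
SECOND = [("num", 3), ("mul", 6), ("num", 2), ("mul", 3), ("num", 1)]
```

For FIRST both functions give 2001. For SECOND, accumulate_naive reproduces the 
3000002001 from the trace, while accumulate_grouped gives 3002001.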

On 28.08.19, 13:48, "Peter Klügl" <peter.klu...@averbis.com> wrote:


    we (Averbis) have an annotator which does exactly what you describe, but
    unfortunately I cannot share it.  However, I can tell that the annotator
    is almost completely implemented in Ruta and uses no Ruta language

    If you want to learn more about language extensions, then there are
    example projects in the Ruta trunk: ruta-core-ext and

    If you want to build the annotator with Ruta rules, I can help you
    create it.

    As a starting point you need some dictionaries (wordtables) for numbers
    (ein;1\neins;1\nzwei;2....) , exponents/multiplicators (tausend;3) and
    special characters (½). For German that's not too much, maybe one
    hundred entries overall is a good start.

    Before you can apply the dictionaries, you need to split the RutaBasics
    using some conjunction words in order to map the subword segments. You
    can do that with a simple regex rule:

    "und" -> ConjunctionFragment;

    Then, you can write some rules that combine numbers using additions,
    multiplications and exponents, e.g., something like:

    FOREACH(num, false) NumericValue{}{

            // combination with multipliers like 3 million
            (num{IS(NumericValue)-> SHIFT(NumericValue,1,4)}
    SPECIAL?{REGEXP("-"), NEAR(W,0,1,true)}
                    Multiplicator{-> num.value = (num.value * (POW(10,
                    add2:NumericValue?{-> num.value = (num.value +
    add2.value), UNMARK(add2)}

            // fünfundzwanzig
            (num{PARTOF(W)-> SHIFT(NumericValue,1,3)} ConjunctionFragment
    add:NumericValue.value!=0{PARTOF(W), IF((NumericValue.value%1) == 0) ->
                {-> num.value = (num.value + add.value)};


    At the end you get about 200 lines of Ruta ...



