Re: Ruta 2.0.2 Textruler problems

Sondes Bannour Mon, 03 Jun 2013 01:35:45 -0700

Hi Peter,

Yes, actually I added the feature functionality (not really in a proper way
:) ) and I kept the default learning configuration (except that I used my
own POS tag root type instead of the default one). Everything works well
except that rules with FEATURE (not only Lemma, but all rules that contain
FEATURE) are not applied correctly: they always get p=0 and n=0. Lemma is
not a target type; it is in fact an input annotation (an annotation that
assigns the lemmatized form to words). There are no exceptions and no
errors (have a look at my attached console file).
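In Ruta, the FEATURE condition compares a feature of the matched annotation with its second argument, so `Lemma{FEATURE("lemma","human")->MARKONCE(...)}` can only fire on Lemma annotations whose `lemma` feature equals "human". The minimal sketch below simulates that check outside UIMA, assuming annotations carry their features as a plain string map; the class and method names (`FeatureConditionSketch`, `Ann`, `countMatches`, `demoDoc`) are hypothetical and are not the Ruta or TextRuler API.

```java
import java.util.List;
import java.util.Map;

// Simulation only: hypothetical classes, not the UIMA, Ruta or TextRuler API.
final class FeatureConditionSketch {

    // Stand-in for an input annotation such as Lemma or POSTag:
    // a type name plus its string-valued features.
    static final class Ann {
        final String type;
        final Map<String, String> features;

        Ann(String type, Map<String, String> features) {
            this.type = type;
            this.features = features;
        }
    }

    // Mimics Type{FEATURE("name","value")}: the annotation must have the
    // expected type AND a feature with that name whose value is equal.
    static boolean matches(Ann a, String type, String featName, String featValue) {
        if (!a.type.equals(type)) {
            return false;
        }
        String actual = a.features.get(featName); // null if the feature is absent
        return featValue.equals(actual);
    }

    // Number of annotations a single-element FEATURE rule would match.
    static long countMatches(List<Ann> doc, String type, String featName, String featValue) {
        return doc.stream().filter(a -> matches(a, type, featName, featValue)).count();
    }

    // Tiny hypothetical document with three input annotations.
    static List<Ann> demoDoc() {
        return List.of(
            new Ann("Lemma", Map.of("lemma", "human")),
            new Ann("Lemma", Map.of("lemma", "sheep")),
            new Ann("POSTag", Map.of("postag", "NNS")));
    }

    public static void main(String[] args) {
        List<Ann> doc = demoDoc();
        System.out.println(countMatches(doc, "Lemma", "lemma", "human")); // prints 1
        System.out.println(countMatches(doc, "Lemma", "value", "human")); // prints 0: no such feature
        System.out.println(countMatches(doc, "Term", "lemma", "human"));  // prints 0: no Term annotations
    }
}
```

In this simulation, both a feature name the type does not carry and a type with no annotations produce zero matches, which is the same p=0, n=0 outcome the console shows for every FEATURE rule.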


It's really hard to give a simple example because I am working on the
source code of TextRuler itself: I only added a class named
TextRulerAnnotationWithFeatures that inherits from TextRulerAnnotation and
made WHISK work with it (of course I modified the class WhiskRuleItem to
add the FEATURE part). I can send it to you if you want. I didn't change
the way rules are learnt or applied, only how they are expressed.
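Since only the rule expressions changed, the core of the modification is how a rule item is written out as Ruta script. As a hypothetical sketch (the class `FeatureRuleItemSketch` and its fields are illustrative, not the actual TextRuler `WhiskRuleItem` code), an item carrying an optional feature constraint and an optional action can be rendered into the same textual forms that appear in the console log, such as `POSTag{FEATURE("postag","NN")->MARKONCE(MBTO00001402)}`, `Token{->MARKONCE(MBTO00001402)}` or plain `SPACE`:

```java
// Hypothetical rule item: a type name, an optional FEATURE condition and an
// optional action. This is NOT the real TextRuler WhiskRuleItem API.
final class FeatureRuleItemSketch {
    private final String type;
    private final String featureName;  // null when the item has no FEATURE condition
    private final String featureValue; // ignored when featureName is null
    private final String action;       // e.g. "MARKONCE(MBTO00001402)", or null

    FeatureRuleItemSketch(String type, String featureName, String featureValue, String action) {
        this.type = type;
        this.featureName = featureName;
        this.featureValue = featureValue;
        this.action = action;
    }

    // Renders the item in Ruta syntax, e.g. POSTag{FEATURE("postag","NN")->MARKONCE(X)}.
    String toRutaString() {
        boolean hasCondition = featureName != null;
        boolean hasAction = action != null;
        if (!hasCondition && !hasAction) {
            return type; // plain element such as "Token" or "SPACE"
        }
        StringBuilder sb = new StringBuilder(type).append('{');
        if (hasCondition) {
            sb.append("FEATURE(\"").append(featureName)
              .append("\",\"").append(featureValue).append("\")");
        }
        if (hasAction) {
            sb.append("->").append(action);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String mark = "MARKONCE(MBTO00001402)";
        System.out.println(new FeatureRuleItemSketch("POSTag", "postag", "NN", mark).toRutaString());
        System.out.println(new FeatureRuleItemSketch("Token", null, null, mark).toRutaString());
        System.out.println(new FeatureRuleItemSketch("Term", "lemma", "sheep", null).toRutaString());
    }
}
```

Keeping the rendering in one method means the learner itself never has to know whether an item carries a feature; it only sees a different expression string.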

PS: I am a little stressed because I am doing this work for my PhD :)

Cheers,
Sondes

TypeSystem.MBTO00001402
Load INPUT XMI file: BTID-30038.txt.xmi
Processing... OK
Load INPUT XMI file: BTID-50170.txt.xmi
Processing... OK
Load INPUT XMI file: BTID-30040.txt.xmi
Processing... OK
Load INPUT XMI file: BTID-30031.txt.xmi
Processing... OK
Load INPUT XMI file: BTID-10084.txt.xmi
Processing... OK
Load INPUT XMI file: BTID-10080.txt.xmi
Processing... OK
loading AE...
Error: Some Slot- or Helper-Types were not found in TypeSystem: 
Finding documents...
found document XMI file: processed_BTID-50170.txt.xmi
found document XMI file: processed_BTID-10080.txt.xmi
found document XMI file: processed_BTID-10084.txt.xmi
found document XMI file: processed_BTID-30038.txt.xmi
found document XMI file: processed_BTID-30031.txt.xmi
found document XMI file: processed_BTID-30040.txt.xmi
Starting...
Creating examples...
Creating new rule from seed...
Creating new rule: anchoring...
base1: POSTag{FEATURE("postag","NN")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=10; n=1215 --> laplacian = 0.9918433931484503
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=5; n=0 --> laplacian = 0.16666666666666666
        base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: 
SW{REGEXP("human")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
----------------------------
FINAL RULE IS : SW{REGEXP("human")->MARKONCE(MBTO00001402)};
New Rule added...
Creating new rule from seed...
Creating new rule: anchoring...
CACHE HIT !
base1: POSTag{FEATURE("postag","NNS")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=10; n=1215 --> laplacian = 0.9918433931484503
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
        base1: p=4; n=1 --> laplacian = 0.3333333333333333
        base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: 
SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
Round 2.0 - Testing 15 rules...  - uncovered examples: 5 / 10 ; cs=8
Testing 15 rules on training set...
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","CC")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","NN")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Term{FEATURE("lemma","sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? MBTO00000255;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Term{FEATURE("lemma","human")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Lemma{FEATURE("lemma","human")};
Same Laplacian! So prefer more general rule!
Same Laplacian! So prefer more general rule!
----------------------------
BEST EXTENSION IS: SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
Laplacian: 0.3333333333333333    ; p=4; n=1
----------------------------
FINAL RULE IS : SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
Creating new rule from seed...
Creating new rule: anchoring...
CACHE HIT !
base1: POSTag{FEATURE("postag","NNS")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=10; n=1215 --> laplacian = 0.9918433931484503
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
        base1: p=4; n=1 --> laplacian = 0.3333333333333333
        base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: 
SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
Round 3.0 - Testing 18 rules...  - uncovered examples: 5 / 10 ; cs=23
Testing 18 rules on training set...
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","CC")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag",",")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma",",")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Term{FEATURE("lemma","animal")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","animal")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? MBTO00001660;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("animals")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Term{FEATURE("lemma","human")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Lemma{FEATURE("lemma","human")};
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
----------------------------
BEST EXTENSION IS: SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
Laplacian: 0.25    ; p=3; n=0
----------------------------
FINAL RULE IS : SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
New Rule added...
Creating new rule from seed...
Creating new rule: anchoring...
base1: POSTag{FEATURE("postag","NN")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
base1: Term{FEATURE("lemma","consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
CACHE HIT !
        base1: p=10; n=1215 --> laplacian = 0.9918433931484503
        base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
base1: Lemma{FEATURE("lemma","consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
        base1: p=0; n=0 --> laplacian = 1.0
        base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT !
CACHE HIT !
CACHE HIT !
CACHE HIT !
base1: SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
        base1: p=1; n=0 --> laplacian = 0.5
        base2: p=1; n=1192 --> laplacian = 0.9991624790619765
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: 
SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
----------------------------
FINAL RULE IS : SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
New Rule added...
Done
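The laplacian values printed in the log are consistent with the Laplacian expected error used by WHISK, here (n + 1) / (p + n + 1) computed from the printed p and n: for example p=4, n=1 gives 2/6 ≈ 0.333, p=5, n=0 gives 1/6 ≈ 0.167, and p=10, n=1215 gives 1216/1226 ≈ 0.99184. Lower is better, so a rule that covers nothing (p=0, n=0) always receives the worst possible score, 1.0. The sketch below reproduces those numbers and picks the best candidate from the first anchoring step; the formula is inferred from the console output rather than copied from TextRuler's source, and the class name `LaplacianSketch` is hypothetical.

```java
// Laplacian expected error as it appears to be computed in the log:
// (errors + 1) / (all matches + 1), with p = correct and n = wrong matches.
final class LaplacianSketch {

    static double laplacian(int p, int n) {
        return (n + 1.0) / (p + n + 1.0);
    }

    // Index of the candidate with the lowest (best) Laplacian.
    // Ties keep the earlier candidate; TextRuler's own tie-break
    // ("prefer more general rule") is not modeled here.
    static int best(int[][] pn) {
        int bestIdx = 0;
        for (int i = 1; i < pn.length; i++) {
            if (laplacian(pn[i][0], pn[i][1]) < laplacian(pn[bestIdx][0], pn[bestIdx][1])) {
                bestIdx = i;
            }
        }
        return bestIdx;
    }

    public static void main(String[] args) {
        // base1 candidates of the first anchoring step, in log order:
        // POSTag/FEATURE, Term/FEATURE, Token, Lemma/FEATURE, SW{REGEXP("human")}
        int[][] candidates = {{0, 0}, {0, 0}, {10, 1215}, {0, 0}, {5, 0}};
        for (int[] c : candidates) {
            System.out.println("p=" + c[0] + "; n=" + c[1] + " --> laplacian = " + laplacian(c[0], c[1]));
        }
        System.out.println("best candidate index: " + best(candidates)); // the SW{REGEXP("human")} rule
    }
}
```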
