Hi Peter,

Yes, I actually added the FEATURE functionality (not really in a proper way :) ) and kept the default learning configuration, except that I set my POSTag root type instead of the default one. Everything works well except that rules with FEATURE (not only Lemma, but all rules that contain FEATURE) are not applied correctly: they always get p=0 and n=0. Lemma is not a target type; it is in fact an input annotation (an annotation that assigns the lemmatised form to words). There are no exceptions and no errors (have a look at my attached console file).
It's really hard to give a simple example because I am working on the source code of TextRuler itself: I only added a class named TextRulerAnnotationWithFeatures that inherits from TextRulerAnnotation and made WHISK work with it (of course, I modified the class whiskRuleItem to add the FEATURE part). I can send it to you if you want. I didn't change the way rules are learnt or applied, only their expressions.

PS: I am a little stressed because I am doing this work for my PhD :)

Cheers,
Sondes
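For context, the kind of rule element involved can be modelled in a few lines of plain Java. The class below is a hypothetical stand-in (FeatureAnnotation, with, toRuleItem and satisfies are illustrative names; it does not extend or reproduce the real TextRulerAnnotation or the modified whiskRuleItem). It stores feature values on an annotation, renders a rule element in the Type{FEATURE("name","value")} shape seen in the log, and checks whether an annotation would satisfy such a condition. It illustrates one thing worth checking: a FEATURE condition can only hold when the annotation being matched actually carries that feature value, so if the values are missing on the annotations in the processed documents, every FEATURE rule covers nothing, which is one way to end up with p=0; n=0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an annotation carrying feature values (e.g. a Lemma
// annotation with feature "lemma"). It is NOT the real TextRulerAnnotation API.
final class FeatureAnnotation {
    final String type;                                     // short type name, e.g. "Lemma"
    final Map<String, String> features = new HashMap<>();  // feature name -> value

    FeatureAnnotation(String type) {
        this.type = type;
    }

    // Fluent setter so an annotation can be built in one expression.
    FeatureAnnotation with(String name, String value) {
        features.put(name, value);
        return this;
    }

    // Renders a rule element like Lemma{FEATURE("lemma","human")}, the shape seen
    // in the log; falls back to the bare type name when the feature is missing.
    String toRuleItem(String featureName) {
        String value = features.get(featureName);
        if (value == null) {
            return type;
        }
        return type + "{FEATURE(\"" + featureName + "\",\"" + value + "\")}";
    }

    // True only if the type matches and the feature equals the expected value;
    // an annotation without the feature never satisfies the condition.
    boolean satisfies(String ruleType, String featureName, String expected) {
        return type.equals(ruleType) && expected.equals(features.get(featureName));
    }

    public static void main(String[] args) {
        FeatureAnnotation withLemma = new FeatureAnnotation("Lemma").with("lemma", "human");
        FeatureAnnotation bare = new FeatureAnnotation("Lemma"); // feature never set
        System.out.println(withLemma.toRuleItem("lemma"));
        System.out.println(withLemma.satisfies("Lemma", "lemma", "human"));
        System.out.println(bare.satisfies("Lemma", "lemma", "human"));
    }
}
```

Running main prints Lemma{FEATURE("lemma","human")}, then true, then false: the second Lemma annotation has the right type but no feature value, so the condition never holds for it.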
TypeSystem.MBTO00001402
Load INPUT XMI file: BTID-30038.txt.xmi Processing... OK
Load INPUT XMI file: BTID-50170.txt.xmi Processing... OK
Load INPUT XMI file: BTID-30040.txt.xmi Processing... OK
Load INPUT XMI file: BTID-30031.txt.xmi Processing... OK
Load INPUT XMI file: BTID-10084.txt.xmi Processing... OK
Load INPUT XMI file: BTID-10080.txt.xmi Processing... OK
loading AE...
Error: Some Slot- or Helper-Types were not found in TypeSystem:
Finding documents...
found document XMI file: processed_BTID-50170.txt.xmi
found document XMI file: processed_BTID-10080.txt.xmi
found document XMI file: processed_BTID-10084.txt.xmi
found document XMI file: processed_BTID-30038.txt.xmi
found document XMI file: processed_BTID-30031.txt.xmi
found document XMI file: processed_BTID-30040.txt.xmi
Starting...
Creating examples...
Creating new rule from seed...
Creating new rule: anchoring...
base1: POSTag{FEATURE("postag","NN")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=10; n=1215 --> laplacian = 0.9918433931484503
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=5; n=0 --> laplacian = 0.16666666666666666
base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: SW{REGEXP("human")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
----------------------------
FINAL RULE IS : SW{REGEXP("human")->MARKONCE(MBTO00001402)};
New Rule added...
Creating new rule from seed...
Creating new rule: anchoring...
CACHE HIT !
base1: POSTag{FEATURE("postag","NNS")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=10; n=1215 --> laplacian = 0.9918433931484503
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
base1: p=4; n=1 --> laplacian = 0.3333333333333333
base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
Round 2.0 - Testing 15 rules... - uncovered examples: 5 / 10 ; cs=8
Testing 15 rules on training set...
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","CC")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","NN")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Term{FEATURE("lemma","sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? MBTO00000255;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("sheep")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Term{FEATURE("lemma","human")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Lemma{FEATURE("lemma","human")};
Same Laplacian! So prefer more general rule!
Same Laplacian! So prefer more general rule!
----------------------------
BEST EXTENSION IS: SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
Laplacian: 0.3333333333333333 ; p=4; n=1
----------------------------
FINAL RULE IS : SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
Creating new rule from seed...
Creating new rule: anchoring...
CACHE HIT !
base1: POSTag{FEATURE("postag","NNS")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Term{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=10; n=1215 --> laplacian = 0.9918433931484503
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: Lemma{FEATURE("lemma","human")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT !
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=0; n=0 --> laplacian = 1.0
CACHE HIT !
base1: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} SPACE;
CACHE HIT ! CACHE HIT !
base1: p=4; n=1 --> laplacian = 0.3333333333333333
base2: p=0; n=0 --> laplacian = 1.0
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: SW{REGEXP("humans")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
Round 3.0 - Testing 18 rules... - uncovered examples: 5 / 10 ; cs=23
Testing 18 rules on training set...
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","CC")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SPACE;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag","NNS")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? POSTag{FEATURE("postag",",")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma",",")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Term{FEATURE("lemma","animal")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","animal")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? MBTO00001660;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("animals")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? Lemma{FEATURE("lemma","and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? SW{REGEXP("and")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Term{FEATURE("lemma","human")};
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Token;
SW{REGEXP("humans")->MARKONCE(MBTO00001402)} Lemma{FEATURE("lemma","human")};
CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT !
----------------------------
BEST EXTENSION IS: SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
Laplacian: 0.25 ; p=3; n=0
----------------------------
FINAL RULE IS : SW{REGEXP("humans")->MARKONCE(MBTO00001402)} ALL*? COMMA;
New Rule added...
Creating new rule from seed...
Creating new rule: anchoring...
base1: POSTag{FEATURE("postag","NN")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT ! CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT !
base1: Term{FEATURE("lemma","consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT !
base1: Token{->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT ! CACHE HIT !
base1: p=10; n=1215 --> laplacian = 0.9918433931484503
base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT !
base1: Lemma{FEATURE("lemma","consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
base1: p=0; n=0 --> laplacian = 1.0
base2: p=1; n=1192 --> laplacian = 0.9991624790619765
CACHE HIT ! CACHE HIT ! CACHE HIT ! CACHE HIT !
base1: SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
base2: ALL*?{->MARKONCE(MBTO00001402)} PERIOD;
CACHE HIT !
base1: p=1; n=0 --> laplacian = 0.5
base2: p=1; n=1192 --> laplacian = 0.9991624790619765
BBBBBBBBBBBBBEEEEEEEEEEEEEEEEEEESSSSSSSSSSSSSSTTTTTTTTTTTTTTTTT: SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
Creating new rule: extending...
----------------------------
FINAL RULE IS : SW{REGEXP("consumer")->MARKONCE(MBTO00001402)};
New Rule added...
Done
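The laplacian scores in the console log above are consistent with the Laplacian expected error (n + 1) / (p + n + 1), where p and n are the positive and negative examples a rule covers and lower is better (WHISK keeps the lower-scoring candidate, e.g. 0.25 over 0.333). That formula is inferred from the printed numbers, not taken from the TextRuler source, but it reproduces every value in the log. It also shows what p=0; n=0 means: the rule matched nothing at all and gets 1.0, the worst possible score. The class name LaplacianCheck below is hypothetical.

```java
// Hypothetical check; the class name LaplacianCheck is not part of TextRuler.
// Assumption: WHISK's "laplacian" is the Laplacian expected error
// (n + 1) / (p + n + 1), inferred from the values printed in the log.
final class LaplacianCheck {

    // Expected error of a rule covering p positive and n negative examples; lower is better.
    static double laplacian(int p, int n) {
        return (n + 1.0) / (p + n + 1.0);
    }

    public static void main(String[] args) {
        // (p, n) pairs copied from the console log.
        int[][] logged = { {0, 0}, {10, 1215}, {5, 0}, {4, 1}, {3, 0}, {1, 1192}, {1, 0} };
        for (int[] pn : logged) {
            // String concatenation uses Double.toString, the same format as the log.
            System.out.println("p=" + pn[0] + "; n=" + pn[1]
                    + " --> laplacian = " + laplacian(pn[0], pn[1]));
        }
    }
}
```

Because any rule with p=0 scores 1.0 whatever n is, the score alone cannot separate a rule that matches nothing from one that matches only negatives; the p and n counts printed next to it are what show that the FEATURE rules never match.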