Hi,
the exception indicates that there is an annotation in your CAS with
invalid offsets, e.g., an end offset greater than the document length. This
causes a StringIndexOutOfBoundsException when getCoveredText() is
called. (The stupid thing is that the getCoveredText() call in Ruta that
causes the exception is probably not required at all.)
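A minimal plain-Java sketch of that failure mode. It assumes (as in UIMA 2.x) that getCoveredText() essentially returns the document text's substring(begin, end); coveredText() below is a hypothetical stand-in, not the UIMA method itself, and the sample sentence is made up:

```java
/**
 * Sketch: why an annotation whose end offset exceeds the document length
 * makes getCoveredText() fail. coveredText() is a hypothetical helper that
 * assumes getCoveredText() boils down to documentText.substring(begin, end).
 */
public class CoveredTextDemo {

    // Hypothetical stand-in for Annotation.getCoveredText()
    static String coveredText(String documentText, int begin, int end) {
        return documentText.substring(begin, end);
    }

    public static void main(String[] args) {
        String documentText = "Le gel bloque la route."; // 23 characters

        // Valid offsets: end <= documentText.length()
        System.out.println(coveredText(documentText, 7, 13)); // prints "bloque"

        // Invalid offsets: end (50275) lies far beyond the text length
        try {
            coveredText(documentText, 0, 50275);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text depends on the JDK version
            System.out.println("Caught: " + e);
        }
    }
}
```

The exact exception message varies between JDK versions; in Kevin's trace it is the offending end index, 50275.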
Debugging it in Eclipse can be a bit annoying, since the UIMA debugging
support will most likely also throw an exception for exactly this
annotation. I would write an additional analysis engine that iterates
over all annotations and checks the validity of their offsets. You can also
open the XMI file and search for the offset 50275.
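A minimal plain-Java sketch of the offset check suggested above. Span, isValid() and findInvalid() are hypothetical names. In an actual analysis engine (e.g. a JCasAnnotator_ImplBase subclass) the type names and begin/end values would presumably come from iterating over jcas.getAnnotationIndex(), and the length from jcas.getDocumentText().length(); that wiring is assumed and not shown here:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetChecker {

    /** Hypothetical stand-in for an annotation: its type name and offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** True if begin/end describe a range inside a text of length docLength. */
    static boolean isValid(int begin, int end, int docLength) {
        return begin >= 0 && begin <= end && end <= docLength;
    }

    /** Returns one report line per span whose offsets are invalid. */
    static List<String> findInvalid(List<Span> spans, int docLength) {
        List<String> problems = new ArrayList<>();
        for (Span s : spans) {
            if (!isValid(s.begin, s.end, docLength)) {
                problems.add(s.type + " [" + s.begin + ", " + s.end
                        + "] is invalid for document length " + docLength);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String text = "Le gel bloque la route."; // 23 characters
        List<Span> spans = new ArrayList<>();
        spans.add(new Span("Token", 7, 13));
        spans.add(new Span("NominalPhrase", 14, 50275)); // broken end offset
        for (String problem : findInvalid(spans, text.length())) {
            System.out.println(problem);
        }
        // prints: NominalPhrase [14, 50275] is invalid for document length 23
    }
}
```

Because the check only compares integers and never calls getCoveredText(), it should report the broken annotation instead of crashing on it.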
Best,
Peter
On 07.10.2015 at 15:09, Kevin Cousot wrote:
Hi all,
I ran a simple aggregate analysis engine on two plain-text corpora,
performing preprocessing operations such as tokenization, lemmatization,
POS tagging and so on.
The second step is applying a RUTA script to the resulting .xmi files.
The RUTA script contains rules of the form:
(Token.partOfSpeech == "Det"
NominalPhrase{-> MARK(Cause)}
Token.lemma == "bloquer"
Token.partOfSpeech == "Det"
NominalPhrase{-> MARK(Effect)}){-> MARK(Causality)};
Everything works fine for the first corpus, but the second one fails.
As a UIMA newcomer, I have trouble understanding what is going on.
Could someone provide some insight into this issue?
The full stack trace is at the end of this message; please feel free to
ask for more information.
Thank you,
Kevin.
oct. 07, 2015 2:08:02 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(417)
GRAVE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed.
	at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:547)
	at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:385)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:308)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:269)
	at org.apache.uima.ruta.ide.launching.RutaLauncher.processFile(RutaLauncher.java:169)
	at org.apache.uima.ruta.ide.launching.RutaLauncher.main(RutaLauncher.java:130)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 50275
	at java.lang.String.substring(String.java:1950)
	at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:122)
	at org.apache.uima.ruta.expression.feature.FeatureMatchExpression.checkFeatureValue(FeatureMatchExpression.java:121)
	at org.apache.uima.ruta.expression.feature.FeatureMatchExpression.checkFeatureValue(FeatureMatchExpression.java:84)
	at org.apache.uima.ruta.rule.RutaTypeMatcher.checkFeature(RutaTypeMatcher.java:227)
	at org.apache.uima.ruta.rule.RutaTypeMatcher.match(RutaTypeMatcher.java:196)
	at org.apache.uima.ruta.rule.RutaRuleElement.doMatch(RutaRuleElement.java:368)
	at org.apache.uima.ruta.rule.RutaRuleElement.startMatch(RutaRuleElement.java:73)
	at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:84)
	at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:74)
	at org.apache.uima.ruta.rule.ComposedRuleElement.startMatch(ComposedRuleElement.java:74)
	at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:47)
	at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:40)
	at org.apache.uima.ruta.rule.RutaRule.apply(RutaRule.java:29)
	at org.apache.uima.ruta.RutaScriptBlock.apply(RutaScriptBlock.java:63)
	at org.apache.uima.ruta.RutaModule.apply(RutaModule.java:48)
	at org.apache.uima.ruta.engine.RutaEngine.process(RutaEngine.java:545)
	... 6 more