I’m sorry, but I’m a little lost…
I tried the example you suggested after adapting it to Java, and it works
beautifully:
private static void fromGroovy() throws UIMAException {
    // Create document to be analyzed
    JCas document = JCasFactory.createJCasFromPath(
            "http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");
    document.setDocumentText("The quick brown fox jumps over the lazy dog. Later, he jumped over the moon.");
    document.setDocumentLanguage("en");

    Type tokenType = document.getTypeSystem().getType("opennlp.uima.Token");
    Type sentenceType = document.getTypeSystem().getType("opennlp.uima.Sentence");
    Feature posFeature = tokenType.getFeatureByBaseName("pos");
    System.out.println(sentenceType.getName());

    // Configure sentence detector
    AnalysisEngineDescription sentenceDetector = AnalysisEngineFactory.createEngineDescription(
            SentenceDetector.class,
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName());
    ExternalResourceFactory.createDependencyAndBind(sentenceDetector,
            UimaUtil.MODEL_PARAMETER, SentenceModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-sent.bin");

    // Configure tokenizer
    AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(
            Tokenizer.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(),
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName());
    ExternalResourceFactory.createDependencyAndBind(tokenizer,
            UimaUtil.MODEL_PARAMETER, TokenizerModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-token.bin");

    // Configure part-of-speech tagger
    AnalysisEngineDescription posTagger = AnalysisEngineFactory.createEngineDescription(
            POSTagger.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(),
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName(),
            UimaUtil.POS_FEATURE_PARAMETER, posFeature.getShortName());
    ExternalResourceFactory.createDependencyAndBind(posTagger,
            UimaUtil.MODEL_PARAMETER, POSModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-pos-perceptron.bin");

    // Run pipeline
    SimplePipeline.runPipeline(document, sentenceDetector, tokenizer, posTagger);

    // Display results: each token with its POS tag
    for (AnnotationFS sentence : CasUtil.select(document.getCas(), sentenceType)) {
        for (AnnotationFS token : CasUtil.selectCovered(tokenType, sentence)) {
            System.out.println(token.getCoveredText() + " "
                    + token.getFeatureValueAsString(posFeature));
        }
    }
}
The issue is that I would like to use a CollectionReader instead of creating
the document. Something like this:
private static void mine() throws UIMAException, IOException {
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
            AbstractCollectionReader.class,
            AbstractCollectionReader.PARAM_VALUE, 33);

    AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(
            SimpleTokenizer.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, "pt.ipb.pos.type.Token",
            UimaUtil.SENTENCE_TYPE_PARAMETER, "pt.ipb.pos.type.Sentence");

    // AnalysisEngineDescription histogramer =
    //         AnalysisEngineFactory.createEngineDescription(HistogramAnnotator.class);

    AnalysisEngineDescription ae =
            AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);

    SimplePipeline.runPipeline(reader, tokenizer, ae);
}
Should I initialise the type system to the OpenNLP one? How can I do that?
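I guessed at something along these lines, but I'm not sure it's the right way (just a sketch; I'm assuming TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath can load the same OpenNLP TypeSystem.xml I used above, and that createReaderDescription accepts the type system description as its second argument):

// Sketch only: load the OpenNLP type system explicitly...
TypeSystemDescription opennlpTypes = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath(
        "http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");

// ...and hand it to the reader so the CASes it produces know about opennlp.uima.Token etc.
CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
        AbstractCollectionReader.class, opennlpTypes,
        AbstractCollectionReader.PARAM_VALUE, 33);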
Is it possible to use a custom type system, as in the code above
(“pt.ipb.pos.type.Token”)? How?
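For the custom types I imagined something like the following (again only a guess; I'm assuming I have my own descriptor pt/ipb/pos/type/TypeSystem.xml on the classpath declaring pt.ipb.pos.type.Token and pt.ipb.pos.type.Sentence, and that it can be merged with the OpenNLP description via CasCreationUtils.mergeTypeSystems):

// Sketch only: load my own type system descriptor from the classpath
TypeSystemDescription myTypes = TypeSystemDescriptionFactory.createTypeSystemDescription(
        "pt.ipb.pos.type.TypeSystem");

// Merge it with the OpenNLP one loaded above and use the result for the reader
TypeSystemDescription merged = CasCreationUtils.mergeTypeSystems(
        Arrays.asList(opennlpTypes, myTypes));

CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
        AbstractCollectionReader.class, merged,
        AbstractCollectionReader.PARAM_VALUE, 33);

Or is the idea rather to let uimaFIT pick the type systems up automatically via its type-system auto-detection mechanism?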
Sorry about these probably naive questions, but I confess I must be missing
something basic…
Cheers,
/rp
> On 19 Apr 2016, at 15:18, Richard Eckart de Castilho <[email protected]> wrote:
>
> Short answer: no :)
>
> Longer answer: You don't seem to be using the actual OpenNLP UIMA components.
>
> If you want an example (in Groovy, but should be trivial to transfer to Java)
> on how to use the OpenNLP UIMA components with uimaFIT, see here:
>
> https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>
> Cheers,
>
> -- Richard
>