I’m sorry, but I’m a little lost…
I tried the example you suggested after adapting it to Java, and it works
beautifully:
private static void fromGroovy() throws UIMAException {
    // Create document to be analyzed
    JCas document = JCasFactory.createJCasFromPath(
            "http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");
    document.setDocumentText("The quick brown fox jumps over the lazy dog. Later, he jumped over the moon.");
    document.setDocumentLanguage("en");

    Type tokenType = document.getTypeSystem().getType("opennlp.uima.Token");
    Type sentenceType = document.getTypeSystem().getType("opennlp.uima.Sentence");
    Feature posFeature = tokenType.getFeatureByBaseName("pos");
    System.out.println(sentenceType.getName());

    // Configure sentence detector
    AnalysisEngineDescription sentenceDetector = AnalysisEngineFactory.createEngineDescription(
            SentenceDetector.class,
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName());
    ExternalResourceFactory.createDependencyAndBind(sentenceDetector,
            UimaUtil.MODEL_PARAMETER, SentenceModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-sent.bin");

    // Configure tokenizer
    AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(
            Tokenizer.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(),
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName());
    ExternalResourceFactory.createDependencyAndBind(tokenizer,
            UimaUtil.MODEL_PARAMETER, TokenizerModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-token.bin");

    // Configure part-of-speech tagger
    AnalysisEngineDescription posTagger = AnalysisEngineFactory.createEngineDescription(
            POSTagger.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(),
            UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName(),
            UimaUtil.POS_FEATURE_PARAMETER, posFeature.getShortName());
    ExternalResourceFactory.createDependencyAndBind(posTagger,
            UimaUtil.MODEL_PARAMETER, POSModelResourceImpl.class,
            "http://opennlp.sourceforge.net/models-1.5/en-pos-perceptron.bin");

    // Run pipeline
    SimplePipeline.runPipeline(document, sentenceDetector, tokenizer, posTagger);

    // Display results: each token with its POS tag
    for (AnnotationFS sentence : CasUtil.select(document.getCas(), sentenceType)) {
        for (AnnotationFS token : CasUtil.selectCovered(tokenType, sentence)) {
            System.out.println(token.getCoveredText() + " "
                    + token.getFeatureValueAsString(posFeature));
        }
    }
}
The issue is that I would like to use a CollectionReader instead of creating
the document. Something like this:
private static void mine() throws UIMAException, IOException {
    CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
            AbstractCollectionReader.class,
            AbstractCollectionReader.PARAM_VALUE, 33);

    AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(
            SimpleTokenizer.class,
            UimaUtil.TOKEN_TYPE_PARAMETER, "pt.ipb.pos.type.Token",
            UimaUtil.SENTENCE_TYPE_PARAMETER, "pt.ipb.pos.type.Sentence");

    // AnalysisEngineDescription histogramer =
    //         AnalysisEngineFactory.createEngineDescription(HistogramAnnotator.class);

    AnalysisEngineDescription ae =
            AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);

    SimplePipeline.runPipeline(reader, tokenizer, ae);
}
Should I initialise the type system to the OpenNLP one? How can I do that?
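I guessed at something along these lines, but I'm not sure it's the right way (just a sketch; I'm assuming TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath can load the same OpenNLP TypeSystem.xml I used above, and that createReaderDescription accepts the type system description as its second argument):

// Sketch only: load the OpenNLP type system explicitly...
TypeSystemDescription opennlpTypes = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath(
        "http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");

// ...and hand it to the reader so the CASes it produces know about opennlp.uima.Token etc.
CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
        AbstractCollectionReader.class, opennlpTypes,
        AbstractCollectionReader.PARAM_VALUE, 33);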
Is it possible to use a custom type system, as in the code above
(“pt.ipb.pos.type.Token”)? How?
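For the custom types I imagined something like the following (again only a guess; I'm assuming I have my own descriptor pt/ipb/pos/type/TypeSystem.xml on the classpath declaring pt.ipb.pos.type.Token and pt.ipb.pos.type.Sentence, and that it can be merged with the OpenNLP description via CasCreationUtils.mergeTypeSystems):

// Sketch only: load my own type system descriptor from the classpath
TypeSystemDescription myTypes = TypeSystemDescriptionFactory.createTypeSystemDescription(
        "pt.ipb.pos.type.TypeSystem");

// Merge it with the OpenNLP one loaded above and use the result for the reader
TypeSystemDescription merged = CasCreationUtils.mergeTypeSystems(
        Arrays.asList(opennlpTypes, myTypes));

CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
        AbstractCollectionReader.class, merged,
        AbstractCollectionReader.PARAM_VALUE, 33);

Or is the idea rather to let uimaFIT pick the type systems up automatically via its type-system auto-detection mechanism?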
Sorry about these probably naive questions, but I confess I must be missing
something basic…
Cheers,
/rp
> On 19 Apr 2016, at 15:18, Richard Eckart de Castilho <[email protected]> wrote:
>
> Short answer: no :)
>
> Longer answer: You don't seem to be using the actual OpenNLP UIMA components.
>
> If you want an example (in Groovy, but should be trivial to transfer to Java)
> on how to use the OpenNLP UIMA components with uimaFIT, see here:
>
> https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>
> Cheers,
>
> -- Richard
>