Re: UIMA FIT Pipeline with OpenNLP tokeniser

Raj kiran Tue, 19 Apr 2016 09:38:36 -0700

I believe you are missing the SentenceDetector engine in the pipeline. It
should be added before SimpleTokenizer.


SimpleTokenizer iterates over the sentences in the document, so in the absence
of sentence annotations the tokenizer does not add any tokens to the CAS.
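
To make that concrete, here is a toy model in plain Java of a tokenizer that
only looks inside sentence spans. It is a sketch for illustration, not
OpenNLP's code: the class name, the tokenize method and the int[]{begin, end}
spans are all made up, and the whitespace splitting is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the OpenNLP UIMA SimpleTokenizer source.
// All names here are hypothetical and exist purely for illustration.
final class SentenceScopedTokenizerSketch {

    /**
     * Splits on whitespace, but only inside the given sentence spans.
     * Each span is an int[]{begin, end} with an exclusive end offset,
     * standing in for a sentence annotation.
     */
    static List<int[]> tokenize(String text, List<int[]> sentences) {
        List<int[]> tokens = new ArrayList<>();
        for (int[] sentence : sentences) {
            int start = -1; // begin offset of the token being read, or -1
            for (int i = sentence[0]; i <= sentence[1]; i++) {
                boolean boundary = i == sentence[1]
                        || Character.isWhitespace(text.charAt(i));
                if (!boundary && start < 0) {
                    start = i;                       // a token starts here
                } else if (boundary && start >= 0) {
                    tokens.add(new int[] {start, i}); // the token ends here
                    start = -1;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Short answer. Longer answer.";

        // No sentence spans: the outer loop never runs, so no tokens.
        System.out.println(tokenize(text, List.of()).size());

        // With sentence spans present, tokens are produced.
        List<int[]> sentences =
                List.of(new int[] {0, 13}, new int[] {14, 28});
        System.out.println(tokenize(text, sentences).size());
    }
}
```

With an empty sentence list the loop body is never entered, which matches the
symptom in your output: only the DocumentAnnotation is in the index.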

Hope it helps.

Regards,
Raj

On Tue, Apr 19, 2016 at 7:48 PM, Richard Eckart de Castilho <[email protected]>
wrote:

> Short answer: no :)
>
> Longer answer: You don't seem to be using the actual OpenNLP UIMA
> components.
>
> If you want an example (in Groovy, but should be trivial to transfer to
> Java)
> on how to use the OpenNLP UIMA components with uimaFIT, see here:
>
>   https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>
> Cheers,
>
> -- Richard
>
> > On 19.04.2016, at 16:07, Rui Lopes <[email protected]> wrote:
> >
> > Hi all,
> >
> > I’m trying to use OpenNLP uima to build a very simple pipeline:
> >
> > CollectionReaderDescription reader = CollectionReaderFactory
> >     .createReaderDescription(AbstractCollectionReader.class,
> >         AbstractCollectionReader.PARAM_VALUE, 33);
> >
> > AnalysisEngineDescription tokenizer =
> >     AnalysisEngineFactory.createEngineDescription(SimpleTokenizer.class,
> >         "opennlp.uima.SentenceType", "pt.ipb.pos.type.Sentence",
> >         "opennlp.uima.TokenType", "pt.ipb.pos.type.Token");
> >
> >
> > AnalysisEngineDescription ae =
> >     AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);
> >
> > SimplePipeline.runPipeline(reader, tokenizer, ae);
> >
> >
> > ------
> > The GetStartedQuickAE just prints the Annotations:
> >
> >       @Override
> >       public void process(JCas jCas) throws AnalysisEngineProcessException {
> >               System.out.println(jCas.getDocumentText());
> >
> >               for(Annotation a : jCas.getAnnotationIndex()) {
> >                       System.out.println(a);
> >               }
> >
> >               System.out.println("Done");
> >
> >
> >       }
> >
> >
> > ------
> > The output is:
> >
> >
> > Apr 19, 2016 3:04:46 PM opennlp.uima.tokenize.AbstractTokenizer initialize(71)
> > INFO: Initializing the OpenNLP Simple Tokenizer annotator.
> > Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
> > INFO: opennlp.uima.IsRemoveExistingAnnotations = not set
> > Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
> > INFO: opennlp.uima.SentenceType = pt.ipb.pos.type.Sentence
> > Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
> > INFO: opennlp.uima.TokenType = pt.ipb.pos.type.Token
> > This article aims to observe the didactic action and its epistemological
> > insertion in education trends as well as its role as a medium capable of
> > causing changes in this alignment. Main objective is the need to
> > consciously integrate between epistemology and education trends didactic
> > application. The methodological procedure trend the application relied on
> > observations from years in which the subjects were given Cytology and
> > Histology in undergraduate courses. The results of observations point to a
> > single procedure, with little clarity regarding the alignment epistemology,
> > educational trends, teaching action. Associate art practice can provide a
> > biological alternative capable of generating a position and "profitable
> > shifts" in epistemological and pedagogical articulating. Different
> > strategies need to be created to establish conditions that allow the
> > configuration of knowledge as a whole, while respecting cultural diversity
> > in which knowledge is configured.
> > DocumentAnnotation
> >   sofa: _InitialView
> >   begin: 0
> >   end: 969
> >   language: "x-unspecified"
> >
> > Done
> >
> >
> > There is only one Annotation? Does anyone know why?
> >
> > Thanks for any feedback!
> >
> > All the best,
> >
> > Rui Lopes
> >
>
>
