Hi Raj,

Got it working now! After first getting it to work with the OpenNLP type system, I switched to my own and it worked as you suggested.
Moreover, the SentenceDetector advice was also valuable! Thanks a lot!

Cheers,

/rp

> On 20 Apr 2016, at 06:03, Raj kiran <[email protected]> wrote:
>
> Sorry, I thought you had already added the new types. You can add your custom
> type by defining your own type system. It's actually simple; see the
> following link for details:
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
>
> Basically, you have to add a types.txt file (containing the location of the
> type system XMLs). You can refer to the OpenNLP type system XML when adding
> new types for sentence and token. For example:
>
> <typeDescription>
>   <name>pt.ipb.pos.type.Token</name>
>   <supertypeName>uima.tcas.Annotation</supertypeName>
>   <features>
>     <featureDescription>
>       <name>pos</name>
>       <description>Part of speech</description>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>
>
> Also, in case of a missing type, some exception should have been thrown. So
> you may have to check your collection reader code. A sample collection
> reader is available in the uimaFIT examples in the source. You can start with
> the document approach and, once everything is working, test the collection
> reader approach.
>
> Regards,
> Raj
>
> On Wed, Apr 20, 2016 at 2:45 AM, Rui Lopes <[email protected]> wrote:
>
>> Thank you, Raj!
>>
>> I tried it but no success… there is still only one Annotation.
>> Could it be related to the type system?
>>
>> Cheers,
>>
>> /rp
>>
>>> On 19 Apr 2016, at 17:38, Raj kiran <[email protected]> wrote:
>>>
>>> I believe you are missing the SentenceDetector engine in the pipeline. It
>>> should be added before SimpleTokenizer.
>>>
>>> SimpleTokenizer iterates over the sentences in the text/document; in the
>>> absence of sentence annotations, the tokenizer fails to add any tokens to
>>> the CAS.
>>>
>>> Hope it helps.
>>>
>>> Regards,
>>> Raj
>>>
>>> On Tue, Apr 19, 2016 at 7:48 PM, Richard Eckart de Castilho <[email protected]> wrote:
>>>
>>>> Short answer: no :)
>>>>
>>>> Longer answer: You don't seem to be using the actual OpenNLP UIMA
>>>> components.
>>>>
>>>> If you want an example (in Groovy, but should be trivial to transfer to
>>>> Java) on how to use the OpenNLP UIMA components with uimaFIT, see here:
>>>>
>>>> https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>>>> On 19.04.2016, at 16:07, Rui Lopes <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I’m trying to use OpenNLP uima to build a very simple pipeline:
>>>>>
>>>>> CollectionReaderDescription reader = CollectionReaderFactory
>>>>>     .createReaderDescription(AbstractCollectionReader.class,
>>>>>         AbstractCollectionReader.PARAM_VALUE, 33);
>>>>>
>>>>> AnalysisEngineDescription tokenizer =
>>>>>     AnalysisEngineFactory.createEngineDescription(SimpleTokenizer.class,
>>>>>         "opennlp.uima.SentenceType", "pt.ipb.pos.type.Sentence",
>>>>>         "opennlp.uima.TokenType", "pt.ipb.pos.type.Token");
>>>>>
>>>>> AnalysisEngineDescription ae =
>>>>>     AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);
>>>>>
>>>>> SimplePipeline.runPipeline(reader, tokenizer, ae);
>>>>>
>>>>> ------
>>>>> The GetStartedQuickAE just prints the Annotations:
>>>>>
>>>>> @Override
>>>>> public void process(JCas jCas) throws AnalysisEngineProcessException {
>>>>>     System.out.println(jCas.getDocumentText());
>>>>>
>>>>>     for (Annotation a : jCas.getAnnotationIndex()) {
>>>>>         System.out.println(a);
>>>>>     }
>>>>>
>>>>>     System.out.println("Done");
>>>>> }
>>>>>
>>>>> ------
>>>>> The output is:
>>>>>
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.tokenize.AbstractTokenizer initialize(71)
>>>>> INFO: Initializing the OpenNLP Simple Tokenizer annotator.
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.IsRemoveExistingAnnotations = not set
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.SentenceType = pt.ipb.pos.type.Sentence
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.TokenType = pt.ipb.pos.type.Token
>>>>> This article aims to observe the didactic action and its epistemological insertion in education trends as well as its role as a medium capable of causing changes in this alignment. Main objective is the need to consciously integrate between epistemology and education trends didactic application. The methodological procedure trend the application relied on observations from years in which the subjects were given Cytology and Histology in undergraduate courses. The results of observations point to a single procedure, with little clarity regarding the alignment epistemology, educational trends, teaching action. Associate art practice can provide a biological alternative capable of generating a position and "profitable shifts" in epistemological and pedagogical articulating. Different strategies need to be created to establish conditions that allow the configuration of knowledge as a whole, while respecting cultural diversity in which knowledge is configured.
>>>>> DocumentAnnotation
>>>>>    sofa: _InitialView
>>>>>    begin: 0
>>>>>    end: 969
>>>>>    language: "x-unspecified"
>>>>>
>>>>> Done
>>>>>
>>>>> There is only one Annotation? Does anyone know why?
>>>>>
>>>>> Thanks for any feedback!
>>>>>
>>>>> All the best,
>>>>>
>>>>> Rui Lopes
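
A note for anyone who finds this thread later: Raj's explanation (the tokenizer only looks inside sentence annotations, so with none present it adds no tokens) can be illustrated with a small plain-Java toy model. This is not UIMA or OpenNLP code. The class SentenceScopedTokenizerDemo, the Span type, and the naive punctuation-based detectSentences are hypothetical stand-ins invented for this sketch; the real SentenceDetector and SimpleTokenizer work on CAS annotations and may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT UIMA/OpenNLP code) of a tokenizer that only works inside sentence spans. */
class SentenceScopedTokenizerDemo {

    /** Hypothetical half-open character span [begin, end) over the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical stand-in for a sentence detector: splits after '.', '!' or '?'. */
    static List<Span> detectSentences(String text) {
        List<Span> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                addTrimmed(text, start, i + 1, sentences);
                start = i + 1;
            }
        }
        // Keep any trailing text that does not end in sentence punctuation.
        addTrimmed(text, start, text.length(), sentences);
        return sentences;
    }

    /** Adds the span to the list after trimming surrounding whitespace; skips empty spans. */
    private static void addTrimmed(String text, int begin, int end, List<Span> out) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) {
            begin++;
        }
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        if (begin < end) {
            out.add(new Span(begin, end));
        }
    }

    /** Whitespace-tokenizes only the text covered by the given sentence spans. */
    static List<String> tokenize(String text, List<Span> sentences) {
        List<String> tokens = new ArrayList<>();
        for (Span s : sentences) {
            for (String t : text.substring(s.begin, s.end).split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Got it working now! Thanks a lot.";

        // No sentence spans (no sentence detector ran): the loop never executes.
        System.out.println(tokenize(text, new ArrayList<>()).size()); // prints 0

        // Sentence spans present: tokens are produced.
        System.out.println(tokenize(text, detectSentences(text)).size()); // prints 7
    }
}
```

In the toy model the first call corresponds to the original pipeline (reader, tokenizer, printer) and the second to the pipeline with a sentence detection step placed before the tokenizer, which is the ordering Raj suggested.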
