Re: UIMA FIT Pipeline with OpenNLP tokeniser

Rui Lopes Wed, 20 Apr 2016 01:15:07 -0700

Hi Raj,

Got it working now!
After first getting it to work with the OpenNLP type system, I switched to my
own and it worked as you suggested.


Moreover, the SentenceDetector advice was also valuable!

Thanks a lot!

Cheers,

/rp


> On 20 Apr 2016, at 06:03, Raj kiran <[email protected]> wrote:
> 
> Sorry, I thought you had already added the new types. You can add your custom
> types by defining your own type system. It's actually simple; see the
> following link for details:
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
> 
> 
> Basically, you have to add a types.txt file (containing the locations of the
> type system XMLs). You can refer to the OpenNLP type system XML when adding
> new types for sentence and token. For example:
> <typeDescription>
>   <name>pt.ipb.pos.type.Token</name>
>   <supertypeName>uima.tcas.Annotation</supertypeName>
>   <features>
>     <featureDescription>
>       <name>pos</name>
>       <description>Part of speech</description>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>
> 
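As a rough illustration of the types.txt setup described above: uimaFIT 2.x looks for a file named META-INF/org.apache.uima.fit/types.txt on the classpath and reads type system descriptor locations from it. The Maven-style layout, the descriptor file name and the desc/types directory below are assumptions made for this sketch, not details taken from the thread.

```text
src/main/resources/
  META-INF/org.apache.uima.fit/types.txt   (one location pattern per line)
  desc/types/PosTypeSystem.xml             (hypothetical descriptor name)

Contents of types.txt:
classpath*:desc/types/**/*.xml
```

The typeDescription element quoted above would then sit inside the types element of a typeSystemDescription descriptor stored under that directory.
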
> Also, if a type were missing, some exception should have been thrown, so
> you may have to check your collection reader code. A sample collection
> reader is available in the uimaFIT examples in the source. You can start with
> the document approach and, once everything is working, test the collection
> reader approach.
> 
> 
> Regards,
> Raj
> 
> 
> On Wed, Apr 20, 2016 at 2:45 AM, Rui Lopes <[email protected]> wrote:
> 
>> Thank you, Raj!
>> 
>> I tried it, but no success… there is still only one annotation.
>> Could it be related to the type system?
>> 
>> Cheers,
>> 
>> /rp
>> 
>> 
>>> On 19 Apr 2016, at 17:38, Raj kiran <[email protected]> wrote:
>>> 
>>> I believe you are missing the SentenceDetector engine in the pipeline. It
>>> should be added before SimpleTokenizer.
>>> 
>>> SimpleTokenizer iterates over the sentences in the text/document and, in the
>>> absence of sentence annotations, the tokenizer fails to add any tokens to
>>> the CAS.
>>> 
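To make the point above concrete, here is a minimal plain-Java sketch of a tokenizer that only looks inside sentence spans. It is not the OpenNLP UIMA implementation: the class name SentenceScopedTokenizerSketch, the Span helper and the whitespace splitting are all assumptions chosen for illustration. It only models the behaviour described, namely that with no sentence annotations the loop never runs and no tokens are produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java model of a sentence-scoped tokenizer (illustrative only). */
class SentenceScopedTokenizerSketch {

    /** A character span [begin, end) over the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Splits on whitespace, but only inside the given sentence spans. */
    static List<Span> tokenize(String text, List<Span> sentences) {
        List<Span> tokens = new ArrayList<>();
        // With an empty sentence list this loop body never executes,
        // so the result stays empty no matter what the text contains.
        for (Span sentence : sentences) {
            int start = -1; // begin offset of the token being read, or -1
            for (int i = sentence.begin; i <= sentence.end; i++) {
                boolean boundary = i == sentence.end
                        || Character.isWhitespace(text.charAt(i));
                if (!boundary && start < 0) {
                    start = i;                      // a token starts here
                } else if (boundary && start >= 0) {
                    tokens.add(new Span(start, i)); // the token ends here
                    start = -1;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Got it working. Thanks a lot!";

        // No sentence spans: prints 0
        System.out.println(tokenize(text, Collections.<Span>emptyList()).size());

        // Two sentence spans covering the text: prints 6
        List<Span> sentences = Arrays.asList(new Span(0, 15), new Span(16, 29));
        System.out.println(tokenize(text, sentences).size());
    }
}
```

In the real pipeline the sentence spans would come from a sentence detector engine placed before the tokenizer, which is the fix suggested above.
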
>>> Hope it helps.
>>> 
>>> Regards,
>>> Raj
>>> 
>>> On Tue, Apr 19, 2016 at 7:48 PM, Richard Eckart de Castilho <[email protected]> wrote:
>>> 
>>>> Short answer: no :)
>>>> 
>>>> Longer answer: You don't seem to be using the actual OpenNLP UIMA
>>>> components.
>>>> 
>>>> If you want an example (in Groovy, but it should be trivial to transfer
>>>> to Java) of how to use the OpenNLP UIMA components with uimaFIT, see here:
>>>> 
>>>> https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>>>> 
>>>> Cheers,
>>>> 
>>>> -- Richard
>>>> 
>>>>> On 19.04.2016, at 16:07, Rui Lopes <[email protected]> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I’m trying to use OpenNLP UIMA to build a very simple pipeline:
>>>>> 
>>>>> CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
>>>>>         AbstractCollectionReader.class,
>>>>>         AbstractCollectionReader.PARAM_VALUE, 33);
>>>>> 
>>>>> AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(
>>>>>         SimpleTokenizer.class,
>>>>>         "opennlp.uima.SentenceType", "pt.ipb.pos.type.Sentence",
>>>>>         "opennlp.uima.TokenType", "pt.ipb.pos.type.Token");
>>>>> 
>>>>> AnalysisEngineDescription ae =
>>>>>         AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);
>>>>> 
>>>>> SimplePipeline.runPipeline(reader, tokenizer, ae);
>>>>> 
>>>>> 
>>>>> ------
>>>>> The GetStartedQuickAE just prints the Annotations:
>>>>> 
>>>>>     @Override
>>>>>     public void process(JCas jCas) throws AnalysisEngineProcessException {
>>>>>         System.out.println(jCas.getDocumentText());
>>>>> 
>>>>>         for (Annotation a : jCas.getAnnotationIndex()) {
>>>>>             System.out.println(a);
>>>>>         }
>>>>> 
>>>>>         System.out.println("Done");
>>>>>     }
>>>>> 
>>>>> 
>>>>> ———
>>>>> The output is:
>>>>> 
>>>>> 
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.tokenize.AbstractTokenizer initialize(71)
>>>>> INFO: Initializing the OpenNLP Simple Tokenizer annotator.
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.IsRemoveExistingAnnotations = not set
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.SentenceType = pt.ipb.pos.type.Sentence
>>>>> Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
>>>>> INFO: opennlp.uima.TokenType = pt.ipb.pos.type.Token
>>>>> This article aims to observe the didactic action and its epistemological
>>>>> insertion in education trends as well as its role as a medium capable of
>>>>> causing changes in this alignment. Main objective is the need to
>>>>> consciously integrate between epistemology and education trends didactic
>>>>> application. The methodological procedure trend the application relied on
>>>>> observations from years in which the subjects were given Cytology and
>>>>> Histology in undergraduate courses. The results of observations point to a
>>>>> single procedure, with little clarity regarding the alignment epistemology,
>>>>> educational trends, teaching action. Associate art practice can provide a
>>>>> biological alternative capable of generating a position and "profitable
>>>>> shifts" in epistemological and pedagogical articulating. Different
>>>>> strategies need to be created to establish conditions that allow the
>>>>> configuration of knowledge as a whole, while respecting cultural diversity
>>>>> in which knowledge is configured.
>>>>> DocumentAnnotation
>>>>> sofa: _InitialView
>>>>> begin: 0
>>>>> end: 969
>>>>> language: "x-unspecified"
>>>>> 
>>>>> Done
>>>>> 
>>>>> 
>>>>> There is only one Annotation. Does anyone know why?
>>>>> 
>>>>> Thanks for any feedback!
>>>>> 
>>>>> All the best,
>>>>> 
>>>>> Rui Lopes
>>>>> 
>>>> 
>>>> 
>> 
>>
