Hi Carsten, please see

https://github.com/m09/readability/blob/master/java/uima-corpus-creator/src/main/java/eu/crydee/readability/uima/corpuscreator/DictCreationPipeline.java#L200

for an example pipeline and

https://github.com/m09/readability/blob/master/java/uima-corpus-creator/src/main/java/eu/crydee/readability/uima/corpuscreator/ae/RevisionsFilterAE.java

for an example filter.

This uses uimaFIT, so you'll have to translate it into plain UIMA terms, but it
might be a starting point.
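
For what it's worth, the drop-on-exception idea Sumit suggested (quoted below) can be sketched without any UIMA dependency. This is only a plain-Java illustration of the control flow, not UIMA or uimaFIT API: Doc, RejectedDocException, filter and run are all hypothetical names, and in a real pipeline the filter would be an analysis engine while the dropping would be left to the framework's error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class DropOnExceptionSketch {

    /** Hypothetical stand-in for a (J)Cas: text plus the value of one custom annotation. */
    static final class Doc {
        final String text;
        final String label;

        Doc(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Hypothetical exception the filter stage throws to reject a document. */
    static final class RejectedDocException extends Exception {
        RejectedDocException(String message) {
            super(message);
        }
    }

    /** Filter stage: throws when the document does not satisfy the rule. */
    static void filter(Doc doc, Predicate<Doc> rule) throws RejectedDocException {
        if (!rule.test(doc)) {
            throw new RejectedDocException("rejected: " + doc.text);
        }
    }

    /** Runner: drops a document when the filter throws and carries on with the next one. */
    static List<Doc> run(List<Doc> input, Predicate<Doc> rule) {
        List<Doc> processed = new ArrayList<>();
        for (Doc doc : input) {
            try {
                filter(doc, rule);
                // Downstream stages (segmenter, writer, ...) would run here.
                processed.add(doc);
            } catch (RejectedDocException e) {
                // Document is dropped; the loop continues with the remaining ones.
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("first", "keep"),
                new Doc("second", "skip"),
                new Doc("third", "keep"));
        List<Doc> kept = run(docs, d -> d.label.equals("keep"));
        System.out.println(kept.size() + " of " + docs.size() + " documents processed");
    }
}
```

The rule is passed in as a Predicate so the same runner can be reused with whatever condition on the custom annotation you need.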

Cheers,
Hugo

On 11/21/2014 11:15 AM, Carsten Schnober wrote:
> Hi Sumit,
> Thanks for your suggestion, it seems like the proper way to go for my
> use case. However, I'm not too familiar with the UIMA internals, so
> could you point me to where or how I can set the dropCasOnException option?
> Thanks!
> Carsten
> 
> 
> On 07.11.2014 at 10:19, Sumit Madan wrote:
>> Hi Carsten,
>>
>> I have also found that a flow controller is not easy to build.
>> But maybe you can use a workaround. You can put a new AE between the
>> BinaryCasReader and the Segmenter. This AE would throw an exception when a
>> (J)Cas doesn't fit your rules. With the UIMA options dropCasOnException
>> and ActionOnMaxError, UIMA can drop those (J)Cases and continue with
>> the wanted ones.
>>
>> Regards
>>   Sumit
>>
>> On 07/11/14 09:04, armin.weg...@bka.bund.de wrote:
>>> Hi Carsten,
>>>
>>> I've never used it, but according to the documentation you can do this
>>> with a flow controller. The bad thing is, Richard told me a while ago
>>> that it is not so easy to build your own flow controller.
>>>
>>> Cheers,
>>> Armin
>>>
>>> -----Original Message-----
>>> From: Carsten Schnober [mailto:schno...@ukp.informatik.tu-darmstadt.de]
>>> Sent: Thursday, 6 November 2014 14:55
>>> To: user@uima.apache.org
>>> Subject: Filter Cas from UIMA fit pipeline
>>>
>>> Hi,
>>> I wonder whether there is a recommended way to remove certain (J)Cases
>>> (i.e. documents) from a pipeline after reading.
>>> The scenario in my case is that I use a standard reader
>>> (BinaryCasReader) which returns many documents. I only want a subset of
>>> these documents to be processed by the following pipeline (comprising a
>>> segmenter, a writer and some other engines), subject to a certain value
>>> in a custom annotation.
>>>
>>> The initial intuition would be to use/implement a reader that only
>>> selects those documents that fulfil the given condition. In my case that
>>> would mean, however, that I'd need to implement a new Reader extending
>>> the BinaryCasReader by the described functionality. From a high-level
>>> view at least, this seems much more complicated than just removing
>>> documents from the pipeline.
>>> Can I avoid that effort somehow without breaking conventions?
>>>
>>> Thanks!
>>> Carsten
>>>
>>
>>
> 
