Hi,
I wonder whether there is a recommended way to remove certain (J)CASes
(i.e. documents) from a pipeline after they have been read.
The scenario in my case is that I use a standard reader
(BinaryCasReader) which returns many documents. I only want a subset of
these documents to be processed by the following pipeline (comprising a
segmenter, a writer and some other engines), subject to a certain value
in a custom annotation.

My initial intuition would be to use or implement a reader that only
selects those documents that fulfil the given condition. In my case,
however, that would mean implementing a new reader that extends
BinaryCasReader with the filtering described above. From a high-level
view at least, this seems much more complicated than just removing
documents from the pipeline.
Can I avoid that effort somehow without breaking conventions?
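
To make the idea concrete, here is a rough sketch of the kind of
filtering I have in mind. It is plain Java and deliberately does not use
the UIMA API: FilteringReader, the generic document type D and the
"keep" predicate are hypothetical names made up for illustration, and
the wrapped Iterator only stands in for whatever the real reader
returns. It is meant to show the behaviour I am after, not how it would
have to be implemented.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for a filtering reader: wraps a document source
 * and passes on only the documents that satisfy a condition.
 */
final class FilteringReader<D> implements Iterator<D> {
    private final Iterator<D> source;   // stands in for the underlying reader
    private final Predicate<D> keep;    // condition a document must fulfil
    private D buffered;                 // next matching document, if any
    private boolean ready;              // true while 'buffered' is valid

    FilteringReader(Iterator<D> source, Predicate<D> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Skip documents until one matches or the source is exhausted.
        while (!ready && source.hasNext()) {
            D candidate = source.next();
            if (keep.test(candidate)) {
                buffered = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public D next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        D result = buffered;
        buffered = null;
        return result;
    }
}
```

In my case the predicate would check the value in my custom annotation,
and everything after the reader (segmenter, writer, etc.) would only
ever see the documents that pass.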

Thanks!
Carsten

-- 
Carsten Schnober, M.Sc.
Doctoral Researcher
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone (0)6151 16-6227, room S2/02/B111
www.ukp.tu-darmstadt.de
