As a follow-up to my own message: I implemented my
PDFTextStripperAndReplacerByArea in a non-elegant way (see below). It
could have been more elegant if two small changes were made to
PDFStreamEngine:
- add 'currentTokens' as a protected attribute of the class
- replace the tokenIterator with the snippet:
    parser.parse();
    currentTokens = parser.getTokens();
I don't know the PDFBox code well enough to judge whether this would
break something else, but if one of the developers is reading this ...
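
To make the idea concrete, here is a rough sketch of what the two changes
would allow. It deliberately does not use the real PDFBox classes:
SketchEngine stands in for PDFStreamEngine, SketchString for COSString, and
plain Strings for operators, so every name except currentTokens is
hypothetical. The restriction to an area is left out. With the token list
kept in a protected field, a subclass can edit string operands while the
stream is processed and write the full list back out afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a string operand (COSString in PDFBox). */
class SketchString {
    String text;
    SketchString(String text) { this.text = text; }
    @Override public String toString() { return "(" + text + ")"; }
}

/** Hypothetical stand-in for PDFStreamEngine; NOT the real PDFBox class. */
class SketchEngine {
    // Proposed change 1: the parsed tokens are a protected attribute.
    protected List<Object> currentTokens;

    // Proposed change 2: keep the whole token list, then walk it,
    // instead of consuming a private one-shot iterator.
    public void process(List<Object> parsedTokens) {
        currentTokens = new ArrayList<Object>(parsedTokens);
        for (Object token : currentTokens) {
            handle(token);
        }
    }

    // Hook for subclasses; the base engine does nothing with a token.
    protected void handle(Object token) { }
}

/** Subclass that replaces text in string operands and re-serializes the tokens. */
class SketchReplacer extends SketchEngine {
    private final String search;
    private final String replacement;

    SketchReplacer(String search, String replacement) {
        this.search = search;
        this.replacement = replacement;
    }

    @Override
    protected void handle(Object token) {
        // Only string operands are edited; operators pass through unchanged.
        if (token instanceof SketchString) {
            SketchString s = (SketchString) token;
            s.text = s.text.replace(search, replacement);
        }
    }

    // Possible only because currentTokens is visible to the subclass.
    public String write() {
        StringBuilder out = new StringBuilder();
        for (Object token : currentTokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }
}
```

In the real code the write step would put the tokens back into the page's
content stream, as the ReplaceString example does, rather than build a String.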
dirk
On Sat, 2011-02-12 at 16:02 +0100, dirk ooms wrote:
> Hello,
>
> I need to do string replacement in a certain region of a pdf page. The
> different pieces of the puzzle to do this are present in pdfbox:
> - org.apache.pdfbox.examples.pdmodel.ReplaceString
> - org.apache.pdfbox.util.PDFTextStripperByArea
>
> I am confident this can be done, but I am struggling to do it in an
> elegant way (to optimize reuse of existing classes).
>
> The first thing I had to do for a replacement was to access the
> COSString. This can be done by creating a variant of
> org.apache.pdfbox.util.operator.ShowText.
>
> The second thing to do is to write the changed tokens to the page stream,
> but I cannot access them: the token iterator is in processSubStream of
> PDFStreamEngine. Overriding this method is not an option because it uses
> private attributes (without getters). Then I started creating my own
> version of PDFStreamEngine, but then I also had to create my own version
> of OperatorProcessor, etc...
>
> I am not familiar with the pdfbox code, so maybe someone can give me a
> tip on how to do this in an elegant way (or tell me it is not possible
> with the current PDFStreamEngine).
>
> cheers,
> dirk