On 28.04.2016 at 19:39, Kevin Ternes wrote:
So I have a bunch of source PDFs that I use PDFBox 2.0.0 to fill out and
sometimes edit.
Specifically, for certain business cases I remove or update the text "(signed by
Named Insured)".
I edit using a method similar to the one over on Stack Overflow,
https://stackoverflow.com/questions/35420609/pdfbox-2-0-rc3-find-and-replace-text
See https://pdfbox.apache.org/2.0/migration.html "Why was the
ReplaceText example removed?"
What you could do instead is draw a blank rectangle and put your text
on top. However, the old text would still show up in text extraction.
Tilman
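The rectangle-plus-text idea can be sketched at the content stream level, in the same operator notation that PDFDebugger shows. This is only a minimal sketch in plain Java, not PDFBox code: the names OverlaySketch, overlay and escapeLiteral and the font resource name F1 are hypothetical, the rectangle height is a rough guess derived from the font size, and the default page coordinate system is assumed. The generated operators would still have to be appended to the page's content stream, and the font would have to exist in the page resources.

```java
import java.util.Locale;

class OverlaySketch {

    /** Escapes the characters that are special inside a PDF literal string. */
    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == ')' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds operators that paint a white rectangle and then show text on top.
     * Assumption: the box spans from 25% of the font size below the baseline
     * to 75% above it, which is a guess and not a real font metric.
     */
    static String overlay(float x, float baselineY, float width,
                          String fontName, float fontSize, String text) {
        float rectY = baselineY - 0.25f * fontSize;
        return String.format(Locale.ROOT,
                "q\n"                            // save graphics state
              + "1 g\n"                          // white fill (DeviceGray)
              + "%.3f %.3f %.3f %.3f re\n"       // rectangle over the old text
              + "f\n"                            // fill it
              + "0 g\n"                          // black fill for the new text
              + "BT\n"
              + "/%s %.2f Tf\n"                  // hypothetical font resource
              + "%.3f %.3f Td\n"
              + "(%s) Tj\n"
              + "ET\n"
              + "Q\n",                           // restore graphics state
                x, rectY, width, fontSize,
                fontName, fontSize,
                x, baselineY,
                escapeLiteral(text));
    }

    public static void main(String[] args) {
        System.out.print(overlay(113.02f, 381.017f, 120f, "F1", 8f,
                "(Signed by Named Insured.)"));
    }
}
```

The old string operands stay in the stream underneath the rectangle, which is why text extraction would still return them.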
However, if a PDF gets edited by _Acrobat_ and the change is, for example, "(Signed
by Named Insured.)" where the S is capitalized and a period is inserted, the method
will no longer be able to find the target text even if I make the corresponding changes
in my method call.
Using PDFDebugger, I see that this:
0.699 0.676 0.639 0.747 k
/TT1 8 Tf
0.539 -10.877 Td
(\(signed by Named Insured\)) Tj
0.698 0.675 0.639 0.74 k
/TT1 9.96 Tf
-0.87 -27.115 Td
Has been changed to this:
0.699 0.676 0.639 0.747 k
/TT1 8 Tf
0.539 -10.877 Td
(\() Tj
/C2_2 8 Tf
(\0006) Tj
/TT1 8 Tf
1 0 0 1 113.02 381.017 Tm
(igned by Named Insured) Tj
/C2_2 8 Tf
87.164 0 Td
(\000\021) Tj
/TT1 8 Tf
(\)) Tj
0.698 0.675 0.639 0.74 k
/TT1 9.96 Tf
-96.573 -27.115 Td
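And it is obvious why the method will no longer work: the text is no longer one
string but five separate Tj operands, two of them in a second font (/C2_2).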
Does anyone have any suggestions on how to deal with this programmatically?
Or is there a setting in Acrobat that I can use to tell it to stop doing this
crap?!
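To make the splitting visible, here is a minimal sketch in plain Java (no PDFBox; the class and method names are made up for illustration). It assumes one operator per line, as in the PDFDebugger dump above, and it ignores TJ arrays, hex strings and backslash line continuations. It collects each Tj operand together with the font selected by the preceding Tf, and unescapes the literal string into raw bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

class TjFragments {

    /** One shown string plus the font resource that was active for it. */
    static final class Fragment {
        final String font;
        final byte[] bytes;
        Fragment(String font, byte[] bytes) { this.font = font; this.bytes = bytes; }
    }

    /** Decodes the body of a PDF literal string (outer parentheses removed). */
    static byte[] unescape(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\' || i == body.length()) { out.write(c); continue; }
            char e = body.charAt(i++);
            if (e >= '0' && e <= '7') {
                // Octal escape: one to three octal digits.
                int value = e - '0';
                for (int n = 1; n < 3 && i < body.length()
                        && body.charAt(i) >= '0' && body.charAt(i) <= '7'; n++) {
                    value = value * 8 + (body.charAt(i++) - '0');
                }
                out.write(value & 0xFF);
            } else {
                switch (e) {
                    case 'n': out.write('\n'); break;
                    case 'r': out.write('\r'); break;
                    case 't': out.write('\t'); break;
                    case 'b': out.write('\b'); break;
                    case 'f': out.write('\f'); break;
                    default:  out.write(e);   // \( \) \\ and unknown escapes
                }
            }
        }
        return out.toByteArray();
    }

    /** Collects every Tj operand with the font set by the preceding Tf. */
    static List<Fragment> fragments(String contentStream) {
        List<Fragment> result = new ArrayList<>();
        String font = null;
        for (String line : contentStream.split("\\R")) {
            String op = line.trim();
            if (op.endsWith(" Tf")) {
                font = op.substring(0, op.indexOf(' '));
            } else if (op.endsWith(" Tj")) {
                int open = op.indexOf('(');
                int close = op.lastIndexOf(')');
                if (open >= 0 && close > open) {
                    result.add(new Fragment(font, unescape(op.substring(open + 1, close))));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String edited = "/TT1 8 Tf\n0.539 -10.877 Td\n(\\() Tj\n/C2_2 8 Tf\n(\\0006) Tj\n"
                + "/TT1 8 Tf\n1 0 0 1 113.02 381.017 Tm\n(igned by Named Insured) Tj\n"
                + "/C2_2 8 Tf\n87.164 0 Td\n(\\000\\021) Tj\n/TT1 8 Tf\n(\\)) Tj\n";
        for (Fragment f : fragments(edited)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : f.bytes) hex.append(String.format("%02x ", b & 0xFF));
            System.out.println(f.font + " -> " + hex.toString().trim());
        }
    }
}
```

The two /C2_2 operands unescape to the byte pairs 00 36 and 00 11, not to the ASCII codes for "S" and ".". Presumably these are two-byte codes that only map to characters through that font, so a literal comparison against a single operand cannot match. A search would have to run over the concatenated, font-decoded text of all fragments instead.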
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]