Re: PDFTextStripper strips copyright sign

Andreas Lehmkuehler Tue, 09 Apr 2019 08:35:48 -0700

Hi,

On 09.04.19 at 14:55, Ronald Bergmann | DTAD AG wrote:

Hello,
this is my first time ever to email to a mailing list so please excuse me if my contribution does not match any standards.
Apache PDFBox seems to strip copyright signs when parsing PDFs to text and I wonder why. When I open the PDF with any reader and copy the text I receive the copyright sign. With PDFBox I get a white space character.
PDFTextStripper stripper = new PDFTextStripper();
String contents = stripper.getText(doc);

I use PDFBox 2.0.14 on jdk 1.8.
Is there any trick to get the copyright sign, is it a bug, or is it not possible to retrieve it because it's some magically drawn glyph?

Please upload the PDF in question to a sharehoster or something similar. Attachments are not allowed. Without the document it'll be hard to guess what's wrong.
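
In the meantime, it may help to check which characters the extracted string really contains at that position, since "a white space character" could be a plain space (U+0020), a no-break space (U+00A0) or something else. The following is only a minimal sketch using the plain JDK; the class and method names (CodePointDump, describe, hasCopyrightSign) are made up for illustration and are not part of PDFBox. It assumes you pass in the `contents` string returned by `stripper.getText(doc)`, or a short substring around the spot where the sign should be.

```java
import java.util.StringJoiner;

// Hypothetical diagnostic helper, not part of PDFBox.
final class CodePointDump {

    // Returns every code point of the text as "U+XXXX", separated by spaces.
    static String describe(String text) {
        StringJoiner out = new StringJoiner(" ");
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out.toString();
    }

    // True if the text contains the copyright sign (U+00A9).
    static boolean hasCopyrightSign(String text) {
        return text.indexOf('\u00A9') >= 0;
    }

    public static void main(String[] args) {
        // Assumption: args[0] is a snippet of the extracted text;
        // otherwise a small built-in sample is used.
        String sample = args.length > 0 ? args[0] : "\u00A9 2019";
        System.out.println(describe(sample));
        System.out.println("contains U+00A9: " + hasCopyrightSign(sample));
    }
}
```

Posting the code points you see around the missing sign along with the link to the PDF would make it easier to tell what is happening.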

Thanks in advance!

--


Andreas

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: users-h...@pdfbox.apache.org
