Perfect. Thank you. I'll open an issue and draft a patch.
-----Original Message-----
From: Tilman Hausherr [mailto:thaush...@t-online.de]
Sent: Thursday, September 21, 2017 4:21 PM
To: users@pdfbox.apache.org
Subject: Re: tracking missing Unicode mappings?
The standard 14 fonts are cached, but these shouldn't bring any text
extraction trouble.
So all that is needed would be a map as described for the PDFont type.
Now how to access the fonts... if you grab the TextPosition objects in
an extension of PDFTextStripper (e.g. in the showGlyph method) you
co
All,
How much effort would it be to track/calculate a ratio of characters with
missing Unicode mappings to those with mappings for a given page? It would be
neat after trying to extract text from a page to be able to tell how many
characters are lost. We could use this info on Tika to determi
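
For illustration only, here is a minimal sketch of the bookkeeping side of this idea: a per-font tally of glyphs with and without a Unicode mapping, plus the missing-to-mapped ratio for a page. The class name UnicodeMappingTally and its methods are hypothetical and not part of PDFBox. The sketch assumes that the caller (for example, code in an extension of PDFTextStripper) can tell for each glyph whether a mapping was found, and passes null when it was not. It keys the map by font name rather than by the font object to stay free of any library dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a PDFBox class): tallies glyphs per font name. */
class UnicodeMappingTally {
    // counts[0] = glyphs with a Unicode mapping, counts[1] = glyphs without one
    private final Map<String, long[]> countsByFont = new LinkedHashMap<>();

    /** Record one glyph; a null or empty unicode string is treated as "no mapping". */
    void record(String fontName, String unicode) {
        long[] counts = countsByFont.computeIfAbsent(fontName, k -> new long[2]);
        if (unicode == null || unicode.isEmpty()) {
            counts[1]++;
        } else {
            counts[0]++;
        }
    }

    /** Total glyphs that had a mapping, across all fonts. */
    long mapped() {
        long total = 0;
        for (long[] c : countsByFont.values()) {
            total += c[0];
        }
        return total;
    }

    /** Total glyphs that had no mapping, across all fonts. */
    long missing() {
        long total = 0;
        for (long[] c : countsByFont.values()) {
            total += c[1];
        }
        return total;
    }

    /** Glyphs without a mapping for one font name (0 if the font was never seen). */
    long missingFor(String fontName) {
        long[] c = countsByFont.get(fontName);
        return c == null ? 0 : c[1];
    }

    /**
     * Ratio of glyphs without a mapping to glyphs with one.
     * Returns 0.0 when nothing is missing, and positive infinity
     * when glyphs are missing but none are mapped.
     */
    double missingToMappedRatio() {
        long missing = missing();
        long mapped = mapped();
        if (missing == 0) {
            return 0.0;
        }
        if (mapped == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return (double) missing / mapped;
    }
}
```

One tally per page would be created before extraction starts and read once the page is done. The per-font counts make it possible to see whether a single font accounts for most of the lost characters.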