Hello there,

>
> I'm trying to find a way to determine the font-size used in a pdf document
> using pdfbox.
> I understand how to get at the fonts using  code like:
>
> List allPages = document.getDocumentCatalog().getAllPages();
> PDPage firstPage = (PDPage)allPages.get( 0 );
>
> PDResources pdResources = firstPage.findResources();
> Map<String, PDFont> fonts = pdResources.getFonts();
>

PDResources#getFonts returns a map of Font programs that this PDF page
could be using. A single Font program can be used to draw text in any
size - it is rescaled (to 8 pt, 10 pt, 16 pt etc.) on demand.
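
To illustrate "on demand": the size is chosen in the page's content stream, where the Tf operator names a font resource and a size each time the font is selected. Below is a rough sketch in plain Java (no PDFBox) that scans an already decoded content-stream snippet for Tf operators. The class name TfSizeScanner and the sample stream are made up for illustration, the regex is a naive scan rather than a real PDF parser, and the effective on-page size also depends on the text and transformation matrices, which this ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects the size operand of every "Tf" operator. */
final class TfSizeScanner {
    // Matches "/FontName <number> Tf"; naive, does not skip string literals.
    private static final Pattern TF = Pattern.compile(
            "/(\\S+)\\s+([-+]?(?:\\d+\\.?\\d*|\\.\\d+))\\s+Tf\\b");

    /** Returns the font sizes in the order they are selected. */
    static List<Float> fontSizes(String decodedContentStream) {
        List<Float> sizes = new ArrayList<Float>();
        Matcher m = TF.matcher(decodedContentStream);
        while (m.find()) {
            sizes.add(Float.parseFloat(m.group(2)));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The same font resource (/F1) is selected twice, at two sizes.
        String sample = "BT /F1 12 Tf (Hello) Tj /F1 8.5 Tf (small) Tj ET";
        System.out.println(fontSizes(sample));
    }
}
```

Note that the same resource name shows up with two different sizes, which is why the fonts map alone cannot tell you the size.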

> however, this does not get you the font-size such as 12pt... etc. I tried:
>
> PDResources pdResources = firstPage.findResources();
>
> Map<String, PDExtendedGraphicsState> graphicsStates =
> pdResources.getGraphicsStates();
>
> but this just returns null for graphicStates so my document must not have a
> graphics dictionary.

IIRC, PDExtendedGraphicsState is initialized when a subclass of
org.apache.pdfbox.util.PDFStreamEngine starts rendering a PDF page. It
is updated incrementally as the rendering progresses, and is reset to
null when the rendering completes. Except on very rare occasions, one
shouldn't interact with PDExtendedGraphicsState directly.

The font size and other font attributes can be queried from class
org.apache.pdfbox.util.TextPosition. You can obtain TextPositions when
you override the method #processTextPosition(TextPosition) in some
subclass of PDFStreamEngine, such as
org.apache.pdfbox.util.PDFTextStripper.

PDFBox offers a special method for querying the average character
width of a font, namely
org.apache.pdfbox.pdmodel.font.PDFont#getAverageFontWidth. Please note
that this value is returned in text units, which must be converted to
display units before use. I've attached a small code snippet that
should show how to do it.
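
In case the attachment gets stripped by the list, the arithmetic behind the conversion is roughly the following. This is a minimal sketch in plain Java, assuming the width is expressed in the usual 1/1000 glyph-space units and that no extra scaling (text matrix, horizontal scaling, page transformation) applies. The class name GlyphWidthConverter and the sample values are made up for illustration and are not part of PDFBox.

```java
/** Hypothetical helper: glyph-space width (1/1000 units) to points. */
final class GlyphWidthConverter {
    // Assumption: 1000 glyph units per text-space unit, as for most fonts.
    private static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    /**
     * @param glyphWidth   width in glyph units, e.g. an average font width
     * @param fontSizeInPt font size in points at which the text is drawn
     * @return the width in points, ignoring any additional scaling
     */
    static float toPoints(float glyphWidth, float fontSizeInPt) {
        return glyphWidth / GLYPH_UNITS_PER_TEXT_UNIT * fontSizeInPt;
    }

    public static void main(String[] args) {
        float averageWidth = 500f; // made-up sample value
        System.out.println(toPoints(averageWidth, 12f));
    }
}
```

The font size itself would come from the TextPosition you receive in #processTextPosition, as described above.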


VR
