> Hello there,
>
> I'm trying to find a way to determine the font-size used in a pdf document
> using pdfbox.
> I understand how to get at the fonts using code like:
>
> List allPages = document.getDocumentCatalog().getAllPages();
> PDPage firstPage = (PDPage)allPages.get( 0 );
>
> PDResources pdResources = firstPage.findResources();
> Map<String, PDFont> fonts = pdResources.getFonts();
>
PDResources#getFonts returns a map of the font programs that this PDF page could be using. A single font program can draw text at any size; it is rescaled (to 8 pt, 10 pt, 16 pt, etc.) on demand.

> however, this does not get you the font-size such as 12pt... etc. I tried:
>
> PDResources pdResources = firstPage.findResources();
>
> Map<String, PDExtendedGraphicsState> graphicsStates =
> pdResources.getGraphicsStates();
>
> but this just returns null for graphicStates so my document must not have a
> graphics dictionary.

IIRC, PDExtendedGraphicsState is initialized when a subclass of org.apache.pdfbox.util.PDFStreamEngine starts rendering a PDF page. It is updated incrementally as rendering progresses and is set back to null when rendering completes. Except on very rare occasions, one shouldn't interact with PDExtendedGraphicsState directly.

The font size and other font attributes can be queried from the class org.apache.pdfbox.util.TextPosition. You can obtain TextPosition objects by overriding the method #processTextPosition(TextPosition) in a subclass of PDFStreamEngine, such as org.apache.pdfbox.util.PDFTextStripper.

PDFBox also offers a dedicated method for querying the average character width of a font, namely org.apache.pdfbox.pdmodel.font.PDFont#getAverageFontWidth. Please note that this value is returned in text units, which must be converted to display units before use.

I've attached a small code snippet that should show how to do it.

VR
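As a rough illustration of the unit conversion mentioned above, here is a minimal sketch in plain Java. It assumes the value from PDFont#getAverageFontWidth follows the usual PDF convention of 1000 glyph units per unit of text space, and that the font size in points is the one reported by TextPosition. The class FontUnits and its method toPoints are hypothetical helpers, not part of PDFBox, and the sketch ignores horizontal scaling and fonts with a non-standard font matrix (Type 3 fonts, for example).

```java
/**
 * Hypothetical helper, not part of PDFBox: converts a glyph width
 * into points for a given font size.
 */
final class FontUnits {

    /** Assumption: 1000 glyph units per unit of text space (the usual PDF convention). */
    static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    private FontUnits() {
        // static helper, no instances
    }

    /**
     * @param glyphUnits   width as returned by the font, e.g. an average character width
     * @param fontSizeInPt font size in points at which the text is drawn
     * @return the width in points, ignoring horizontal scaling
     */
    static float toPoints(float glyphUnits, float fontSizeInPt) {
        return glyphUnits / GLYPH_UNITS_PER_TEXT_UNIT * fontSizeInPt;
    }
}
```

Under these assumptions, an average width of 500 units drawn at 12 pt comes out at FontUnits.toPoints(500f, 12f), i.e. 6 pt.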

