On 08.08.2016 at 23:45, Melanie Freed wrote:
Hi.  I'm using pdfbox-2.0.2 and am having trouble getting the height of
extracted text from a PDF with Type 3 fonts.

I've been able to successfully get the height for Type 1 fonts by
overriding the writeString function in the PDFTextStripper class and using
the maximum font size in points as the height:

     float height = 0f;
     for (TextPosition textPosition : textPositions)
     {
         height = Math.max(height, textPosition.getFontSizeInPt());
     }

But this doesn't work for Type 3 fonts since they don't use sizes in the
same way.  I tried to use the bounding box like this:

     PDFont font_obj = textPositions.get(0).getFont();
     BoundingBox bbox = font_obj.getBoundingBox();
     float height = bbox.getHeight();

But the results aren't what I would expect.  For example, when I run it on
a document with a Type 1 font, I get a value of 7.0 as the font size in
points (using the first method) and the second method gives me a value of
1156.0.

Am I missing some kind of conversion from units of the bounding box to
points?  Or just approaching this problem in the wrong way?

Please have a look at the DrawPrintTextLocations example in the source code download; it has a solution for (some) Type 3 fonts:

        // apply the font matrix (glyph space -> text space)
        at.concatenate(font.getFontMatrix().createAffineTransform());
        if (font instanceof PDType3Font)
        {
            // Type 3 font: use the bounding box of the glyph's char proc, if it has one
            PDType3Font t3Font = (PDType3Font) font;
            PDType3CharProc charProc = t3Font.getCharProc(code);
            if (charProc != null)
            {
                PDRectangle glyphBBox = charProc.getGlyphBBox();
                if (glyphBBox != null)
                {
                    path = glyphBBox.toGeneralPath();
                }
            }
        }

Later, at.createTransformedShape(path.getBounds2D()) gives you the bounds.
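
To make the transform step concrete, here is a minimal standalone sketch that uses only java.awt.geom and no PDFBox classes. The class name GlyphBoundsSketch, the method toUserSpace and all numeric values are made up for illustration; the two matrices stand in for whatever the text position and the font supply in the real example.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

public class GlyphBoundsSketch
{
    /**
     * Maps a glyph bounding box from glyph space to user space.
     * Both matrices are hypothetical stand-ins for the values that
     * the text position and the font would provide.
     */
    public static Rectangle2D toUserSpace(Rectangle2D glyphBBox,
                                          AffineTransform textRenderingMatrix,
                                          AffineTransform fontMatrix)
    {
        // copy first, so the caller's matrix is not modified
        AffineTransform at = new AffineTransform(textRenderingMatrix);
        // the font matrix is applied to the box before the text rendering matrix
        at.concatenate(fontMatrix);
        return at.createTransformedShape(glyphBBox).getBounds2D();
    }

    public static void main(String[] args)
    {
        // assumed values, for illustration only
        Rectangle2D glyphBBox = new Rectangle2D.Double(0, -10, 50, 80);
        AffineTransform fontMatrix = new AffineTransform(0.01, 0, 0, 0.01, 0, 0);
        AffineTransform textRenderingMatrix = AffineTransform.getScaleInstance(12, 12);

        Rectangle2D bounds = toUserSpace(glyphBBox, textRenderingMatrix, fontMatrix);
        System.out.println("glyph height in user space: " + bounds.getHeight());
    }
}
```

With these assumed numbers the box is 80 glyph units high, so the height in user space comes out at roughly 80 * 0.01 * 12 = 9.6.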

If the above doesn't make sense, just run the full programme and see whether it draws cyan bounds around your type 3 font glyphs.


It may not always work, because some Type 3 charprocs don't have a bounding box.
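
On the units question: the font bounding box is given in glyph space, which for Type 1 fonts is conventionally 1/1000 of a text space unit, so the 1156.0 still has to be scaled by the font size. A rough sketch of that arithmetic follows; the class and method names are hypothetical, and the 1000-unit assumption does not hold for Type 3 fonts, which define their own font matrix.

```java
public class FontBBoxHeightSketch
{
    // Assumption: conventional 1000 glyph units per text space unit (Type 1 fonts).
    // Type 3 fonts define their own font matrix, so this constant does not apply to them.
    static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    /** Converts a bounding box height from glyph units to points. */
    public static float bboxHeightInPoints(float bboxHeightGlyphUnits, float fontSizeInPt)
    {
        return bboxHeightGlyphUnits / GLYPH_UNITS_PER_TEXT_UNIT * fontSizeInPt;
    }

    public static void main(String[] args)
    {
        // values reported in the question: bbox height 1156.0, font size 7.0 pt
        System.out.println(bboxHeightInPoints(1156.0f, 7.0f));
    }
}
```

For the numbers in the question this gives about 8.1 points. That is larger than the 7.0 font size because the font bounding box has to enclose every glyph in the font, so it is an upper bound and not the height of any particular string.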

Tilman


