Tilman Hausherr created PDFBOX-3038:
---------------------------------------

             Summary: Text extraction shows glyphs with zero height
                 Key: PDFBOX-3038
                 URL: https://issues.apache.org/jira/browse/PDFBOX-3038
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.0
            Reporter: Tilman Hausherr
             Fix For: 2.0.0

This happens with file 001033.pdf:

2.0:
{code}
String[108.0,663.6 fs=6.96 xscale=6.96 height=0.0 space=12.1104 width=3.4800034]1
String[144.0,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.996994]I
String[147.417,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=4.5]n
String[152.337,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.25] 
String[154.88701,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.501999]t
String[157.809,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=4.5]h
String[162.729,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=3.9960022]e
String[167.145,668.4 fs=9.0 xscale=9.0 height=0.0 space=20.25 width=2.25] 
{code}

1.8:
{code}
String[108.0,663.6 fs=6.96 xscale=6.96 height=4.57272 space=1.74 width=3.4800034]1
String[144.0,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.996994]I
String[147.417,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=4.5]n
String[152.337,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.25] 
String[154.88701,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.501999]t
String[157.809,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=4.5]h
String[162.729,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=3.9960022]e
String[167.145,668.4 fs=9.0 xscale=9.0 height=5.913 space=2.25 width=2.25] 
{code}

The font has an empty bbox:
{code}
def /FontBBox {0 0 0 0}
{code}

1.8 had this code to get the height (in PDSimpleFont):
{code}
PDRectangle fontBBox = desc.getFontBoundingBox();
if (fontBBox != null)
{
    retval = fontBBox.getHeight() / 2;
}
if( retval == 0 )
{
    retval = desc.getCapHeight();
}
if( retval == 0 )
{
    retval = desc.getAscent();
}
if( retval == 0 )
{
    retval = desc.getXHeight();
    if (retval > 0)
    {
        retval -= desc.getDescent();
    }
}
{code}

2.0 has only this:
{code}
float glyphHeight = font.getBoundingBox().getHeight() / 2;
{code}

So 2.0 takes the height from the font itself and has no plan B. Taking the BBox from the font descriptor instead yields correct heights (and better text extraction).
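
For discussion, here is a minimal, self-contained sketch of the fallback order used by the 1.8 snippet quoted above. It is not PDFBox code: the class `GlyphHeightFallback`, the method `estimate`, and the plain float parameters are hypothetical stand-ins for the font descriptor values, and a missing or empty bounding box is modeled as a height of 0. All values are assumed to be in the same unit; any scaling by font size is left out.

```java
/**
 * Hypothetical helper, not part of PDFBox. It mirrors the order of the
 * 1.8 fallback chain: half the bbox height, then CapHeight, then Ascent,
 * then XHeight minus Descent.
 */
final class GlyphHeightFallback {

    private GlyphHeightFallback() {
    }

    /**
     * @param bboxHeight height of the font bounding box, 0 if empty or missing
     * @param capHeight  descriptor CapHeight, 0 if unset
     * @param ascent     descriptor Ascent, 0 if unset
     * @param xHeight    descriptor XHeight, 0 if unset
     * @param descent    descriptor Descent (typically negative or 0)
     * @return the first non-zero candidate, or 0 if every candidate is 0
     */
    static float estimate(float bboxHeight, float capHeight, float ascent,
                          float xHeight, float descent) {
        // First choice: half the bounding box height.
        float height = bboxHeight / 2;

        // An empty bbox such as {0 0 0 0} gives 0, so try the next metric.
        if (height == 0) {
            height = capHeight;
        }
        if (height == 0) {
            height = ascent;
        }
        if (height == 0) {
            height = xHeight;
            if (height > 0) {
                // Descent is usually negative, so this adds its magnitude.
                height -= descent;
            }
        }
        return height;
    }

    public static void main(String[] args) {
        // Empty bbox, only CapHeight set: the second candidate is used.
        System.out.println(estimate(0f, 700f, 0f, 0f, 0f));
    }
}
```

With an empty bbox the sketch no longer returns 0 as long as any of the other metrics is set, which is the behaviour 2.0 is currently missing.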