Hello Fred,
Andreas tested it and it works, and now I have tested it and it works
as well. What I did this time (I don't know what I did last time) was to
use the PDFBox command-line app with its "ExtractText" feature. Could
you please try again?
If it still doesn't work, please post your code, preferably to
PDFBOX-2023 in JIRA.
Tilman
On 11.04.2014 06:40, Fred Andrews wrote:
I am using PDFTextStripper on some PDF statements from Bank of America,
and everything is coming through as zero height.
I traced it down to getFontHeight in
org.apache.pdfbox.pdmodel.font.PDSimpleFont, which is indeed getting
zero. The font is a Type 3 font and I'm not sure how it should work,
but getFontHeight is calling getAFM(), and that is returning null
because it's not a Type 1 font. Then in the next section of
getFontHeight there is no font descriptor, so the zero just flows
all the way through getFontHeight.
I searched for anything I could key on to calculate the font height but
couldn't find it. The font size is claimed to be 20 by getFontSize(),
although it appears to be more like 8. I did trace it to where it got a
font size command of 20, but I'm assuming that would somehow need to
be scaled, and I can't see where that scaling might come from.
The font width on the other hand looks accurate, and I would think
something similar to that would be needed, but would really appreciate
some guidance on how it should work. If I have a clue on how it should
work, I can see what I can do to implement it.
This file displays fine in Acrobat and edits fine in Nitro, so it can't
be that invalid.
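
One possible explanation for the gap between the reported size of 20 and
the apparent size of about 8: in a Type 3 font, glyph coordinates are
mapped to text space by the font's own FontMatrix, which does not have to
be the usual 0.001 scale. The sketch below is plain Java and does not
call PDFBox. The class name, the method name and all numbers are
hypothetical and are not taken from the Bank of America file. It only
illustrates how a height might be derived from FontBBox, FontMatrix and
the font size, assuming an unskewed FontMatrix.

```java
// Hypothetical sketch: estimate the rendered height of a Type 3 font
// from its FontBBox and FontMatrix. Not PDFBox code.
class Type3HeightSketch {

    /**
     * @param fontMatrix six-element matrix [a b c d e f] from the font dictionary
     * @param fontBBox   [llx lly urx ury] in glyph space
     * @param fontSize   the size set by the Tf operator
     * @return height in unscaled text-space units (the text matrix and
     *         the CTM are deliberately ignored here)
     */
    static double scaledHeight(double[] fontMatrix, double[] fontBBox, double fontSize) {
        // Height of the bounding box in glyph space.
        double glyphHeight = fontBBox[3] - fontBBox[1];
        // The d component scales the y axis when the matrix has no skew.
        double verticalScale = Math.abs(fontMatrix[3]);
        return glyphHeight * verticalScale * fontSize;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        double[] fontMatrix = {0.0005, 0, 0, 0.0005, 0, 0};
        double[] fontBBox = {0, -200, 1000, 600};
        System.out.println(scaledHeight(fontMatrix, fontBBox, 20.0));
    }
}
```

With these assumed values a font size of 20 comes out at roughly 8,
which matches the kind of discrepancy described above. Whether this
file's font actually behaves that way would have to be checked against
its FontMatrix.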