Steve,

Steve Garcia wrote
> Am trying to pull table data out of PDF files that contain non tabular
> text as well as the tables. I've successfully parsed the non tabled text
> using PdfTextExtractor.GetTextFromPage(), but the resulting text stream is
> empty at each table location.

The text in the tables cannot be extracted without OCR.

The text in the tables is drawn using Type 3 fonts with an ad-hoc encoding, i.e. the first glyph drawn on the page is encoded as 0, the second (differing) glyph as 1, and so on. E.g. on page 11 the first text drawn is "B6 Summary (Official Form 6 - Summary) (12/14)" and is encoded as

00, 01, 02, 03, 04, 05, 05, 06, 07, 08, 02, 09, 0A, 0B, 0B, 0C, 0D, 0C, 06, 0E, 02, 0F, ...

Furthermore, the font has no mapping to Unicode, so nothing in the file says which character each of these codes stands for. Thus, automated text extraction without some kind of OCR is impossible.

Regards,
Michael
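
P.S. To illustrate the ad-hoc encoding described above, here is a minimal sketch in Python. It is not iText code and not how the PDF producer is known to work; the function name `adhoc_encode` is made up for illustration, and it simply assumes that each new glyph receives the next free code in order of first use.

```python
def adhoc_encode(text):
    """Give each distinct glyph the next free code, in order of first use.

    Hypothetical helper for illustration only; it mimics the numbering
    scheme described in the message, not any real PDF producer.
    """
    code_for_glyph = {}
    codes = []
    for glyph in text:
        if glyph not in code_for_glyph:
            # First time this glyph is drawn: assign the next unused code.
            code_for_glyph[glyph] = len(code_for_glyph)
        codes.append(code_for_glyph[glyph])
    return codes, code_for_glyph


codes, table = adhoc_encode("B6 Summary (Official Form 6 - Summary) (12/14)")
# Print the first 22 codes as two-digit hex values.
print(", ".join(f"{c:02X}" for c in codes[:22]))

# A different text produces the same leading codes, because the numbers
# only record drawing order, not which character is meant.
other_codes, _ = adhoc_encode("Form")
print(other_codes)
```

The first line printed matches the code sequence quoted above. The second shows that "Form" encoded on its own also starts 0, 1, 2, 3, just as "B6 S" does. Without the table built during encoding (which the PDF does not contain in any Unicode-readable form), the codes cannot be turned back into characters.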