Hi,

I have looked at the PDF file. It looks as if the text on all the pages was scanned as images. I am certain that PDFBox cannot extract text from images (i.e. text that was scanned as images). Could someone correct me if I am wrong?

Thanks,
Stephen

-----Original Message-----
From: Big Donkeys [mailto:[email protected]]
Sent: Friday, 20 July 2012 6:09 AM
To: [email protected]
Subject: Can't extract text Adobe-WinCharSetFFFF-UCS2

Hi,

I'm having some trouble extracting text from some South Korean PDF files using PDFTextStripper. When I try, I get a "severe error could not parse predefined CMAP file for 'Adobe-WinCharSetFFFF-UCS2'" message, and then it gives me some gibberish. The file opens and displays fine in Adobe Reader. I'm using pdfbox-app-1.7.0.jar.

Here is a link to an example PDF that gives me trouble:
http://eng.khoa.go.kr/inc/func/fileDownloadBlob_nori.asp?cmsCd=CM0237&ntNo=626&fNo=4

Any ideas?
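One rough way to test the scanned-image explanation is to extract each page separately with `PDFTextStripper` (its `setStartPage`/`setEndPage` methods restrict `getText` to a page range) and look for pages that come back with no visible characters. The sketch below assumes the per-page strings have already been collected that way; the class and method names (`ScannedPageCheck`, `pagesWithoutText`) are hypothetical, and an empty page is only a hint that the page is image-only, not proof.

```java
import java.util.ArrayList;
import java.util.List;

public class ScannedPageCheck {

    /**
     * Hypothetical helper: given the text extracted from each page (for example by
     * calling PDFTextStripper.getText once per page after setStartPage/setEndPage),
     * returns the 1-based numbers of pages that produced no visible characters.
     * Such pages may be image-only (scanned), but this is a heuristic, not a proof.
     */
    static List<Integer> pagesWithoutText(List<String> pageTexts) {
        List<Integer> emptyPages = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            // trim() strips ASCII whitespace and control characters (<= U+0020)
            if (pageTexts.get(i).trim().isEmpty()) {
                emptyPages.add(i + 1); // report pages 1-based, as PDFTextStripper counts them
            }
        }
        return emptyPages;
    }

    public static void main(String[] args) {
        // Made-up per-page output: pages 2 and 3 contain only whitespace.
        List<String> pageTexts = List.of("Annual report", " \r\n", "", "Table 1");
        System.out.println(pagesWithoutText(pageTexts)); // prints [2, 3]
    }
}
```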

