I get garbled characters when I import a Chinese PDF (in EUC, Shift-JIS, ...).
I want to read it in UTF-8.
Or which encoding should I use?


Below is my current program.
---------------------
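import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

File document = new File(strFile_fullpath);

ContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE);
Metadata metadata = new Metadata();
PDFParser parser = new PDFParser();

parser.getPDFParserConfig().setSuppressDuplicateOverlappingText(true);
parser.getPDFParserConfig().setExtractAnnotationText(false);

// Close the input stream once parsing is done
try (InputStream stream = new FileInputStream(document)) {
    parser.parse(stream, handler, metadata, new ParseContext());
}

System.out.println(handler.toString());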
---------------------
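
One thing that may be worth checking (an assumption, not a confirmed
diagnosis): the String returned by handler.toString() is already Unicode
inside Java, so the garbling may come from System.out encoding the text with
the platform default charset rather than from the extraction itself. A minimal
sketch that prints through a PrintStream fixed to UTF-8 follows; the class
name Utf8Output, the helper utf8Stream and the sample string are hypothetical
stand-ins, not part of Tika.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Output {

    // Hypothetical helper: wraps any stream in a PrintStream that always
    // encodes text as UTF-8, regardless of the platform default charset.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for handler.toString(); the escapes spell two Chinese characters
        String extracted = "\u4e2d\u6587";
        utf8Stream(System.out).println(extracted);
    }
}
```

If the output is still garbled with this, the console itself may not be
displaying UTF-8; passing a FileOutputStream to the same helper and opening
the file in a UTF-8 aware editor would help separate the two cases.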


-- 
Syoshin
