Hi,

I have a problem with extracting plain text from PDF documents that contain Polish characters. I am using the following approach to extract the text:

......
File f = new File(fileName);
PDFParser parser = new PDFParser(new FileInputStream(f));
parser.parse();
COSDocument cosDoc = parser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
String parsedText = pdfStripper.getText(pdDoc);
......

parsedText is then written to a file using UTF-8 encoding.

The above code works fine in most cases: text containing Polish characters is extracted correctly. There are, however, some PDF files for which this method does not work and the Polish characters are replaced by other characters. For example, the Polish crossed l (ł) is replaced by %.

Is there any way to fix this problem?

Regards,
Piotr Rychlik
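
For reference, the UTF-8 write step mentioned above could look roughly like the sketch below. This is only an illustration under assumptions: the class name Utf8TextWriter, the helper methods and the temporary file are hypothetical and are not taken from the original application. It uses only standard java.nio calls and does not touch PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8TextWriter {

    // Hypothetical helper: writes already-extracted text to a file as UTF-8.
    public static void writeUtf8(String text, Path target) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper: reads the file back, decoding it as UTF-8.
    public static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the text returned by pdfStripper.getText(pdDoc);
        // the sample contains the Polish crossed l (U+0142) and o-acute (U+00F3).
        String parsedText = "\u0142\u00f3d\u017a";

        Path out = Files.createTempFile("parsed", ".txt");
        writeUtf8(parsedText, out);

        // Round-trip check: true means the write step preserved the characters.
        System.out.println(parsedText.equals(readUtf8(out)));
        Files.delete(out);
    }
}
```

If a round trip like this preserves ł for a hand-written sample string, the write step is probably not where the character is lost, which would point at the extraction step for the affected files.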

