Hi,

please try the setSortByPosition() method of the stripper. See also the PDFBox FAQ.

Tilman
--- Original message ---
From: 김보섭
Subject: Text extracting error
Date: 28.11.2018, 11:01
To: [email protected]

We are trying to extract text from PDF files. When we extract Korean text, the order of the text is broken, while English text is extracted correctly. The Korean characters themselves are extracted fine; only their sequence is wrong.

The problem occurs when:
1. the PDF contains a chart, or
2. the characters differ in size from one another.

When the PDF contains a chart, the text in the lowest row appears at the beginning and the text in the highest row appears at the end, e.g.

    | 가 | 나 |   (in the chart)
    | 다 | 라 |
    -> 다라 가나   (extracted)

When the PDF uses multiple text sizes and fonts, the text in the smallest and simplest font is extracted at the beginning, and the text in the largest and less simple font is extracted at the end.

Please check whether this is a bug in extracting Korean.

    public static void extractStringfromPDF() throws IOException {
        final FileChooser filechooser = new FileChooser();
        File file = filechooser.showOpenDialog(null);
        try {
            PDDocument document = PDDocument.load(file);
            PDFTextStripper pdfStripper = new PDFTextStripper();
            String text = pdfStripper.getText(document);
            File txtFile = new File(file.getPath() + ".txt");
            FileWriter fw = new FileWriter(txtFile, true);
            fw.write(text);
            fw.flush();
            fw.close();
            System.out.println(text);
            document.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

The above is the code we use in our program.
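To illustrate why the row order in the chart example can come out reversed, and what sorting by position changes, here is a minimal, library-free sketch. In PDFBox 2.x the suggested call would be `pdfStripper.setSortByPosition(true);` placed before `pdfStripper.getText(document)`. The code below is not PDFBox's implementation: the `Fragment` class, the coordinates, and the romanized cell labels (ga, na, da, ra standing in for 가, 나, 다, 라) are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByPositionSketch {

    // Hypothetical stand-in for one positioned piece of text (not a PDFBox class).
    static final class Fragment {
        final String text;
        final double x; // distance from the left edge of the page
        final double y; // distance from the top edge of the page

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // Joins the fragment texts in exactly the order given.
    static String join(List<Fragment> fragments) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(f.text);
        }
        return sb.toString();
    }

    // Returns a copy ordered top to bottom, then left to right.
    static List<Fragment> sortByPosition(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator
                .comparingDouble((Fragment f) -> f.y)
                .thenComparingDouble(f -> f.x));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed order in which the PDF producer wrote the cells: bottom row first.
        List<Fragment> streamOrder = Arrays.asList(
                new Fragment("da", 10, 40),
                new Fragment("ra", 50, 40),
                new Fragment("ga", 10, 20),
                new Fragment("na", 50, 20));

        System.out.println(join(streamOrder));                 // prints "da ra ga na"
        System.out.println(join(sortByPosition(streamOrder))); // prints "ga na da ra"
    }
}
```

Without position sorting, text comes out in whatever order the producing application wrote it into the page, which need not match the visual reading order. That would explain why the symptom depends on how the chart and the differently sized text were generated rather than on the language itself.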

