Hi,

> Zak Bennett <[email protected]> wrote on 28 August 2013 at 01:20:
>
>
> Hi guys,
>
> Firstly I apologise if this question has been repeated often. Having looked
> around I have found a number of individuals with the same issue as myself.
>
> Have you discovered any workarounds to the issue of returning Japanese text
> information from a PDF using pdfbox? If not, would this be an issue which
> the dev team is currently working to solve?

Please be more specific. There are 3 known cases:

- PDFBox can extract the text of PDFs containing foreign (non-Latin) languages, depending on the font used
- the text extraction doesn't work because of the font used and a wrong/incomplete implementation in PDFBox
- the text can't be extracted at all; even the Adobe test fails, see [1]

So, the question is: did you ever try to extract text? If not, give it a try [2].

> Best regards,
>
> Zak

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/userguide/faq.html#notext
[2] http://pdfbox.apache.org/commandline/

