[ https://issues.apache.org/jira/browse/PDFBOX-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler reassigned PDFBOX-2247:
------------------------------------------

    Assignee: Andreas Lehmkühler

> Regression in text extraction between 1.8.5 and 1.8.6
> -----------------------------------------------------
>
>                 Key: PDFBOX-2247
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2247
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6
>            Reporter: Tim Allison
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>
> Looks like a character mapping issue crept in some time between 1.8.5 and
> 1.8.6 on this
> [file|http://digitalcorpora.org/corp/nps/files/govdocs1/701/701542.pdf]?
> With both seq and NonSeq parsers, the correct text was extracted via
> ExtractText in 1.8.5. In 1.8.6, java -jar pdfbox-app-1.8.6.jar ExtractText
> yields text starting with:
> {noformat}7>PFLK>I 9>NH ;BNRF@B
> =%;% .BM>NPJBKP LC PEB 3KPBNFLN
> 9>@FCF@ -L>OP ;@FBK@B >KA 5B>NKFKD -BKPBN
> :BOB>N@E 9NLGB@P ;QJJ>NT .B@BJ?BN (&&*
> "&++&,-+Æ$( #&+-&%+$-& !).&)-*+Æ&,{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)