[ https://issues.apache.org/jira/browse/PDFBOX-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr reopened PDFBOX-2740:
-------------------------------------

> Text extraction failed on Korean PDF
> ------------------------------------
>
>                 Key: PDFBOX-2740
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2740
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.7, 1.8.8, 1.8.9, 2.0.0
>            Reporter: Julien Ortega
>            Assignee: John Hewson
>            Priority: Major
>         Attachments: g_KO_201506-ReaderDC-cutAndPaste.txt,
>                      g_KO_201506-ReaderDC-saveAsText.txt, g_KO_201506.pdf,
>                      g_KO_201506.txt
>
>
> Trying to extract text from a Korean PDF gives me a lot of warnings:
>
> WARNING: No Unicode mapping for US (33) in font DVCAYA+WtKoBaeumMyungjoL063zb4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
> WARNING: No Unicode mapping for NAK (33) in font JYLDGG+WtKoBaeumMyungjoL053zb4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
> WARNING: No Unicode mapping for RS (38) in font WRYULE+WtKoBaeumMyungjoL013zb4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
> WARNING: Invalid ToUnicode CMap in font FZEFOY+WtKoBaeumGothicL0422b4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
> WARNING: No Unicode mapping for DEL (33) in font FZEFOY+WtKoBaeumGothicL0422b4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
> WARNING: Invalid ToUnicode CMap in font OOLNBG+WtKoBaeumGothicL0122b4?Pw
> avr. 01, 2015 12:05:32 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont toUnicode
> WARNING: No Unicode mapping for SOH (33) in font OOLNBG+WtKoBaeumGothicL0122b4?Pw
>
> and the result is not readable. The PDF contains the necessary conversion
> table, because every PDF reader (desktop or mobile) lets me copy and paste
> the text without problems.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org