Yeah, I've stumbled upon this a lot in academic Japanese/Chinese texts. I try to copy some Chinese character, only to find out that it's really a string of random ASCII characters.
Is there only one of those crap PDF pseudo-encodings? If so, I'll use a converter next time...

2016-03-17 14:57 GMT-03:00 "Jörg Knappen" <jknap...@web.de>:
> I inspected the PDF file, and its font encoding is termed "Identity-H". I
> couldn't find out much about this encoding, but it seems to be a private
> encoding of Adobe used especially for Asian fonts.
>
> --Jörg Knappen
>
> Sent: Thursday, 17 March 2016, 17:43
> From: "Don Osborn" <d...@bisharat.net>
> To: unicode@unicode.org
> Subject: Joined "ti" coded as "Ɵ" in PDF
> Odd result when copy/pasting text from a PDF: For some reason "ti" in
> the (English) text of the document at
> http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf
> is coded as "Ɵ". Looking more closely at the original text, it does
> appear that the glyph is a "ti" ligature (which afaik is not coded as
> such in Unicode).
>
> Out of curiosity, did a web search on "internaƟonal" and got over 11k
> hits, apparently all PDFs.
>
> Anyone have any idea what's going on? Am assuming this is not a
> deliberate choice by diverse people creating PDFs and wanting "ti"
> ligatures for stylistic reasons. Note the document linked above is
> current, so this is not (just) an issue with older documents.
>
> Don Osborn
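
For text that has already been copied out of such a PDF, the "Ɵ"-for-"ti" substitution described above can be undone after the fact. Below is a minimal Python sketch of that idea. The names `LIGATURE_FIXES` and `repair_extracted_text` are made up for illustration, and the table holds only the one pair reported in this thread; other PDFs may well substitute different characters, so the table would have to be extended by hand.

```python
# Hypothetical repair table: only the "Ɵ" -> "ti" pair comes from this thread.
# Any further entries would be assumptions about other PDFs.
LIGATURE_FIXES = {"Ɵ": "ti"}


def repair_extracted_text(text, fixes=LIGATURE_FIXES):
    """Replace known mis-mapped ligature characters in text copied from a PDF."""
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


print(repair_extracted_text("internaƟonal"))  # prints "international"
```

Note that "Ɵ" is also a real letter in its own right, so a blind replacement like this is only safe on text that is known not to use it legitimately.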