Re: Joined "ti" coded as "Ɵ" in PDF

Don Osborn Sat, 19 Mar 2016 09:54:47 -0700

Thanks all for the feedback.

Doug, it may well be my clipboard (I'm running Windows 7 on this particular laptop). I get the same results pasting into Word and EmEditor.

So when I did a web search on "internaƟonal," as previously mentioned, and came up with a lot of results (mostly PDFs), were those results also a consequence of conversions by others that were not fully Unicode-compliant?

A web search on what you came up with, "Interna􀆟onal", yielded many more (82k+) results, again mostly PDFs, with terms like "interna onal" (such as what Steve noted), "interna<onal", and perhaps others (given the nature of the private-use character, or how Google interprets it?).

Searching within the PDF document already mentioned, "international" comes up with nothing (which is a major fail as far as usability goes). Searching the PDF in a Firefox browser window, only "internaƟonal" finds the occurrences of what displays as "international." However, after downloading the document and searching it in Acrobat, only a search for "interna􀆟onal" will find what displays as "international."
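One workaround for the search failure described above is to repair extracted text before searching it. The following is a minimal Python sketch under stated assumptions: the stand-in table is hypothetical and covers only the two characters reported in this thread (U+10019F from the PDF and U+019F from the truncated paste), and `repair_ligatures` and `contains_word` are illustrative names, not part of any PDF tool.

```python
# Hypothetical repair table: only the "ti" stand-ins reported in this thread.
# Not a general or authoritative list of ligature code points.
LIGATURE_STANDINS = {
    "\U0010019F": "ti",  # Plane 16 private-use character seen when pasting into BabelPad
    "\u019F": "ti",      # U+019F, the 16-bit-truncated form seen in Word/EmEditor
}

# str.maketrans accepts a dict mapping single characters to replacement strings.
_TABLE = str.maketrans(LIGATURE_STANDINS)


def repair_ligatures(text: str) -> str:
    """Replace known ligature stand-ins with the letters they represent."""
    return text.translate(_TABLE)


def contains_word(text: str, word: str) -> bool:
    """Case-insensitive substring search on the repaired text."""
    return word.casefold() in repair_ligatures(text).casefold()


print(repair_ligatures("Interna\U0010019Fonal Order"))  # International Order
print(contains_word("Distribu\u019Fon of Iden\u019Fty", "identity"))  # True
```

Mapping U+019F back to "ti" is lossy: it would also rewrite any genuine "Ɵ" in the text, so a table like this is only safe for documents known to contain the broken ligature. The proper fix is a correct ToUnicode mapping in the PDF itself.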

A separate web search on "Eīects" came up with 300+ results, including some Google Books results whose texts display "effects" (as far as I checked). So this is not limited to Adobe?

Jörg, with regard to "Identity-H," a quick search gives the impression that this encoding has had a fairly wide and not-so-happy impact, even if on the surface it may have facilitated display in a particular style of font in ways that no one complains about.

Altogether a mess, from my limited encounter with it. There must have been a good reason for this solution, or some saving grace?


Don

On 3/17/2016 2:17 PM, Steve Swales wrote:

Yes, it seems like your mileage varies with the PDF 
viewer/interpreter/converter.  Text copied from Preview on the Mac replaces the 
ti ligature with a space.  Certainly not a Unicode problem, per se, but an 
interesting problem nevertheless.

-steve

On Mar 17, 2016, at 11:11 AM, Doug Ewell <d...@ewellic.org> wrote:

Don Osborn wrote:

Odd result when copy/pasting text from a PDF: For some reason "ti" in
the (English) text of the document at
http://web.isanet.org/Web/Conferences/Atlanta%202016/Atlanta%202016%20-%20Full%20Program.pdf
is coded as "Ɵ". Looking more closely at the original text, it does
appear that the glyph is a "ti" ligature (which afaik is not coded as
such in Unicode).

When I copy and paste the PDF text in question into BabelPad, I get:

Interna􀆟onal Order and the Distribu􀆟on of Iden􀆟ty in 1950 (By
invita􀆟on only)

The "ti" ligatures are implemented as U+10019F, a Plane 16 private-use
character.
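What kind of character U+10019F is can be checked with Python's standard `unicodedata` module. Private-use code points have general category `Co` and no character name, and the plane is the code point shifted right by 16 bits. A small sketch (the `describe` helper is hypothetical):

```python
import unicodedata


def describe(cp: int) -> str:
    """Summarize a code point: plane, general category, and name (if any)."""
    ch = chr(cp)
    plane = cp >> 16                           # each plane spans 0x10000 code points
    category = unicodedata.category(ch)        # "Co" means private use
    name = unicodedata.name(ch, "<unnamed>")   # private-use characters have no name
    return f"U+{cp:04X}: plane {plane}, {category}, {name}"


print(describe(0x10019F))  # U+10019F: plane 16, Co, <unnamed>
print(describe(0x019F))    # U+019F: plane 0, Lu, LATIN CAPITAL LETTER O WITH MIDDLE TILDE
```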

Truncating this character to 16 bits, which is a Bad Thing™, yields
U+019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE. So it looks like either
Don's clipboard or the editor he pasted it into is not fully
Unicode-compliant.
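The truncation can be checked arithmetically. A correct UTF-16 conversion turns U+10019F into a surrogate pair, whereas keeping only the low 16 bits of the code point yields U+019F. The Python sketch below illustrates that kind of truncation. It does not reconstruct what the Windows clipboard, Word, or EmEditor actually do internally, and the helper names are hypothetical.

```python
import unicodedata


def utf16_units(cp: int) -> list[int]:
    """Correct UTF-16 encoding of one code point (surrogate pair above U+FFFF)."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                      # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def truncate16(cp: int) -> int:
    """The faulty conversion: keep only the low 16 bits of the code point."""
    return cp & 0xFFFF


cp = 0x10019F
print([f"{u:04X}" for u in utf16_units(cp)])        # ['DBC0', 'DD9F']
bad = truncate16(cp)
print(f"U+{bad:04X}", unicodedata.name(chr(bad)))  # U+019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE
```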

Don's point about using alternative characters to implement ligatures,
thereby messing up web searches, remains valid.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
