Re: Joined "ti" coded as "O" in PDF

David Perry Sat, 07 May 2016 11:50:16 -0700

I agree that it's a real-world problem -- PDFs really should be searchable -- but I do not see that it's a Unicode issue. Unicode defines the basic building blocks of LATIN SMALL LETTER T and LATIN SMALL LETTER I; that's its job. Unicode is not responsible for font construction or creating PDF software. Furthermore, even if Unicode did want to do something about it, I can't imagine what that could be -- aside perhaps from using its bully pulpit to urge PDF creators and font creators to do their jobs better.

The fact that some PDF apps do not search and copy/paste text correctly when unencoded characters are given PUA values has been known for many years. In the case of Calibri, I looked at the font (the version installed on my Win7 system) and found that the 'ti' ligature is named t_i, which follows good naming practices, and it does not have a PUA assignment. Given this, any well-constructed PDF app should be able to decode the ligature correctly.
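To illustrate why a name like t_i is decodable, here is a minimal Python sketch of glyph-name decoding, loosely following the Adobe glyph naming convention (underscore joins ligature components, anything after a period is a variant suffix, and uniXXXX / uXXXX names carry code points). The function name glyph_name_to_text and the tiny SAMPLE_AGL table are assumptions made for this example; a real PDF app would use the full Adobe Glyph List and its own lookup code, and returning None for unknown components is a simplification.

```python
import re

# Hypothetical stand-in for the Adobe Glyph List; a real decoder would load the full list.
SAMPLE_AGL = {"t": "t", "i": "i", "f": "f", "l": "l"}

def glyph_name_to_text(name, agl=SAMPLE_AGL):
    """Map a glyph name such as 't_i' to the text it represents, or None if unknown."""
    base = name.split(".", 1)[0]        # drop a variant suffix such as '.liga'
    out = []
    for part in base.split("_"):        # underscore joins ligature components
        if part in agl:
            out.append(agl[part])
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            # 'uni' followed by one or more groups of four hex digits
            digits = part[3:]
            out.extend(chr(int(digits[i:i + 4], 16))
                       for i in range(0, len(digits), 4))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            # 'u' followed by a single code point of four to six hex digits
            out.append(chr(int(part[1:], 16)))
        else:
            return None                 # simplification: give up on unknown components
    return "".join(out)

print(glyph_name_to_text("t_i"))
```

With this approach, the glyph named t_i resolves to the two characters "t" and "i" without needing any PUA code point, which is what makes search and copy/paste possible.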


David

On 5/6/2016 11:49 AM, Steve Swales wrote:

This discussion seems to have fizzled out, but I’m concerned that
there’s a real world problem here which is at least partially the
concern of the consortium, so let me stir the pot and see if there’s
still any meat left.

On the current release of MacOS (including the developer beta, for
your reference, Peter), if you use Calibri font, for example, in any
app (e.g. notes), to write words with “ti” (like
internationalization), then press “Print” and “Open PDF in Preview”,
you get a PDF document with the joined “ti”.  Subsequently cutting and
pasting produces mojibake, and searching the document for words
with “ti” doesn’t work, as previously noted.

I suppose we can look on this as purely a font handling/MacOS bug, but
I’m wondering if we should be providing accommodations or conveniences
in Unicode for it to work as desired.

-steve
