Re: Joined "ti" coded as "O" in PDF

2016-05-08 Thread Philippe Verdy
2016-05-08 14:42 GMT+02:00 Don Osborn: > Some earlier posts in this thread made the observation that PDF is for > presentation not archiving. > I tend to disagree. PDFs are widely used for archiving, and for that purpose it does not matter how a PDF was generated; it is only meant to be a facsimile, p

Re: Joined "ti" coded as "O" in PDF

2016-05-08 Thread Don Osborn
Could it be said that a PDF conversion app generating unusual coding of characters, and doing so without advising users, is an instance of "Unicode malpractice"? (per David's mention of using the "bully pulpit") Some earlier posts in this thread made the observation that PDF is for presentatio

Re: Joined "ti" coded as "O" in PDF

2016-05-08 Thread Andrew Cunningham
The t_i instance will depend on the quality of the font. If it's a standard ligature there should be a glyph-to-codepoints assignment in the cmap table or the ToUnicode mapping in the PDF file. As David indicates, it isn't a Unicode issue. It is an issue with the font used and/or the tools used. P

Re: Joined "ti" coded as "O" in PDF

2016-05-07 Thread David Perry
I agree that it's a real-world problem -- PDFs really should be searchable -- but I do not see that it's a Unicode issue. Unicode defines the basic building blocks of LATIN SMALL LETTER T and LATIN SMALL LETTER I; that's its job. Unicode is not responsible for font construction or creating PDF

Re: Joined "ti" coded as "O" in PDF

2016-05-06 Thread Andrew Cunningham
My understanding is that searchability comes down to two factors: 1) the ToUnicode mapping, which maps glyphs in the font or subsetted font to Unicode codepoints. Mappings take the form of one glyph to one codepoint or one glyph to two or more codepoints. Obviously any glyph that doesn't resolve
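
The glyph-to-codepoint mapping described above can be sketched in a few lines of Python. This is only an illustration, not how any particular PDF tool works: the glyph IDs, both mapping tables, and the `extract_text` helper are hypothetical, and substituting U+FFFD for an unmapped glyph is an assumption made for the sketch. It shows why a t_i ligature glyph mapped to the two codepoints "ti" stays searchable, while the same glyph mapped to "O" does not.

```python
# Hypothetical glyph IDs and mapping tables, for illustration only.
# They are not taken from any real font or PDF file.
GOOD_TOUNICODE = {1: "a", 2: "c", 3: "ti", 4: "o", 5: "n"}  # glyph 3: t_i ligature -> two codepoints
BAD_TOUNICODE = {1: "a", 2: "c", 3: "O", 4: "o", 5: "n"}    # glyph 3 wrongly mapped to "O"


def extract_text(glyph_ids, to_unicode):
    """Turn a sequence of glyph IDs into text using a glyph-to-codepoints table.

    Unmapped glyphs become U+FFFD (an assumption made for this sketch).
    """
    return "".join(to_unicode.get(glyph_id, "\ufffd") for glyph_id in glyph_ids)


glyphs = [1, 2, 3, 4, 5]  # the word "action", set with a t_i ligature glyph
good = extract_text(glyphs, GOOD_TOUNICODE)
bad = extract_text(glyphs, BAD_TOUNICODE)

print(good)             # action
print(bad)              # acOon
print("action" in bad)  # False: a search for "action" misses the word
```

The rendered page looks identical in both cases, because the same glyphs are drawn; only the text recovered for search and copy-paste differs.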

Re: Joined "ti" coded as "O" in PDF

2016-05-06 Thread Steve Swales
This discussion seems to have fizzled out, but I’m concerned that there’s a real world problem here which is at least partially the concern of the consortium, so let me stir the pot and see if there’s still any meat left. On the current release of MacOS (including the developer beta, for your r

Re: Joined "ti" coded as "O" in PDF

2016-03-21 Thread Philippe Verdy
Are those PDFs supposed to be searchable internally? For archival purposes, the PDFs are stored in their final form, and search is performed by creating a database of descriptive metadata. Each time one wants formal details, they have to read the original the way it was presented (many PDFs are j

Re: Joined "ti" coded as "O" in PDF

2016-03-20 Thread Tom Gewecke
> On Mar 20, 2016, at 12:24 PM, Asmus Freytag (t) > wrote: > > Usually, the archive feature pertains only to the fact that you can reproduce > the final form, not to being able to get at the correct source (plain text > backbone) for the document. My understanding is that PDF/A-1a is suppose

Re: Joined "ti" coded as "O" in PDF

2016-03-20 Thread Asmus Freytag (t)
On 3/20/2016 12:11 AM, Janusz S. Bien wrote: Quote/Cytat - Andrew Cunningham (Sun 20 Mar 2016 12:06:29 AM CET): Hi Don, Latin is fine if you keep to simple well made fonts and avoid using more so