On 29/09/2014 11:02 PM, "Frédéric Grosshans" <frederic.grossh...@gmail.com> wrote:
>
> On 27/09/2014 01:10, Andrew Cunningham wrote:
>
>> * NEVER try to copy and paste text from PDF. It is a preprint format and should be treated as such.
>
> Well... Having access to the raw text is often useful (for example, to give blind readers access to the content of PDF documents, or to search for a word in a scanned historical document), and copying and pasting text from PDF often works, even if the “rich text” formatting is lost.
>
The problem is that the actual text often isn't the same as the original text used to generate the PDF. Results vary according to the fonts used and the tools used to generate the PDF. Even Adobe Acrobat contains different tools that can give vastly different results. It is best to think of PDF as dealing with glyphs rather than characters.

I mainly work with complex scripts, and the results with those are usually not encouraging. I know there is ActualText, but honestly I don't remember ever seeing a complex-script PDF I could copy and paste from without post-processing the text. The average person creating PDF files has no knowledge of how to achieve optimal results. N’Ko is one of the easier scripts to deal with, thankfully.

> In the case of the Ebola FAQs (https://sites.google.com/site/athinkra/ebola-faqs) discussed here, it worked almost perfectly on my computer (Ubuntu Linux 14.04) for N’Ko (diacritics are shifted by one character) and Vai. Of course, Adlam did not work (it was somehow converted to Arabic), but that was expected, since Adlam is not (yet?) in Unicode.
>
> _______________________________________________
> Unicode mailing list
> Unicode@unicode.org
> http://unicode.org/mailman/listinfo/unicode
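
For anyone who wants to check how far text copied from a PDF has drifted from its source, here is a rough sketch of one possible check. It assumes you still have the original text the PDF was generated from. The helper names `script_histogram` and `first_mismatch` are made up for this illustration, and using the first word of each Unicode character name as a stand-in for the script is only an approximation, not a real script property lookup.

```python
import unicodedata
from collections import Counter


def script_histogram(text):
    """Count non-space characters by the first word of their Unicode name.

    This is a crude proxy for the script (e.g. "NKO", "VAI", "ARABIC").
    Characters without a name in the local Unicode database are counted
    under "UNKNOWN".
    """
    counts = Counter()
    for ch in text:
        if ch.isspace():
            continue
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts


def first_mismatch(original, extracted):
    """Return the index of the first differing code point, or None if equal.

    Both strings are NFC-normalized first, so that differences which are
    only a matter of canonical form are not reported.
    """
    a = unicodedata.normalize("NFC", original)
    b = unicodedata.normalize("NFC", extracted)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


if __name__ == "__main__":
    # Hypothetical sample: an N'Ko letter, a combining tone mark, a letter.
    source = "\u07CA\u07EB\u07D3"
    # The same characters with the tone mark shifted by one position.
    pasted = "\u07CA\u07D3\u07EB"
    print(script_histogram(pasted))
    print(first_mismatch(source, pasted))
```

A histogram dominated by an unexpected script (Arabic where another script was expected, say) points to a font or encoding substitution, while a mismatch index with an otherwise identical histogram suggests reordering, such as diacritics shifted by one character.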