2013/12/30 Joe Corneli <holtzerman...@gmail.com>:
> Thanks Ross.
>
> I think in this case all I really need is to revise \href code to
> insert /ActualText (because I'm using small caps for hyperlinks in
> this doc). Pretty much everything else works fine already.
>
Small caps have nothing to do with the code points; they are just a
different shape of the same characters. If you enter \textsc{something},
copy and paste should result in lowercase "something".

> Joe
>
> On Sun, Dec 29, 2013 at 11:45 PM, Ross Moore <ross.mo...@mq.edu.au> wrote:
>> Hi Joe,
>>
>> On 30/12/2013, at 8:12 AM, Joe Corneli wrote:
>>
>>> This answer talks about how to turn off ligatures:
>>> http://tex.stackexchange.com/a/5419/4357
>>>
>>> Is there a way to turn off *all* special characters (e.g. small caps)
>>> and just get ASCII characters in the copy-and-paste level of the PDF?
>>
>> In short, no! This is against the idea of making more use of Unicode
>> across all computing platforms.
>>
>> Certainly a ligature can have an /ActualText replacement consisting
>> of the separate characters, but this requires the PDF producer
>> to have supplied this within the PDF, as it is being generated.
>>
>> I've played a lot with this kind of thing, and think that this
>> is the wrong approach. One should use /ActualText to provide
>> the correct Unicode replacement, when one exists. Thus one
>> can extract textual information reliably, even when the PDF
>> uses legacy fonts that may not contain a /ToUnicode resource,
>> or if that resource is inadequate in special situations.
>>
>>
>> Besides, do you really mean *all* special characters?
>> What about simple symbols like: ß∑∂√∫Ω and all the other
>> myriad foreign/accented letters and mathematical symbols?
>>
>> If you want these to Copy/Paste as TeX coding (\beta \Sum \delta
>> \sqrt etc.) within documents that you write yourself, then I wrote
>> a package called mmap where this is an option for the original
>> Computer Modern fonts.
>>
>>
>> Alternatively, a PDF reader might supply a filtering mode that
>> converts the ligatures back to separate characters. Then the
>> user ought to be able to choose whether or not to use this filter.
>> I don't know of any that actually do this.
>> (In any case, you would want such a tool to allow you to specify
>> which characters to replace, and which to preserve.)
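
[Editorial sketch, not part of the original messages.] The kind of filter
Ross describes could look roughly like the following when applied to text
that has already been extracted from a PDF. This is only a minimal sketch
under stated assumptions: the function name expand_ligatures, the LIGATURES
table and the preserve parameter are hypothetical names chosen for
illustration, and the table covers only the Latin ligature code points
U+FB00 to U+FB06. Everything else, including symbols such as ß∑∂√∫Ω, is
passed through unchanged.

```python
# Hypothetical post-processing filter for text copied or extracted from a PDF.
# It expands Latin ligature code points into separate letters and leaves
# every other character (accents, math symbols, ...) exactly as it was.

# Assumption: these seven code points are the ones we care about.
LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "st",  # long-s + t ligature, flattened to plain "st" here
    "\uFB06": "st",
}


def expand_ligatures(text, replace=LIGATURES, preserve=""):
    """Expand the ligatures listed in `replace`, except those in `preserve`."""
    table = {
        ord(lig): plain
        for lig, plain in replace.items()
        if lig not in preserve
    }
    return text.translate(table)


# Example: ligatures are expanded, other non-ASCII symbols survive.
print(expand_ligatures("e\uFB03cient \uFB01lter: \u00df\u2211\u221a"))
```

The replace and preserve arguments are one possible way to let the user
decide which characters are rewritten and which are kept, as suggested in
the quoted message. Such a filter only helps the person running it; it does
not change what other readers get when they copy from the PDF.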
>>
>> Your best option is surely to (get someone else to) write such
>> a filter that meets your needs, and use it to post-process the text
>> extracted via Copy/Paste or with other text-extraction tools.
>>
>> Of course this is no use if your aim is to create documents for
>> which others get the desired result via Copy/Paste.
>> For this, the /ActualText approach is what you need.
>>
>>
>> Hope this helps,
>>
>> Ross
>>
>> ------------------------------------------------------------------------
>> Ross Moore                                     ross.mo...@mq.edu.au
>> Mathematics Department                         office: E7A-206
>> Macquarie University                           tel: +61 (0)2 9850 8955
>> Sydney, Australia 2109                         fax: +61 (0)2 9850 8114
>> ------------------------------------------------------------------------

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex