Thanks, Ross. I think in this case all I really need is to revise the \href code so that it inserts /ActualText (I'm using small caps for hyperlinks in this doc). Pretty much everything else already works fine.
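
A minimal sketch of that kind of change, assuming the accsupp package is acceptable here (it is not mentioned in this thread). Rather than patching \href itself, it defines a hypothetical wrapper, \schref, that sets the link text in small caps and attaches the plain text as /ActualText. The macro name and the choice of method=escape are assumptions, and the link text is assumed to be plain ASCII with no macros in it:

```latex
\documentclass{article}
\usepackage{hyperref}
\usepackage{accsupp} % assumption: its automatic driver choice suits the engine in use

% Hypothetical wrapper around \href (the name \schref is made up for this sketch).
%   #1 = URL
%   #2 = link text, assumed to be plain ASCII with no macros
% The visible text is set in small caps; the /ActualText entry carries the
% unmodified string, which is what a PDF reader should offer on Copy/Paste.
\newcommand{\schref}[2]{%
  \href{#1}{%
    \BeginAccSupp{method=escape,ActualText={#2}}%
    \textsc{#2}%
    \EndAccSupp{}%
  }%
}

\begin{document}
See the \schref{http://tug.org/mailman/listinfo/xetex}{XeTeX list} page.
\end{document}
```

Leaving \href itself untouched means anything else that calls it is unaffected. Whether the copied text actually comes out as intended still depends on the PDF reader, so it is worth checking in more than one viewer.
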
Joe

On Sun, Dec 29, 2013 at 11:45 PM, Ross Moore <ross.mo...@mq.edu.au> wrote:
> Hi Joe,
>
> On 30/12/2013, at 8:12 AM, Joe Corneli wrote:
>
>> This answer talks about how to turn off ligatures:
>> http://tex.stackexchange.com/a/5419/4357
>>
>> Is there a way to turn off *all* special characters (e.g. small caps)
>> and just get ASCII characters in the copy-and-paste level of the PDF?
>
> In short, no, because this is against the idea of making more use
> of Unicode, across all computing platforms.
>
> Certainly a ligature can have an /ActualText replacement consisting
> of the separate characters, but this requires the PDF producer
> to have supplied this within the PDF, as it is being generated.
>
> I've played a lot with this kind of thing, and think that this
> is the wrong approach. One should use /ActualText to provide
> the correct Unicode replacement, when one exists. Thus one
> can extract textual information reliably, even when the PDF
> uses legacy fonts that may not contain a /ToUnicode resource,
> or if that resource is inadequate in special situations.
>
> Besides, do you really mean *all* special characters?
> What about simple symbols like: ß∑∂√∫Ω and all the other
> myriad foreign/accented letters and mathematical symbols?
>
> If you want these to Copy/Paste as TeX coding (\beta \sum \partial
> \sqrt etc.) within documents that you write yourself, then I wrote
> a package called mmap where this is an option for the original
> Computer Modern fonts.
>
> Alternatively, a PDF reader might supply a filtering mode that
> converts the ligatures back to separate characters. Then the
> user ought to be able to choose whether or not to use this filter.
> I don't know of any that actually do this.
> (In any case, you would want such a tool to allow you to specify
> which characters to replace, and which to preserve.)
>
> Your best option is surely to (get someone else to) write such
> a filter that meets your needs, and use it to post-process the text
> extracted via Copy/Paste or with other text-extraction tools.
>
> Of course this is no use if your aim is to create documents for
> which others get the desired result via Copy/Paste.
> For this, the /ActualText approach is what you need.
>
> Hope this helps,
>
> Ross
>
> ------------------------------------------------------------------------
> Ross Moore                                       ross.mo...@mq.edu.au
> Mathematics Department                           office: E7A-206
> Macquarie University                             tel: +61 (0)2 9850 8955
> Sydney, Australia 2109                           fax: +61 (0)2 9850 8114
> ------------------------------------------------------------------------
>
> --------------------------------------------------
> Subscriptions, Archive, and List information, etc.:
> http://tug.org/mailman/listinfo/xetex
> --------------------------------------------------