Hi All:
I'm glad my message sparked some discussion. The M[N]WE for my
specific use case on tex.stackexchange.com has not gotten much
attention; I recently attached a +200 bounty.
http://tex.stackexchange.com/questions/151835/actualtext-in-small-cap-hyperlinks
I figured I should put in a plug for that here. I already got a reply
from one of the main authors of hyperref, but patching \href at the
necessary level is beyond me. Finally, I realize a detailed
discussion of this issue is probably not germane to this list, so if
you feel that way, please direct further comments there, or to me off
list.
Thank you!
Joe
On Wed, Jan 1, 2014 at 10:34 PM, Zdenek Wagner zdenek.wag...@gmail.com wrote:
2014/1/1 Ross Moore ross.mo...@mq.edu.au:
Hi Zdeněk,
On 02/01/2014, at 2:14 AM, Zdenek Wagner wrote:
2014/1/1 Ross Moore ross.mo...@mq.edu.au:
In the example PDF that I attached to my previous message, each
mathematical character is mapped to a big-endian UTF-16 hexadecimal
string, with Plane-1 alphanumerics expressed using surrogate pairs.
Thank you, now I see it. The book where I read about /ActualText did
not mention that I can use UTF-16 if I start the string with a BOM.
Fair enough; this I had to discover for myself.
The PDF Reference Manual (e.g. for ISO 32000) has no such examples,
so I had to experiment with different ways to specify strings requiring
non-ASCII characters. UTF-16 is the most elegant, and avoids the messiness
of using escape characters and octal codes, which are needed even for
some non-letter ASCII characters.
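
A minimal sketch of the string format described above, written in Python
rather than TeX: the replacement text is encoded as big-endian UTF-16,
prefixed with the byte-order mark FEFF, and written as a PDF hexadecimal
string. The function name actualtext_hex is hypothetical; it is not taken
from the Perl script mentioned in this thread.

```python
def actualtext_hex(text: str) -> str:
    """Return `text` as a PDF hex string: BOM (FEFF) + big-endian UTF-16.

    Hypothetical helper for illustration only. Python's "utf-16-be" codec
    writes no BOM of its own, so FEFF is prepended by hand; characters
    outside the BMP are emitted as surrogate pairs by the codec.
    """
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


# U+1D465 MATHEMATICAL ITALIC SMALL X lies in Plane 1, so it becomes
# the surrogate pair D835 DC65.
math_x = actualtext_hex("\U0001D465")

# A BMP character needs a single 16-bit code unit.
plain_a = actualtext_hex("A")
```

Here math_x is "<FEFFD835DC65>" and plain_a is "<FEFF0041>". No escape
characters or octal codes are needed, whatever the input.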
Can I see the source of the PDF? It would help me a lot to see how
you do all these things.
Each piece of mathematics is captured, saved to a file, converted to MathML,
then run through my Perl script to create alternative (La)TeX source.
This is done to be able to create a fully-tagged PDF description of the
mathematical content, using a special version of pdftex that Han The Thanh
created for me (and others), still at an experimental stage.
You should not need all of this machinery, but I'm happy to answer
any questions you may have.
I've attached a couple of examples of the output from my Perl script,
in which you can see how the /ActualText replacement strings
are specified, using a macro \SMC -- which ultimately expands to use
the \pdfstartmarkedcontent primitive.
Thank you.
Without the special primitives, you should be able to use \pdfliteral
to insert the tagging needed for just using /ActualText.
I see no reason why Indic character strings could not be done similarly.
You probably need some on-the-fly preprocessing to work out the required
strings to use.
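
The preprocessing step could be sketched roughly as follows, in Python.
This assumes pdfTeX and the standard PDF marked-content operators BDC and
EMC with a /Span tag. The helper names are hypothetical, and the "direct"
keyword of \pdfliteral is one possible choice, not necessarily what the
\SMC macro does.

```python
def actualtext_hex(text: str) -> str:
    """PDF hex string: BOM (FEFF) followed by big-endian UTF-16."""
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def span_operators(replacement: str) -> tuple:
    """Return the (begin, end) marked-content operators for a /Span
    whose /ActualText is `replacement`. Hypothetical helper."""
    begin = "/Span <</ActualText " + actualtext_hex(replacement) + ">> BDC"
    return begin, "EMC"


def pdftex_wrap(visible_tex: str, replacement: str) -> str:
    """Wrap TeX source between two \\pdfliteral calls so that the PDF
    text it produces carries `replacement` as /ActualText.
    Assumption: pdfTeX is the engine; this is only a sketch."""
    begin, end = span_operators(replacement)
    return (
        "\\pdfliteral direct{" + begin + "}"
        + visible_tex
        + "\\pdfliteral direct{" + end + "}"
    )


# Devanagari letter KA (U+0915) as the replacement for some visible text.
wrapped = pdftex_wrap("ka", "\u0915")
```

The value of wrapped is the TeX source
\pdfliteral direct{/Span <</ActualText <FEFF0915>>> BDC}ka\pdfliteral direct{EMC}
which could be written to a file and \input, or produced by a macro.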
I'm not sure whether there is a LaTeX package that allows you to get the
literal bits into the correct place without upsetting other fine
details of the typesetting with Indic characters.
This certainly should be possible, at least when using pdfLaTeX.
Not sure of the details using XeTeX -- but you work with the source code,
so can devise anything that is needed, right?
Typesetting depends on HarfBuzz and font features; no package is
needed (fontspec and polyglossia just save work that could be done with
primitives). Any code can be sent to xdvipdfmx with \special{pdf: code
...}, in much the same way as with \pdfliteral in pdftex. I already know
how to do it.
--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Hope this helps,
Ross
Ross Moore ross.mo...@mq.edu.au
Mathematics Department office: E7A-206
Macquarie University tel: +61 (0)2 9850 8955
Sydney, Australia 2109 fax: +61 (0)2 9850 8114
--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex