Hi Tobias,

On 22/06/2011, at 3:45 PM, Tobias Schoel <liesdieda...@googlemail.com> wrote:

> Hi,
> 
>>> Other viewers don't understand this, as expected. I still feel that it's 
>>> quite useless:
>>> if I find "Louis XIV" I may want to copy it and get the real name, not 
>>> "Louis 14".
>> 
>> Sure.
>> It is your job as author to decide what your readers should get.
> 
> And it's the user's right to use what he gets as he wants.

Well yes, but then s/he has to be prepared to use appropriate translation 
methods to get what is wanted from what is supplied by the author and browser 
software.

> So the author should deliver a product that is easy and variable to use.

The best that can be expected is that unusual representations of data be tagged 
as such, so that software has a chance to recognise that it may not be as 
straightforward as just the letter content might suggest. Then attributes to 
the tags can supply alternative representations, if the author has sufficient 
forethought to predict that someone might want to use these. But there should 
be no obligation to have done so, as an author cannot be expected to predict 
all the possible uses that someone may make of his/her written ideas.

Thus we are led to appreciate the need for Tagged PDF.

> 
>> If you want them to get  "Louis XIV"  then no /ActualText is required.
>> *unless* that X I and V are really: U+2169 U+2160 U+2164 .
>> In that case you may want the /ActualText  to replace with "XIV" so that
>> your readers don't end up with the undefined character symbol.
> As Unicode says: these Codepoints are deprecated. So the letters XIV should 
> be preferred. (And I'd still say that the number 14 is preferable.)

Deprecation is a curious thing.
At what level should it be implemented or imposed?

Should TeX stop an author from using these symbols?
If they do get into a PDF, should the browser refuse to display them, 
substituting something else?
Or if a reader does a Copy/Paste, should either the copying or pasting 
application offer to make a substitution?

I'd answer  No, No, Perhaps  to these questions, otherwise it becomes 
impossible to even put into print that the specific characters should no longer 
be used.

>> 
>> Or maybe you want them to get  "Louis quatorze".
>> Probably you do want a screen reader to say "Louis quatorze",
>> but then you'll want to test that AR reads it correctly
>> --- maybe  /Alt(Louie katorze)  will be better.
> 
> Indeed there should be different substitutions depending on the purpose. At 
> least arabic numerals (for numerical use as in spreadsheets), letters (for 
> text use as copypasting to text documents) and screenreader text (for 
> screenreader use) are important. Next to that are different languages. (I 
> don't know about the English way of reading these names, but in German 
> contexts they are mostly read in German and partly read in the source 
> language: So "Ludwig der Vierzehnte" or "Louis quatorze")

Tagged PDF allows for alternatives, selected according to the reader's language 
locale, in the browser software. Customized choices within the same locale can 
be included, but as yet there is no standard way to extract and deal with such 
choices. A browser plugin might be expecting to look for certain specific kinds 
of choices.

> 
> So one might want a package, which offers a macro \romannumeral{14}, which 
> produces the glyphs (intended to be used by the font author) and adds 
> appropriate PDF-specials whose content is based on the active language. At 
> least, it should offer option keys to set these contents by hand e.g.
> 
> Louis \romannumeral[screenreader="katorze",replacement_text="quatorze", 
> use_font_symbols]{14}

Someone would define a macro  \LouisXIV  to expand to this kind of thing.
An author just puts the macro into his/her LaTeX (or ConTeXt) source.
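
As a rough sketch only: assuming the accsupp package is available, and that 
its \BeginAccSupp / \EndAccSupp commands with the ActualText and Alt keys 
work with your engine and driver, such a macro might look something like the 
following. The name \LouisXIV is just the hypothetical one mentioned above, 
and the replacement strings are illustrative choices, not recommendations.

```latex
\documentclass{article}
\usepackage{accsupp}% provides \BeginAccSupp ... \EndAccSupp

% Hypothetical author-level macro: it prints the letters "XIV", and wraps
% them in a marked-content span carrying /ActualText (the text intended
% for Copy/Paste and searching) and /Alt (a description offered to
% screen readers).
\newcommand*{\LouisXIV}{%
  Louis~%
  \BeginAccSupp{ActualText={XIV},Alt={quatorze}}%
  XIV%
  \EndAccSupp{}%
}

\begin{document}
A sentence that mentions \LouisXIV{} goes here.
\end{document}
```

Which of the two strings a given viewer or screen reader actually uses is 
something that needs to be tested with the software concerned.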


> 
>> This kind of stuff adds a whole new dimension to typesetting.
> 
> No, it simply allows the first and foremost dimension of typesetting to use 
> the capabilities of modern media: to easily convey the meaning without 
> distracting the user.

There is a lot more for authors and macro-writers to think about and use.
Yes, the intention is to much more effectively convey meaning within electronic 
documents.

BTW, if you have Acrobat Pro 10, then I can show you some PDFs with fully 
tagged, quite complicated mathematical formulas. You'll need the Pro version of 
the browser to be able to see the tagging, and extract the content as XML 
incorporating MathML. No other software can do this yet, to my knowledge — 
oops, not true: the MathPlayer plugin to Adobe Reader should be able to do it 
also.
Copy/Paste from these PDFs gets all the correct math symbols — using Plane 1 
alphanumerics, where appropriate — but cannot sensibly position superscripts, 
subscripts, fractions, etc.

> 
> ciao
> 
> Toscho
> 
> PS: I can't do it myself but would appreciate it a lot, if someone could 
> create such a package.

Maybe in a year or so, when TeX support for Tagged PDF has become more mature.
At present it is very much experimental.

Cheers,

       Ross


--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex
