It seems to me (having worked with OpenType fonts for some years) that while it might be possible to build an Arabic-to-Roman converter at the font level, that would be one of the least efficient ways to handle it. With OT you can write a set of rules that say:

Here's a 1 followed by two digits; substitute C.
Here's a 2 followed by one digit; substitute XX.
Here's a 3 followed by something other than a digit; substitute III.

But it can't understand numbers the way a programming language can. If you want to be able to write XC for 90, the task gets somewhat more complex, because OT definitely can't say:

For a number in the range 90-99, do the following . . .

Surely a programmatic solution would be better, and (La)TeX has an understanding of roman numerals built in. With a little Googling I was able to come up with this file, which works:

%&program=xelatex
%&encoding=UTF-8 Unicode

\documentclass[11pt,letterpaper,twoside,openany]{book}

\usepackage{fontspec}

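\makeatletter
% \rmnum{n}: lowercase roman numeral, using the TeX primitive \romannumeral
\newcommand{\rmnum}[1]{\romannumeral #1}
% \Rmnum{n}: uppercase roman numeral; the LaTeX kernel macro \@slowromancap
% uppercases each letter up to the terminating @
\newcommand{\Rmnum}[1]{\expandafter\@slowromancap\romannumeral #1@}
\makeatother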

\begin{document}

There are \rmnum{123}\ fish in the sea.

And there are \Rmnum{5123}\ leaves on the tree.

\end{document}

But I don't know how to make a file that will use /ActualText. Maybe someone here can explain that.
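
One possible starting point, offered only as a sketch: the accsupp package provides \BeginAccSupp and \EndAccSupp, which wrap material in a marked-content span carrying an /ActualText entry. The helper name \RomanNum below is hypothetical (it is not part of any package), and naming the dvipdfm driver explicitly rests on the assumption that this driver suits XeLaTeX's xdvipdfmx back end. Whether copying the text really yields the digits still depends on the PDF viewer.

```latex
% Sketch only; compile with xelatex.
\documentclass{article}

% Assumption: accsupp's dvipdfm driver is the right one for xdvipdfmx.
\usepackage[dvipdfm]{accsupp}

\makeatletter
% Uppercase roman numeral, as in the file above.
\newcommand{\Rmnum}[1]{\expandafter\@slowromancap\romannumeral #1@}
\makeatother

% \RomanNum{n} (hypothetical helper): typeset n as an uppercase roman
% numeral and tag it with /ActualText holding the Arabic digits.
\newcommand{\RomanNum}[1]{%
  \BeginAccSupp{ActualText=#1}%
  \Rmnum{#1}%
  \EndAccSupp{}%
}

\begin{document}

Louis \RomanNum{14} reigned for \RomanNum{72} years.

\end{document}
```

If this works as intended, selecting "XIV" in a conforming viewer and pasting it elsewhere should give "14" rather than the three letters.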

Peter Baker



On 6/19/11 4:43 PM, Ross Moore wrote:
Hello Enrico,

On 20/06/2011, at 5:42 AM, enrico.grego...@univr.it wrote:

> What the OP wants is that "CXV" is stored as a unique glyph representing 115.
> Maybe this can be done by reserving, say, five thousand slots in Unicode to
> contain the numbers from 1 to 5000 in Roman form that are built from the basic
> digits, embedding in the font (or in the typesetting engine) the algorithm for
> building them from the Western/Arabic representation.

No.
In the PDF ISO standard, you have the option of using /ActualText tagging.
The PDF would contain a portion of the page contents stream, such as:

   /Span<</ActualText(115)>>BDC ... (graphics to position and produce
the letters 'C' 'X' and 'V') ... EMC

Now *any* attempt to select any portion of the visible string "CXV"
is supposed to result in the whole string being included when copying.

The problem is that not all PDF browsers are fully conformant, so this
behaviour may not be what you actually get with a particular piece of
software.  (BTW, Apple is one of the biggest offenders.)

> This might be done in two passes:
> represent the number using the codes for Roman numerals and start a ligaturing
> process.

Trying to do it character by character at the font level doesn't seem
overly practical to me. The concept is the number "123" but represented
in a non-standard way. The use of /ActualText tagging seems to be much
more helpful to readers, and also to other software that tries to
extract the meaning being represented with a PDF, for whatever purpose.

Note that ISO PDF also has an alternative method of tagging.
E.g.
     /Span<</Alt(123)>>BDC ... EMC
Screen-reading software is meant to use the /Alt tagging.

And both /Alt and /ActualText allow multiple values, each preceded
by a /Lang tag, so that the actual vocalization generated by the
screen-reader can be adjusted for different languages --- the document
author normally would provide this, but a sophisticated PDF browser
plug-in might be programmed to produce a translation on-the-fly.

> Actually, Roman numerals are mostly used when the numerical information is
> almost irrelevant as such. Nobody uses the "XIV" in "Louis XIV" to perform
> calculations. That's just a different way of writing "quatorze".

Right. So /ActualText tagging can support this distinction in meaning.
It is *not* intended to support calculations --- that is the domain
of "Content Tagging" using MathML.

> I see it just as the ability to copy "quatorze" from a text and paste it into a
> worksheet cell accepting numbers to get 14. In the case of Roman numerals
> it may be simpler, of course. But is it useful?

Most certainly it is useful.
It is part of the way of the future for smart PDF documents.


> Ciao
> Enrico

------------------------------------------------------------------------
Ross Moore                                       ross.mo...@mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------






--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
   http://tug.org/mailman/listinfo/xetex


