On 31 May 2010, at 22:13, Pablo Rodríguez wrote:
> Hi there,
>
> I have just discovered, by accident, that LetterSpace behaves differently
> depending on whether or not the whole paragraph is set with this feature.
>
> The minimal example:
>
> \documentclass[12pt]{article}
> \usepackage{fontspec}
> \setmainfont{Theano Didot}
> \begin{document}
> % \addfontfeature is not limited to the paragraph: it stays in effect until the
> % end of the current group (here, the whole document body), so each setting
> % carries over into the following lines until the next \addfontfeature.
> χαλεπὰ \addfontfeature{LetterSpace=12}τὰ καλά
>
> χαλεπὰ τὰ καλά
>
> χαλεπὰ \addfontfeature{LetterSpace=0}τὰ καλά
>
> Beauty \addfontfeature{LetterSpace=12}is difficult
>
> Beauty is difficult
>
> Beauty \addfontfeature{LetterSpace=0}is difficult
> \end{document}
>
> If you copy the text from the resulting PDF
> (http://www.ousia.tk/wrong-letterspace.pdf), you will see that only the
> second line is typeset properly, or at least that it is the only one with no
> blank spaces between the letters.
>
> I suspect this might also be what causes wrong hyphenation when using
> LetterSpace. (BTW, loading polyglossia makes no difference.)
>
> Have I hit a bug in LetterSpace? Do you know any way to avoid this?

The PDF looks correct to me; where LetterSpace=12 is in effect, the letters are
more widely spaced, and where LetterSpace=0, they're not. I don't see a bug
here. Or am I missing something?
If you're specifically concerned about what happens when you use a viewer to
select and copy the text from this PDF into an editor... well... that's a
chancy operation. It worked fine for me with Acrobat (no extra spaces), but
other viewers may give different results. Basically, this is a poorly defined
operation. Because TeX represents the spaces between words as glue rather than
as space characters, the PDF data contain no clear indication of where the word
boundaries should be, so the viewer has to guess from the glyph positions. That
works most of the time for simple running text, but changing the letter spacing
carries a pretty high risk of confusing it, because the widened gaps between
letters can look like word spaces.
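To make that guessing concrete, here is a toy model. It is not the algorithm of Acrobat or any other real viewer. The uniform glyph width, the fixed threshold of a quarter of the font size, and the names `layout` and `extract` are all hypothetical assumptions for illustration. Glyphs are placed with no space characters, as in TeX output, and the extractor inserts a space wherever the gap between two glyphs exceeds the threshold.

```python
# Toy model of PDF text extraction: NOT the algorithm of any real viewer.
# Hypothetical assumptions: every glyph has the same advance width, word spaces
# appear only as gaps (no space glyphs, as in TeX output), and the extractor
# inserts a space wherever a gap exceeds a fixed fraction of the font size.
from typing import List, Optional, Tuple

Glyph = Tuple[str, float, float]  # (character, x position in pt, advance width in pt)


def layout(text: str, advance: float, tracking: float, word_space: float) -> List[Glyph]:
    """Place glyphs left to right; `tracking` is extra space added after every glyph."""
    glyphs: List[Glyph] = []
    x = 0.0
    for ch in text:
        if ch == " ":
            x += word_space  # a word space is only a gap, never a character
            continue
        glyphs.append((ch, x, advance))
        x += advance + tracking
    return glyphs


def extract(glyphs: List[Glyph], font_size: float, threshold: float = 0.25) -> str:
    """Guess word boundaries: any gap wider than threshold * font_size becomes a space."""
    out: List[str] = []
    prev_end: Optional[float] = None
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > threshold * font_size:
            out.append(" ")
        out.append(ch)
        prev_end = x + width
    return "".join(out)


if __name__ == "__main__":
    text = "Beauty is difficult"
    plain = layout(text, advance=6.0, tracking=0.0, word_space=4.0)
    tracked = layout(text, advance=6.0, tracking=4.0, word_space=4.0)
    print(extract(plain, font_size=12.0))    # Beauty is difficult
    print(extract(tracked, font_size=12.0))  # B e a u t y i s d i f f i c u l t
```

With no tracking, only the word gaps exceed the threshold. Once letter spacing pushes every inter-letter gap past it, the extractor can no longer tell letters from words. Real viewers use their own, more elaborate heuristics, which is why results differ from one viewer to the next.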
As I see it, PDF was not really designed to be an interchange medium for text;
it's designed to convey the graphical appearance of the page. Extracting the
underlying text from the glyphs on the page is an afterthought that has never
been 100% reliable. Later additions such as /ActualText can help, but xetex does
not currently generate /ActualText automatically in the PDF output, and I'd be
reluctant to add it, considering how much it would bloat the output.
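To give a rough idea of that size cost, here is a sketch of what wrapping text in /ActualText involves. It uses the "Span ... BDC ... EMC" marked-content form from the PDF specification and encodes the replacement text as a UTF-16BE hex string with a byte-order mark. The helper names are hypothetical, and this is not how xetex does (or would) do it, since xetex does not generate /ActualText at all.

```python
# Hypothetical helper (illustration only, not something xetex provides): wrap a run
# of page-content operators in an /ActualText marked-content sequence, as described
# for "Span ... BDC ... EMC" in the PDF specification. The replacement text is
# written as a hex string in UTF-16BE with a byte-order mark (FEFF).


def actual_text_span(text: str, content_ops: str) -> str:
    """Return `content_ops` wrapped so that viewers can extract `text` verbatim."""
    hex_text = "FEFF" + text.encode("utf-16-be").hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC\n{content_ops}\nEMC"


def overhead(text: str) -> int:
    """Bytes added by the wrapper alone (the wrapper is pure ASCII, so chars == bytes)."""
    return len(actual_text_span(text, ""))


if __name__ == "__main__":
    print(actual_text_span("Beauty", "... glyph-drawing operators ..."))
    # Each BMP character costs 4 hex digits, on top of a fixed 39-byte wrapper.
    for word in ["Beauty", "is", "difficult", "καλά"]:
        print(word, overhead(word))
```

If a wrapper were emitted for every word (an assumption, chosen only to make the arithmetic easy), a six-letter word like "Beauty" would carry 63 extra bytes in the content stream. That is the kind of overhead being weighed here.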
Basically, if you want to get the text reliably, you shouldn't be starting from
the PDF! :)
JK
--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex