Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]

Steven A Rowe Fri, 29 Aug 2008 14:58:27 -0700

On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> I suspect the PDF formatter just doesn't play nicely with the
> non-trivial UTF-8 characters.


This is covered in the Apache FOP FAQ; from
<http://xmlgraphics.apache.org/fop/faq.html#pdf-characters>:

   6.2. Some characters are not displayed, or displayed
        incorrectly, or displayed as "#".

   This usually means the selected font doesn't have a
   glyph for the character.

   The standard text fonts supplied with Acrobat Reader have
   mostly glyphs for characters from the ISO Latin 1 character
   set. [...]

   If you use your own fonts, the font must have a glyph for the
   desired character. Furthermore the font must be available on
   the machine where the PDF is viewed or it must have been
   embedded in the PDF file. [...]
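
As a rough way to see which characters are likely to trip this up, here is a
minimal Python sketch that lists the characters in a string that fall outside
ISO Latin 1. The function name chars_outside_latin1 and the sample string are
my own, and Latin-1 coverage is only a proxy: what actually matters is whether
the selected font has a glyph for the character.

```python
import unicodedata


def chars_outside_latin1(text):
    """Return the characters in text that ISO-8859-1 (Latin-1) cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            # Not representable in Latin-1, so the standard PDF text fonts
            # probably have no glyph for it (an approximation, not a font check).
            missing.append(ch)
    return missing


# Sample input; assumed spelling of the name with its diacritic (U+0107).
sample = "Gospodneti\u0107"
for ch in chars_outside_latin1(sample):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Any character this reports would need a font that has the glyph, either
installed on the viewing machine or embedded in the PDF, as the FAQ says.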

There's an open Forrest bug for this problem: 
<https://issues.apache.org/jira/browse/FOR-132>, and the discussion there 
includes a link to the Cocoon documentation for embedding fonts in PDF files: 
<http://cocoon.apache.org/2.1/userdocs/pdf-serializer.html#FOP+and+Embedding+Fonts>.

This looks kinda complicated, and AFAICT would require modifications to the 
Forrest installation wherever the site is built.

I suspect that almost nobody looks at the PDF version of the "Who we are" page 
(and I sure am sorry now that I brought this up...)

If things are left as-is, Otis's last name would be displayed properly in the 
HTML, and garbled in the PDF; if the diacritic is removed, then it will be 
displayed improperly in both places :)

Steve
