Thank you, Steve! See, I knew you'd nail it. I don't want to complicate the lives of others just because of one little diacritic.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Steven A Rowe <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org
> Sent: Friday, August 29, 2008 5:57:31 PM
> Subject: Forrest PDF non-Latin-1 support [was: RE: prototype Solr 1.3 RC 1]
>
> On 08/29/2008 at 3:24 PM, Chris Hostetter wrote:
> > I suspect the PDF formatter just doesn't play nicely with the
> > non-trivial UTF-8 characters.
>
> This is an Apache FOP FAQ; from
> :
>
> 6.2. Some characters are not displayed, or displayed
> incorrectly, or displayed as "#".
>
> This usually means the selected font doesn't have a
> glyph for the character.
>
> The standard text fonts supplied with Acrobat Reader have
> mostly glyphs for characters from the ISO Latin 1 character
> set. [...]
>
> If you use your own fonts, the font must have a glyph for the
> desired character. Furthermore the font must be available on
> the machine where the PDF is viewed or it must have been
> embedded in the PDF file. [...]
>
> There's an open Forrest bug for this problem:
> , and the discussion there
> includes a link to the Cocoon documentation for embedding fonts in PDF files:
> .
>
> This looks kinda complicated, and AFAICT would require modifications to the
> Forrest installation wherever the site is built.
>
> I suspect that almost nobody looks at the PDF version of the "Who we are" page
> (and I sure am sorry now that I brought this up...)
>
> If things are left as-is, Otis's last name would be displayed properly in the
> HTML, and garbled in the PDF; if the diacritic is removed, then it will be
> displayed improperly in both places :)
>
> Steve