Anyone at all familiar with bibliographic data (the MARC standards) knows that it can be a real pain to deal with. In this case, the difficulty isn't with the MARC data itself but with the Library of Congress's Romanization standards and the lack of support for combining half marks in available fonts. I'm trying to help a client properly display Romanized Cyrillic from MARC data in a Unicode-enabled application. The underlying problem is that I can't find an available font that properly supports the combining half marks U+FE20 and U+FE21.
Alan Wood's page of fonts by Unicode range (a truly impressive collection of information, BTW, Mr. Wood) lists two fonts that include these marks:

Arial Unicode MS: Apparently you can only get this with MS Office or Publisher these days. That is not a good solution for my client, since their budget is very limited and they'd need it on a number of workstations. The more important issue, from a technical point of view, is that the marks may not combine properly, and I don't have a copy of the font to test it myself. Does anyone know whether these marks will combine properly with T, t, S, s, I, i, A, a, and U, u when using the MS font?

Naqsh: A cursive font (not practical), and the marks don't appear to combine properly in any case.

Any suggestions are welcome! Is there a tool out there that will let you edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want the gory details.)

John Craig
Alpha-G Consulting, LLC

Gory details:

The bibliographic data in question follows the Library of Congress Romanization rules (see this link):
http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of these Cyrillic characters is proving elusive:

/ts/ Unicode 0426 (capital) & 0446 (lower case)
/yu/ Unicode 042E & 044E
/ya/ Unicode 042F & 044F

The specified Romanization for each of these Cyrillic characters places a ligature tie over the two Latin letters in question (presumably to indicate that together they represent a single Cyrillic character).
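For anyone who wants to double-check the Cyrillic code points above, Python's standard `unicodedata` module will print the official character name for each one. This is just a quick sanity check; the dictionary name `CYRILLIC_TIED` is illustrative and not part of any standard.

```python
import unicodedata

# Cyrillic letters whose LC Romanization uses a ligature tie,
# keyed by the Romanization (code points as listed in the post).
CYRILLIC_TIED = {
    "ts": ("\u0426", "\u0446"),
    "yu": ("\u042E", "\u044E"),
    "ya": ("\u042F", "\u044F"),
}

for roman, (upper, lower) in CYRILLIC_TIED.items():
    for ch in (upper, lower):
        # e.g. "/ts/ U+0426 CYRILLIC CAPITAL LETTER TSE"
        print(f"/{roman}/ U+{ord(ch):04X} {unicodedata.name(ch)}")
```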
Now, the proper Unicode sequence for what the Library of Congress wants (based on their own documentation of the correspondences between the MARC ANSEL character set and Unicode) requires the combining half marks COMBINING LIGATURE LEFT HALF (U+FE20) and COMBINING LIGATURE RIGHT HALF (U+FE21):

/ts/ Unicode 0074 FE20 0073 FE21 <t> <left half ligature> <s> <right half ligature>
/yu/ Unicode 0069 FE20 0075 FE21 <i> <left half ligature> <u> <right half ligature>
/ya/ Unicode 0069 FE20 0061 FE21 <i> <left half ligature> <a> <right half ligature>

All very well, but the application can't render these sequences because the available fonts lack glyphs for the combining half marks.
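A minimal Python sketch of that mapping, assuming the MARC data has already been decoded to Unicode text. The helper names `tie`, `romanize`, and `code_points` are hypothetical, and the capital forms (only the first Latin letter capitalized, e.g. "Ts") are an assumption rather than something taken from the LC table. Note that this only produces the correct code point sequence; whether the tie actually displays still depends on the font having glyphs for U+FE20 and U+FE21.

```python
LEFT_HALF = "\uFE20"   # COMBINING LIGATURE LEFT HALF
RIGHT_HALF = "\uFE21"  # COMBINING LIGATURE RIGHT HALF


def tie(first: str, second: str) -> str:
    """Place each half mark after the letter it sits over, as in the LC/MARC mapping."""
    return first + LEFT_HALF + second + RIGHT_HALF


# Lower-case forms follow the sequences above; the capital forms
# are an ASSUMPTION (title case), not taken from the LC table.
ROMANIZATION = {
    "\u0446": tie("t", "s"),  # ц
    "\u044E": tie("i", "u"),  # ю
    "\u044F": tie("i", "a"),  # я
    "\u0426": tie("T", "s"),  # Ц (assumed form)
    "\u042E": tie("I", "u"),  # Ю (assumed form)
    "\u042F": tie("I", "a"),  # Я (assumed form)
}


def romanize(text: str) -> str:
    """Replace only the tied letters; every other character passes through unchanged."""
    return "".join(ROMANIZATION.get(ch, ch) for ch in text)


def code_points(s: str) -> str:
    """Format a string as space-separated hex code points, e.g. '0074 FE20'."""
    return " ".join(f"{ord(c):04X}" for c in s)


print(code_points(romanize("\u0446")))  # 0074 FE20 0073 FE21
```

Building the sequence this way at least makes it easy to confirm that the stored data is right, so any remaining display problem can be pinned on the font rather than the conversion.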