Anyone at all familiar with bibliographical data (the MARC standards) 
knows that they can be a real pain to deal with. In this case, the 
difficulty isn't with the MARC data itself, but with the Library of 
Congress's Romanization standards and the lack of support for combining 
half marks in available fonts. I'm trying to help a client properly 
display Romanized Cyrillic from MARC data in a Unicode-enabled 
application. The ultimate problem is that I can't find an available font 
that properly supports the combining half marks U+FE20 and U+FE21.

Alan Wood lists these two fonts for that range on his page of fonts by 
ranges (a truly impressive collection of info, BTW, Mr. Wood):

Arial Unicode MS
   Apparently you can only get this with MS Office or Publisher these 
days--not a good solution for my client since their budget's very 
limited and they'd need it on a bunch of workstations. The most 
important issue from a technical point of view is that the marks may not 
properly combine and I don't have a copy of the font to test it myself. 
Does anyone know if these marks will properly combine with T, t, S, s, 
I, i, A, a, & U, u when using the MS font?

Naqsh
   A cursive font (not practical) and the marks don't appear to combine 
properly in any case.

Any suggestions welcomed! Is there a tool out there that will allow you 
to edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want 
the gory details.)

John Craig
Alpha-G Consulting, LLC

Gory details:
The bibliographical data in question follows the Library of Congress 
Romanization rules (see this link):

http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of 
these Cyrillic characters is proving elusive:

/ts/
Unicode 0426 (capital) & 0446 (lower case)
/yu/
Unicode 042E & 044E
/ya/
Unicode 042F & 044F

The specified Romanization for each of these Cyrillic characters 
includes a ligature over the top of the two Latin code points in 
question (presumably to indicate that the two Latin characters represent 
a single Cyrillic character). Now, the proper Unicode sequence for what 
the Library of Congress wants (based on their own documentation of the 
correspondences between the MARC ANSEL character set and Unicode) 
requires the use of the combining half marks left-half ligature U+FE20 
and right-half ligature U+FE21:

/ts/
Unicode 0074 FE20 0073 FE21
<t> <left half ligature> <s> <right half ligature>
/yu/
Unicode 0069 FE20 0075 FE21
<i> <left half ligature> <u> <right half ligature>
/ya/
Unicode 0069 FE20 0061 FE21
<i> <left half ligature> <a> <right half ligature>

All very well, but the application can't paint it because of the lack of 
the combining half marks in the available fonts.
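
For anyone who wants to generate test strings to try against a 
candidate font, here is a minimal Python sketch of the sequences listed 
above. The helper names (ligate, code_points) and the ROMANIZED table 
are my own illustration, not part of any MARC or LC tool, and only the 
lower-case forms are covered:

```python
import unicodedata

# The two combining half marks discussed above.
LEFT_HALF = "\uFE20"   # COMBINING LIGATURE LEFT HALF
RIGHT_HALF = "\uFE21"  # COMBINING LIGATURE RIGHT HALF


def ligate(first, second):
    """Hypothetical helper: each base letter is followed by its half mark."""
    return first + LEFT_HALF + second + RIGHT_HALF


def code_points(text):
    """Hypothetical helper: format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Illustrative table: lower-case Cyrillic letter -> Romanized sequence.
ROMANIZED = {
    "\u0446": ligate("t", "s"),  # /ts/
    "\u044E": ligate("i", "u"),  # /yu/
    "\u044F": ligate("i", "a"),  # /ya/
}

for cyrillic, latin in ROMANIZED.items():
    # Print the code points, plus the raw string to paste into the application.
    print(f"U+{ord(cyrillic):04X} -> {code_points(latin)}  [{latin}]")

# Sanity check that the marks are what the Unicode database says they are.
for mark in (LEFT_HALF, RIGHT_HALF):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.category(mark))
```

Pasting the bracketed strings into the application with a given font 
selected should show quickly whether the two halves join up over the 
letter pair or fall back to missing-glyph boxes.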



