MARC-XML uses Unicode Normalization Form D (NFD), which means that the
base character is stored separately from the diacritic.
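To make the decomposition concrete, here is a minimal sketch in Python (not the Perl tooling discussed in this thread), using only the standard `unicodedata` module. The sample letter "é" is just an illustration and is not taken from any MARC record.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
composed = "\u00e9"

# NFD splits it into the base letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(ch):04X}" for ch in decomposed])  # ['U+0065', 'U+0301']

# One code point before, two after; the UTF-8 byte length grows as well.
print(len(composed), len(decomposed))  # 1 2
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3

# NFC puts the pair back together where a precomposed form exists.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
```

A display that does not render combining marks may show the two code points of the decomposed form separately, so composing to NFC before output is one thing worth trying.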
I am not familiar with the MARC-XML specifications, so at the risk of
embarrassing myself, would it be correct to posit that it may not be that
MARC-XML uses Unicode Normal form D, so much as
From: Christopher Morgan [mailto:[EMAIL PROTECTED]
Sent: 01 July, 2004 10:50
Subject: Displaying diacritics in a terminal vs. a browser
I use the $cs->to_utf8 conversion from MARC::Charset to
display MARC Authority records in a browser, and the
diacritics display properly there.
But they don't display
From: Paul Hoffman [mailto:[EMAIL PROTECTED]
Sent: 01 July, 2004 11:57
Subject: Re: Displaying diacritics in a terminal vs. a browser
Unless I'm very much mistaken, Chris's code is outputting
UTF-8 to the terminal, not MARC-8.
From: Christopher Morgan [mailto:[EMAIL PROTECTED]
Sent
On Thu, Jul 01, 2004 at 11:22:42AM -0400, Houghton,Andrew wrote:
I'm not sure what MARC::Charset does internally, but MARC-8
defines the diacritic separately from the base character. So
even using binmode(STDOUT, ':utf8') will produce two characters,
one for the base character followed by the
A MARC-8 sequence places a combining diacritical mark BEFORE the letter
it's supposed to combine with, whereas Unicode puts it AFTER the letter
it's supposed to combine with.
Hence, for example, the letter: Ẕ
is produced by the MARC-8 sequence:
75 5A (macron below + Z)
but
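The ordering difference described above can be sketched as follows, again in Python and using only `unicodedata`. This assumes the MARC-8 bytes have already been mapped one-for-one to Unicode code points, so the only remaining problem is that each combining mark sits before its base letter. The function name `marks_after_base` is hypothetical and is not part of MARC::Charset or any other library.

```python
import unicodedata

def marks_after_base(text: str) -> str:
    """Move each run of combining marks to just after the next base character."""
    out = []
    pending = []  # combining marks seen before their base letter
    for ch in text:
        if unicodedata.combining(ch):
            # Non-zero combining class: hold the mark until a base arrives.
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)
            pending.clear()
    # Marks with no following base are kept at the end rather than dropped.
    out.extend(pending)
    return "".join(out)

# COMBINING MACRON BELOW (U+0331) followed by Z: mark-first, MARC-8 style.
marc_order = "\u0331Z"
unicode_order = marks_after_base(marc_order)
print([f"U+{ord(c):04X}" for c in unicode_order])  # ['U+005A', 'U+0331']

# Once reordered, NFC can compose the pair into the precomposed letter U+1E94.
print(unicodedata.normalize("NFC", unicode_order) == "\u1e94")  # True
```

A real converter also has to handle the character set escapes and the byte-to-code-point tables, which this sketch deliberately leaves out.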