RE: Displaying diacritics in a terminal vs. a browser

2004-07-06 Thread Michael D Doran
MARC-XML uses Unicode Normal form D, which means that the base character is separate from the diacritic. I am not familiar with the MARC-XML specifications, so, at the risk of embarrassing myself, would it be correct to posit that it may not be that MARC-XML uses Unicode Normal form D, so much as
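
For readers unfamiliar with the term, the snippet below is a minimal sketch of what "Normal form D" (NFD) means in practice. It uses Python's standard unicodedata module rather than the Perl tools discussed in this thread, and the sample letter é is chosen only for illustration; it does not come from any of the messages.

```python
import unicodedata

# A precomposed letter: one code point, U+00E9 (e with acute).
composed = "\u00e9"

# NFD splits it into the base letter followed by a combining mark.
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(ch):04X}" for ch in decomposed])

# NFC goes the other way and recombines the pair where a
# precomposed code point exists.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

A display that does not support combining characters will tend to show the decomposed form as two separate glyphs, which is consistent with the symptom described in this thread.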

RE: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Houghton,Andrew
From: Christopher Morgan [mailto:[EMAIL PROTECTED] Sent: 01 July, 2004 10:50 Subject: Displaying diacritics in a terminal vs. a browser I use the $cs->to_utf8 conversion from MARC::Charset to display MARC Authority records in a browser, and the diacritics display properly

Re: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Paul Hoffman
: Christopher Morgan [mailto:[EMAIL PROTECTED] Sent: 01 July, 2004 10:50 Subject: Displaying diacritics in a terminal vs. a browser I use the $cs->to_utf8 conversion from MARC::Charset to display MARC Authority records in a browser, and the diacritics display properly there. But they don't display

RE: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Houghton,Andrew
From: Paul Hoffman [mailto:[EMAIL PROTECTED] Sent: 01 July, 2004 11:57 Subject: Re: Displaying diacritics in a terminal vs. a browser Unless I'm very much mistaken, Chris's code is outputting UTF-8 to the terminal, not MARC-8. From: Christopher Morgan [mailto:[EMAIL PROTECTED] Sent

Re: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Ed Summers
On Thu, Jul 01, 2004 at 11:22:42AM -0400, Houghton,Andrew wrote: I'm not sure what MARC::Charset does internally, but MARC-8 defines the diacritic separate from the base character. So even using binmode(STDOUT, ':utf8') will produce two characters, one for the base character followed by the

Re: Displaying diacritics in a terminal vs. a browser

2004-07-01 Thread Ed Summers
A MARC-8 sequence places a combining diacritical mark BEFORE the letter it's supposed to combine with, whereas Unicode puts it AFTER the letter it's supposed to combine with. Hence, for example, the letter: Ẕ is produced by the MARC-8 Sequence: 75 5A (macron below + Z) but
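
The ordering difference described above can be sketched as follows. This is an illustration only, not how MARC::Charset works internally: the function name move_marks_after_base is hypothetical, and the input is assumed to be code points that have already been mapped from MARC-8 bytes to Unicode but are still in MARC-8 order (mark first, then letter). The use of U+0331 COMBINING MACRON BELOW for the "macron below" in the example is likewise an assumption.

```python
import unicodedata


def move_marks_after_base(chars: str) -> str:
    """Reorder 'mark(s) then base' into Unicode's 'base then mark(s)'.

    Hypothetical helper for illustration; assumes each run of combining
    marks belongs to the next non-combining character.
    """
    out = []
    pending = []  # combining marks waiting for their base letter
    for ch in chars:
        if unicodedata.combining(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(pending)  # marks now follow the base
            pending.clear()
    out.extend(pending)  # marks with no following base are kept at the end
    return "".join(out)


# MARC-8 order: macron below first, then Z (assumed mapping to U+0331).
marc_order = "\u0331Z"
unicode_order = move_marks_after_base(marc_order)
print([f"U+{ord(ch):04X}" for ch in unicode_order])
```

Once the mark follows its base letter, the result is a valid decomposed sequence that can be normalized to a precomposed form if one exists.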