At Fri, 23 Mar 2001 00:13:33 -0800, Rick McGowan <[EMAIL PROTECTED]> wrote:
>David Starner wrote:
>
>> I have a copy of Shellbear's Practical Malay Grammar that I'm preparing
>> to transcribe for Project Gutenberg. Unfortunately, he represents the
>> Malaysian alphabet in a Latin transliteration that includes ng as a
>> single ligatured form, and I don't know how to transcribe in Unicode.
>
>Could you perhaps post or point to a picture of what it looks like? I
>suppose it's an "N" with a loopy tail of some type.

More like rg. A picture is attached. (Was attached. Rick probably has a
copy, but it seems to have got lost between here and the Unicode mailing
list.)

>The character you are looking for is probably U+014B in lowercase or
>U+014A in uppercase.  I would be rather surprised if that's not what
>you're looking for.

It's not exactly what I was looking for. I may just use it and make 
a note that the glyph is probably not exactly right.
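
For anyone who wants to check what those two code points are, here is a
minimal Python sketch. It only looks up the standard character names; the
constant names are my own, and it says nothing about whether a given font's
glyph resembles the form printed in the book.

```python
import unicodedata

# The two code points suggested above; the constant names are illustrative.
ENG_SMALL = "\u014b"    # ŋ
ENG_CAPITAL = "\u014a"  # Ŋ

# Print each code point with its character and its Unicode character name.
for ch in (ENG_SMALL, ENG_CAPITAL):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

# The two characters are a case pair, so normal case mapping applies.
assert ENG_SMALL.upper() == ENG_CAPITAL
assert ENG_CAPITAL.lower() == ENG_SMALL
```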

>BTW, a bit off topic here but: I think it's high time that Project
>Gutenberg adopted some very clear character encoding guidelines now that
>they're expanding so widely.  Or have they already adopted them and I've
>just missed the policy statement...?  They're in for a real mess if they
>don't specify character encodings in a very controlled way.

At some points, they are already a real mess. You can dig
through Gutenberg archives and find various (unlabeled)
encodings for the Latin-1 coverage. There's at least one
Japanese document that just says "you need a Japanese
OS to read this." 8-bit documents are usually labeled as
8-bit, without any indication of encoding. The Bulgarian files
are clearly labeled Windows-1251, at least.
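
To illustrate why an "8-bit" label on its own is not enough, here is a small
Python sketch that decodes the same byte under three 8-bit encodings. The
byte and the choice of encodings are my own example, not taken from any
particular Gutenberg file.

```python
# One byte with the high bit set; which character it means depends
# entirely on the encoding the reader assumes.
RAW = b"\xe4"

def decode_as(data, encodings=("latin-1", "cp1251", "cp437")):
    """Decode the same bytes under several 8-bit encodings."""
    return {enc: data.decode(enc) for enc in encodings}

for enc, text in decode_as(RAW).items():
    print(f"{enc:8} -> {text}")
```

Each result is a legitimate character in its own encoding, so nothing in the
file itself tells a reader which one was intended.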

OTOH, the policy of doing everything possible in ASCII has
saved Gutenberg some problems. They're moving towards
Unicode for any files that can't be released in a standard 
8-bit encoding (and a few that can are double released), 
and a number of new books are being released in both 
ASCII and Unicode editions.
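
As a rough sketch of what a double release could look like for a text using
the eng character, assuming a hand-made fallback table (the table and the
function names here are hypothetical, not anything Gutenberg actually uses):

```python
# Hypothetical ASCII fallbacks for characters outside ASCII.
ASCII_FALLBACKS = {
    "\u014b": "ng",  # LATIN SMALL LETTER ENG
    "\u014a": "Ng",  # LATIN CAPITAL LETTER ENG
}

def ascii_edition(text):
    """Return an ASCII-only version of text using the fallback table."""
    out = "".join(ASCII_FALLBACKS.get(ch, ch) for ch in text)
    # Raises UnicodeEncodeError if any character has no fallback yet.
    out.encode("ascii")
    return out

def unicode_edition(text):
    """Return the text encoded as UTF-8 bytes."""
    return text.encode("utf-8")

sample = "bu\u014ba"
print(ascii_edition(sample))
print(unicode_edition(sample))
```

The ASCII edition loses the distinction between the ligatured form and a
plain "ng", which is the sort of thing a transcriber's note would have to
explain.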

See
ftp://metalab.unc.edu/pub/docs/books/gutenberg/GUTINDEX.02
and GUTINDEX.01 for recent examples. Most of the unmarked
stuff is ASCII, but there's a number of clearly Unicode marked
and "8-bit German" marked files.

-- 
David Starner - [EMAIL PROTECTED]
