On Tue, 28 Aug 2012 19:03:14 -0400 CE Whitehead <cewcat...@hotmail.com> wrote:
> For Romanization (conversion to Latin characters) of Arabic, see:
> http://en.wikipedia.org/wiki/Romanization_of_Arabic

Likewise, a reasonable list for Hebrew can be picked up from http://en.wikipedia.org/wiki/Romanization_of_Hebrew . It should probably include <p, U+0331 COMBINING MACRON BELOW> and <g, U+0331> as variants of <p, U+0304 COMBINING MACRON> and <U+1E21 LATIN SMALL LETTER G WITH MACRON>; I'm not sure whether the former may be rendered the same as the latter.

The Hebrew list omits a few characters I'm used to from old Hebrew grammars:

Hatephs (and shewa):
  U+1D43 MODIFIER LETTER SMALL A
  U+1D49 MODIFIER LETTER SMALL E
  U+1D52 MODIFIER LETTER SMALL O

'Pure' long vowels:
  U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
  U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
  U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
  U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
  U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX

From other places I pick up the following groups (duplicates are mostly pruned).

General Semitic:
  <U+1E6F LATIN SMALL LETTER T WITH LINE BELOW, U+0323 COMBINING DOT BELOW>
  U+026C LATIN SMALL LETTER L WITH BELT
  U+00E4 LATIN SMALL LETTER A WITH DIAERESIS (Ethiopian)

Affricates may be written with a superscript second element, so one should include:
  U+02E2 MODIFIER LETTER SMALL S
  U+1DB4 MODIFIER LETTER SMALL ESH (TBC)
  U+1DBE MODIFIER LETTER SMALL EZH
  MODIFIER LETTER SMALL L WITH BELT (Missing! But I'm sure I've seen it.)

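On the macron-below versus macron question: rendering is up to fonts, but normalization behaviour can be checked directly. The sketch below, assuming Python's standard `unicodedata` module (the helper name `nfc_codepoints` is hypothetical), shows that <g, U+0304> composes to U+1E21 under NFC, while <g, U+0331> has no precomposed form and is not canonically equivalent to it, so text processes will treat the two spellings as distinct.

```python
import unicodedata


def nfc_codepoints(s: str) -> list[str]:
    """Return the code points of s after NFC normalization, formatted as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]


# <g, U+0304 COMBINING MACRON> composes to the precomposed U+1E21 under NFC.
print(nfc_codepoints("g\u0304"))  # ['U+1E21']
# <g, U+0331 COMBINING MACRON BELOW> has no precomposed form, so it stays as two code points.
print(nfc_codepoints("g\u0331"))  # ['U+0067', 'U+0331']
# The macron and macron-below spellings are not canonically equivalent.
print(unicodedata.normalize("NFC", "g\u0331") == unicodedata.normalize("NFC", "g\u0304"))  # False
```
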
Arabic transliteration, aleph and ain:
  U+02BC MODIFIER LETTER APOSTROPHE
  U+02BD MODIFIER LETTER REVERSED COMMA
  U+02BE MODIFIER LETTER RIGHT HALF RING
  U+02BF MODIFIER LETTER LEFT HALF RING
  U+02C0 MODIFIER LETTER GLOTTAL STOP
  U+02C1 MODIFIER LETTER REVERSED GLOTTAL STOP

More emphatics:
  U+1E05 LATIN SMALL LETTER B WITH DOT BELOW
  U+1E37 LATIN SMALL LETTER L WITH DOT BELOW
  <p, U+0323 COMBINING DOT BELOW>

For Cairene Arabic we could add the following, though I've only seen the IPA forms:
  U+1E43 LATIN SMALL LETTER M WITH DOT BELOW (TBC)
  U+1E5A LATIN CAPITAL LETTER R WITH DOT BELOW (TBC)

Emphatics transcribed as velarised:
  U+1D6D LATIN SMALL LETTER D WITH MIDDLE TILDE
  U+1D74 LATIN SMALL LETTER S WITH MIDDLE TILDE
  U+1D75 LATIN SMALL LETTER T WITH MIDDLE TILDE
  U+1D76 LATIN SMALL LETTER Z WITH MIDDLE TILDE

I therefore suspect the following are also used, though I do not recall seeing them:
  U+026B LATIN SMALL LETTER L WITH MIDDLE TILDE
  U+1D6C LATIN SMALL LETTER B WITH MIDDLE TILDE

In some Arabic dialects, non-emphatic v. emphatic is a feature of the word rather than of the segment, so it may be as well to have all the characters named '... WITH MIDDLE TILDE'. Phonetic descriptions will need several other characters; see for example http://en.wikipedia.org/wiki/Arabic_phonology .

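To collect "all the characters named '... WITH MIDDLE TILDE'", one can scan the character names in the Unicode database. The sketch below assumes Python's `unicodedata` module, whose data reflects the Unicode version of the running interpreter (newer than the one current when this was written), so it may list more characters. The helper `chars_named_with` is hypothetical. The second loop checks that the aleph/ain marks listed above are modifier letters (General Category Lm), which matters for word segmentation, unlike U+2019 RIGHT SINGLE QUOTATION MARK (Pf).

```python
import sys
import unicodedata


def chars_named_with(fragment: str) -> dict[str, str]:
    """Map 'U+XXXX' to the character name for every code point whose name contains fragment."""
    found = {}
    for cp in range(sys.maxunicode + 1):
        # Default "" covers unassigned code points, controls and surrogates, which have no name.
        name = unicodedata.name(chr(cp), "")
        if fragment in name:
            found[f"U+{cp:04X}"] = name
    return found


# Every character this Python build's Unicode database names '... WITH MIDDLE TILDE'.
tilde = chars_named_with("WITH MIDDLE TILDE")
for cp, name in tilde.items():
    print(cp, name)

# The aleph/ain marks are modifier letters (category Lm), unlike U+2019 (category Pf).
for cp in (0x02BC, 0x02BD, 0x02BE, 0x02BF, 0x02C0, 0x02C1, 0x2019):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), unicodedata.name(chr(cp)))
```

The full scan covers about 1.1 million code points and takes a moment; that is acceptable for a one-off listing.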
Emphatics can be treated as glottalised, so for transcriptions we may also have:
  U+02B9 MODIFIER LETTER PRIME
  U+02C0 MODIFIER LETTER GLOTTAL STOP

For some of the fancier Hebrew transliterations we need:
  U+0254 LATIN SMALL LETTER OPEN O
  U+02B0 MODIFIER LETTER SMALL H
  U+02B7 MODIFIER LETTER SMALL W (also needed for Ethiopian Semitic)
  U+02B8 MODIFIER LETTER SMALL Y
  U+02B2 MODIFIER LETTER SMALL J (TBC)

For Akkadian we need:
  U+00E0 LATIN SMALL LETTER A WITH GRAVE
  U+00E1 LATIN SMALL LETTER A WITH ACUTE
  U+00E8 LATIN SMALL LETTER E WITH GRAVE
  U+00E9 LATIN SMALL LETTER E WITH ACUTE
  U+00EC LATIN SMALL LETTER I WITH GRAVE
  U+00ED LATIN SMALL LETTER I WITH ACUTE
  U+00F9 LATIN SMALL LETTER U WITH GRAVE
  U+00FA LATIN SMALL LETTER U WITH ACUTE
  U+2080 SUBSCRIPT ZERO
  U+2081 SUBSCRIPT ONE
  U+2082 SUBSCRIPT TWO
  U+2083 SUBSCRIPT THREE
  U+2084 SUBSCRIPT FOUR
  U+2085 SUBSCRIPT FIVE
  U+2086 SUBSCRIPT SIX
  U+2087 SUBSCRIPT SEVEN
  U+2088 SUBSCRIPT EIGHT
  U+2089 SUBSCRIPT NINE
  U+00D7 MULTIPLICATION SIGN (possibly just for Sumerian)

Cuneiform determinatives should arguably be transliterated using mark-up. If you'd rather have them as plain text, I can pick out the following list from the examples in 'The World's Writing Systems':
  U+1D4F MODIFIER LETTER SMALL K
  U+1D35 MODIFIER LETTER CAPITAL I
  U+1D48 MODIFIER LETTER SMALL D
  U+1DA0 MODIFIER LETTER SMALL F
  U+1D4D MODIFIER LETTER SMALL G
  U+1D50 MODIFIER LETTER SMALL M
  U+02B3 MODIFIER LETTER SMALL R
  U+1D58 MODIFIER LETTER SMALL U
  U+2071 SUPERSCRIPT LATIN SMALL LETTER I
  U+1D3F MODIFIER LETTER CAPITAL R
  U+1D41 MODIFIER LETTER CAPITAL U

and one missing character, 'MODIFIER LETTER SMALL S WITH CARON'. I suppose one could substitute U+1DB4 MODIFIER LETTER SMALL ESH or use <U+02E2 MODIFIER LETTER SMALL S, U+02B0 MODIFIER LETTER SMALL H>. It would be understood, but it wouldn't look right.

I'm taking ASCII punctuation for granted. In many cases capital forms should also be added, depending on whether the target writing system of the transliteration or transcription is unicameral.

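Two small checks suit lists like the Akkadian one. The sketch below assumes Python's `str.translate` and `unicodedata`; both helper names are hypothetical. `subscript_indices` naively rewrites every ASCII digit as U+2080..U+2089, so it would also subscript numerals that are not sign indices. `name_mismatches` compares each listed code point with its Unicode name, which catches a slip such as pairing U+00EC with the name '... I WITH ACUTE'.

```python
import unicodedata

# Translation table: ASCII digits 0-9 -> U+2080 SUBSCRIPT ZERO .. U+2089 SUBSCRIPT NINE.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "".join(chr(0x2080 + d) for d in range(10)))


def subscript_indices(translit: str) -> str:
    """Rewrite every ASCII digit as a subscript digit, e.g. 'du3' -> 'du₃'."""
    return translit.translate(SUBSCRIPT_DIGITS)


def name_mismatches(entries: list[tuple[int, str]]) -> list[str]:
    """Report (code point, listed name) pairs whose listed name differs from the Unicode name."""
    problems = []
    for cp, listed in entries:
        actual = unicodedata.name(chr(cp), "<no name>")
        if actual != listed:
            problems.append(f"U+{cp:04X}: listed {listed!r}, Unicode says {actual!r}")
    return problems


print(subscript_indices("du3 e2 gal"))  # du₃ e₂ gal

# A hypothetical slip: the code point for I WITH GRAVE listed under the name I WITH ACUTE.
print(name_mismatches([
    (0x00EC, "LATIN SMALL LETTER I WITH GRAVE"),
    (0x00EC, "LATIN SMALL LETTER I WITH ACUTE"),
    (0x00ED, "LATIN SMALL LETTER I WITH ACUTE"),
]))
```

The entries are kept as a list of pairs rather than a dict so that a duplicated code point is still reported instead of being silently overwritten.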
IPA and Akkadian transliteration are unicameral, but Akkadian transcription need not be. Note that Akkadian transliteration adds information not directly in the text; it is back-transliteration that is lossy!

Richard.