My understanding of the Indian scripts as encoded in Unicode is that the
mapping from ISCII to Unicode is not a straightforward one-to-one mapping,
because ISCII uses a contextual encoding for characters (allowing shifts between
several scripts) and includes some rich-text features.

The ISCII character model is not exactly the same as the Unicode character
model, even though there was an attempt to make this mapping as simple as
possible by allocating the Unicode code points for each individual
ISCII-supported script in the same relative order, leaving gaps in the
Unicode-encoded scripts for ISCII characters that are not used in one specific
script.
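
To illustrate the "same relative order" point, here is a minimal Python sketch.
The block start values are taken from the Unicode charts; the helper name
names_at_offset is my own and purely illustrative. It looks up the character at
the same offset in each block, which also shows the gaps left where a script
has no such letter (Tamil has no KHA, for instance).

```python
import unicodedata

# Start of each ISCII-derived script block in Unicode (from the code charts).
BLOCK_START = {
    "DEVANAGARI": 0x0900,
    "BENGALI": 0x0980,
    "GURMUKHI": 0x0A00,
    "GUJARATI": 0x0A80,
    "ORIYA": 0x0B00,
    "TAMIL": 0x0B80,
    "TELUGU": 0x0C00,
    "KANNADA": 0x0C80,
    "MALAYALAM": 0x0D00,
}


def names_at_offset(offset):
    """Return {script: character name or None} for one relative offset."""
    result = {}
    for script, start in BLOCK_START.items():
        # unicodedata.name() returns the default (None) for unassigned gaps.
        result[script] = unicodedata.name(chr(start + offset), None)
    return result


ka = names_at_offset(0x15)   # LETTER KA exists in every block listed
kha = names_at_offset(0x16)  # LETTER KHA is a gap in the Tamil block

for script in BLOCK_START:
    print(f"{script:<11} {ka[script]!s:<25} {kha[script]}")
```

This only demonstrates the layout of the Unicode blocks; it says nothing about
ISCII's script-switching or attribute codes, which have no equivalent here.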

A good reference for how Indian scripts are encoded in Unicode is Chapter 9 of
the Unicode 4.0 standard:
http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf
In summary, the Unicode model for Devanagari:
- uses consonantal letters with an implied (default) vowel A, modified by the
next coded dependent vowel sign (matra), which combines graphically with the
consonant, or
- uses half-forms of consonants to drop the implied vowel in initial
consonants, or
- uses a virama (halant) U+094D to mark other omissions of the implied vowel on
dead consonant letters (most often on final consonants, but this occurs as
well on initial or medial consonants), by removing the final stem of the full
(live) consonant, a stem that normally also depicts a phonetic syllable
boundary with a necessary vowel. So the virama allows creating conjuncts with
other following dead consonants or live consonants, and normally attaches both
consonant letters into the same syllable or conjunct.
- in some cases, the omission of the implied vowel must not create a ligated
conjunct, so the virama still needs to represent the omission of the vowel
without creating a conjunct that would break the perceived phonetics, and a
ZWNJ is used between the dead consonant (consonant letter+virama) and the next
live consonant.
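
As a rough illustration of the sequences described in this list, the following
Python sketch builds three strings and prints their code points. The choice of
KA and SSA as sample letters and the helper name describe are mine; whether a
given font actually draws a ligated conjunct is up to the renderer and cannot
be checked from the code points alone.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA (carries the implied vowel A)
SSA = "\u0937"     # DEVANAGARI LETTER SSA
SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I (a matra)
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def describe(text):
    """List each code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


# Consonant + matra: the matra replaces the implied vowel A.
ki = KA + SIGN_I
# Dead consonant (KA + virama) + live consonant: may render as a conjunct.
conjunct = KA + VIRAMA + SSA
# Same letters, but ZWNJ after the virama asks for no ligated conjunct.
no_conjunct = KA + VIRAMA + ZWNJ + SSA

for label, text in [("ki", ki), ("conjunct", conjunct),
                    ("no_conjunct", no_conjunct)]:
    print(label, describe(text))
```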

There's a U+0905 pseudo-consonant /a/ which is used in the absence of a phonetic
consonant, but it follows the same encoding rule as other consonant letters
/*a/, i.e. coding another isolated vowel requires coding /a/ before the vowel
sign (matra). This encodes approximately the same thing as isolated vowels,
except that the intended rendering is different.

U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an independent
vowel. It can be "viewed" as a conjunct of the independent vowel U+0905
DEVANAGARI LETTER A and the dependent vowel sign U+0946 DEVANAGARI VOWEL SIGN
SHORT E (noted "for transcribing Dravidian vowels" in the Unicode charts). I
don't know why this is not documented, because I can find various sources that
use <U+0904> or <U+0905,U+0946>, which have exactly the same rendering and
probably the same meaning and usage. I think that U+0946 was added in ISCII 1991
but was absent from ISCII 1988 (to be verified; I don't have the ISCII 1988
reference document), so U+0904 has survived just to allow a mostly one-to-one
mapping with ISCII 1988. But the addition of U+0946

Maybe I'm wrong here, and there are some reasons for this choice. There's no
canonical or compatibility equivalence defined between <U+0904> and
<U+0905,U+0946> (I think it's too late to define one: ISCII 1988 has been used
consistently before, and the Unicode stability policy now forbids defining
new equivalences between them).
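
The absence of any equivalence can be checked against the Unicode Character
Database, for example with Python's unicodedata module. This is only a sketch:
the helper name are_equivalent is my own, and the result reflects whatever
Unicode version your Python build ships with.

```python
import unicodedata

SHORT_A = "\u0904"               # DEVANAGARI LETTER SHORT A
A_PLUS_SHORT_E = "\u0905\u0946"  # LETTER A + VOWEL SIGN SHORT E

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def are_equivalent(a, b):
    """True if a and b become identical under any normalization form."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


# An empty string means U+0904 has no decomposition mapping at all.
print(repr(unicodedata.decomposition(SHORT_A)))
# Neither canonical nor compatibility normalization unifies the two spellings.
print(are_equivalent(SHORT_A, A_PLUS_SHORT_E))
```

So a search or comparison that relies on normalization alone will treat the two
spellings as different strings, even if they render identically.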

----- Original Message ----- 
From: "Ernest Cline" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Monday, February 16, 2004 6:28 AM
Subject: Devanagari Letter Short A


> I've been trying to make sense of the Indian scripts, but am
> having one small difficulty.  I can't seem to find the ISCII 1991
> equivalent for U+0904 (DEVANAGARI LETTER SHORT A).
>
> Is this a character that is part of the set accessed by the
> extended code (xF0) or was this part of the ISCII 1988
> standard that did not survive the changes to ISCII 1991?
>
> Alternatively, does ISCII encode this as xA4 + xE0 as this
> would seem to generate the proper glyph even tho it
> violates the syllable grammar given in Section 8 of ISCII?
>
> Or even more alternatively, am I just missing something
> that should be obvious, but which  for some reason I can't see?
> Even with the slight differences in the naming conventions
> between ISCII and Unicode, I don't seem to be misplacing
> any of the other vowels or consonants.
>
> Ernest Cline
> [EMAIL PROTECTED]

