This is a rather late reply, but I think this document should be useful:

http://www.evertype.com/standards/af/af-locales.pdf

The first few pages discuss the various Yeh forms, recommend which ones
to use, and recommend avoiding some of them in certain forms.

Roozbeh

On Thu, 2010-07-22 at 12:17 -0500, lingu...@artstein.org wrote:
> Hi,
> 
> This is a query I had originally sent to the Linguist List, modified  
> based on feedback I got there. I am hoping that someone in the Unicode  
> community can help resolve this.
> 
> I'm interested in knowing if there is a standard way to encode the  
> various Pashto yeh-characters in Unicode, and if so, what it is. This  
> question is a bit more complicated than it sounds, so here's the  
> background.
> 
> Pashto is written using a derivative of the Arabic script. The Arabic  
> language uses a single character for both /j/ and /i:/ sounds. Like  
> many Arabic characters, this one is composed of a base form (which  
> changes shape based on its position in a word) and dots (in this case,  
> two dots below the base form). In most of the Arabic-speaking world  
> the dots are present with both the medial and final form, though in  
> Egypt (and possibly other places) the convention is to have two dots  
> on the medial form but leave them off the final form. The standard  
> arrangement of the two dots is horizontal, but they can be placed  
> vertically or diagonally with no change in meaning.
> 
> Persian also uses a single character for /j/ and /i:/, with the  
> convention of two dots on the medial form, no dots on the final form  
> (same as in Egypt).
> 
> The two conventions for the /j/-/i:/ character were given distinct  
> code points in Unicode despite the fact that they do not contrast;  
> documentation is scarce, but presumably this was done in order to  
> allow writing both Arabic and Persian in the same document. Therefore,  
> Unicode has the following code points (I'm not giving the names, but  
> rather the typical visual representation of the glyphs and typical use).
> 
> U+064A two dots medially and finally (/j/-/i:/ Arabic convention)
> U+06CC two dots medially, none finally (/j/-/i:/ Persian convention)
> 
> There are a few additional yeh-base code points defined, some of which  
> are relevant to Pashto (see below).
> 
> U+0649 no dots medially or finally (Arabic /a/ from etymological /j/)
> U+0626 hamza above medially and finally (Arabic glottal stop in  
> certain contexts)
> U+06D0 two dots medially and finally in vertical arrangement
> U+06CD tail and no dots in final position
> 
> As it so happens, there is much confusion in how these characters are  
> used in actual electronic documents, which is not surprising given  
> that U+06CC looks like U+064A in medial position but like U+0649 in  
> final position. There is an excellent article by Jonathan Kew that  
> sorts out what this means for various languages that use derivatives  
> of the Arabic script.
> 
> http://scripts.sil.org/cms/scripts/render_download.php?site_id=nrsi&format=file&media_id=arabicletterusagenotes&filename=ArabicLetterUsageNotes.pdf
> 
> Unfortunately, this article does not discuss Pashto. I have little  
> knowledge of the language, but here's what I managed to understand  
> from the inspection of a few documents and with the help of friendly  
> people on the Linguist List (and please correct me if I'm wrong).
> 
> Traditionally, Pashto used a single character with the same convention  
> as in Persian, of two dots in the medial form and none on the final  
> form, and with no significance attached to the visual arrangement of  
> the dots. The character was three-way ambiguous between the sounds /j/,  
> /i:/ and /e/. In recent decades (probably since the 1970s or 1980s)  
> there has been some differentiation, partly due to changes in the  
> typesetting process and partly due to a deliberate effort of the  
> Pashto Academy at the University of Peshawar, Pakistan.
> 
> One convention that has gained fairly wide acceptance is a distinction  
> between a horizontal arrangement of the dots, representing /j/ or /i:/  
> as in Arabic and Persian, and a vertical arrangement representing the  
> sound /e/. This distinction is the same as in Uighur, and the  
> character with vertical dots has been codified as U+06D0. Additional  
> conventions include a hamza (U+0626) or tail (U+06CD) to represent /j/  
> at the end of a word in certain grammatical markers. All of these are  
> quite standard by now and do not pose much of a problem.
> 
> However, a further convention appears to have arisen, which as far as  
> I can tell is unique to Pashto in that it distinguishes between /j/  
> and /i:/ (though only in word-final position):
> 
> /j/ is written with two dots medially, none finally
> /i:/ is written with two dots both medially and finally
> 
> I have never seen this codified explicitly, but this is the impression  
> I get from examining a few recent Pashto documents. Which brings me to  
> my original question, of how to represent these characters in Unicode.  
> The linguist in me notices a correspondence between sounds and Unicode  
> code points (which, given the history I have just described, is most  
> certainly accidental):
> 
> /j/ corresponds to U+06CC
> /i:/ corresponds to U+064A
> 
> The Wikipedia article on the Pashto alphabet  
> http://en.wikipedia.org/wiki/Pashto_alphabet gives a different  
> correspondence, based on visual appearance:
> 
> forms with dots: U+064A (/i:/ and /j/ medially, /i:/ finally)
> forms without dots: U+0649 (only /j/ in word-final position)
> 
> And there is yet a third convention, which I encountered in an  
> electronic lexicon and also appears in the following document:
> http://www.afghanan.net/pashto/pashto%20alifba.pdf
> 
> U+06CC: medial forms with dots (/i:/ and /j/) and dotless final form (/j/)
> U+064A: final form with dots (/i:/)
> 
> To wrap up, are my observations about the Pashto writing conventions  
> correct? And is there a standard for assigning the Pashto characters  
> representing /j/ and /i:/ to Unicode code points?
> 
> -Ron.
> 
> 
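
For anyone who wants to check which of the three conventions above a
given electronic text follows, here is a minimal sketch in Python. It
uses only the standard unicodedata module. The function names
(describe, final_yeh_counts) and the sample string are made up for
illustration, and the word splitting is deliberately naive: it assumes
words are separated by whitespace and does not strip punctuation or
diacritics. Treat it as a starting point, not a tested tool.

```python
import unicodedata
from collections import Counter

# The six yeh-based code points listed in the quoted message.
YEH_LIKE = ["\u064A", "\u06CC", "\u0649", "\u0626", "\u06D0", "\u06CD"]


def describe(chars=YEH_LIKE):
    """Return (U+XXXX, official Unicode character name) pairs."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


def final_yeh_counts(text):
    """Tally which yeh-like code point ends each whitespace-separated word.

    Assumption: a word's last character is its final letter. Words that
    end in punctuation or a combining mark are simply not counted.
    """
    counts = Counter()
    for word in text.split():
        last = word[-1]
        if last in YEH_LIKE:
            counts[f"U+{ord(last):04X}"] += 1
    return counts


if __name__ == "__main__":
    for code, name in describe():
        print(code, name)

    # Artificial test strings (BEH followed by a yeh variant), not
    # meant to be meaningful Pashto words.
    sample = "\u0628\u06CC \u0628\u064A \u0628\u06CC"
    print(dict(final_yeh_counts(sample)))
```

A text that follows the Wikipedia convention should show word-final
U+0649 and U+064A but no U+06CC, while one that follows the third
convention should show word-final U+06CC and U+064A.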

