I have been trying to track down a definition of Canonical Combining Class 7: Nuktas (and of the other combining classes): can anyone point me in its direction? A clear definition of the Canonical Combining Classes would presumably form the basis of an evaluation of the viability of a spacing, headline-height nukta as a separate Unicode code point.

The document I posted previously, which I attach again for reference, lists printed documents in which various placements for (I hardly dare say) nuktas are used, including more than one use by more than one author, in both India and Bangladesh, of the double nukta on Ja. The document was prepared for Dr Anthony P. Stone, Project Leader, ISO/TC46/SC2/WG12 Transliteration of Indic scripts, by Abu Jar M Akkas.

Judging by this document, the dot used for Perso-Arabic transcription is found below the letter, to its right, or aligned with the headline. Only in the first case is it non-spacing. In one case, both below and to the lower right are found in the same dictionary, which suggests fairly strongly that there is no real difference between those two positions, one a spacing, the other a non-spacing, form of the dot.

While the details of the schemes vary slightly, they are united in the principle that the dot does the trick: in other words, the simplest representation is of a Bengali character with a dot. There are personal, practical and typographic preferences for where the dot should be, but these are not basic.

Solaiman, I was not suggesting that the placement of the nukta should be controlled in any way, nor that it is not useful when placed at headline/matra height, nor that it has not been used in books. I was merely suggesting that there doesn't seem to be much of a case for making a top nukta an additional letter in Unicode, when you can place the dot represented by the current code point anywhere you want in relation to graphemes in fancy text by constructing a font with ligatures in that form.
As it is, the Nukta is listed as having General Category Mn, which is Mark, Non-Spacing. It has the Canonical Combining Class 7: Nuktas. The Top Nukta you have identified definitely has the appearance of being General Category Mc, which is Mark, Spacing Combining. Nevertheless, the documentation also suggests that the combining classes are not to be taken literally as applied to fancy text, which is what your scan is: an example of real-world, fancy text.

Michael, when you say that a second nukta should be stacked on top of a first, do you mean, in principle, in a plain text representation only, i.e., one in which, symptomatically, no conjunct forms at all would be found? That would seem fair enough.

The only form of the double-nuktaed Ja that I have seen does have the nuktas side-by-side, and was prepared by Linotype. I presume this was not done without some research, probably taking it back to the Bose instance. However, this obviously reflects fancy text.

Typographically, the priority with nuktas is to place them so that they remain distinguishable at small sizes when other elements are combined within the same grapheme. Stacking (I presume this implies one above the other, both remaining visible) in this instance is a bit counter-productive, since it inevitably results either in an increase in line spacing, or in the danger that a further stacked element will crash into an element of the line below, becoming illegible. This would apply in both plain and fancy text.

Mike
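For anyone who wants to check the properties quoted above, here is a minimal Python sketch using the standard `unicodedata` module. It prints the General Category and Canonical Combining Class of the Bengali nukta, then shows the one thing the combining class actually governs: the canonical ordering of marks during normalization. The helper name `describe` and the choice of the virama as a second mark are mine, purely for illustration; the values reported depend on the Unicode data shipped with your Python.

```python
import unicodedata

JA = "\u099c"      # BENGALI LETTER JA
NUKTA = "\u09bc"   # BENGALI SIGN NUKTA
VIRAMA = "\u09cd"  # BENGALI SIGN VIRAMA (chosen here only as a second mark)

def describe(ch):
    """Return (name, General Category, Canonical Combining Class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

for ch in (JA, NUKTA, VIRAMA):
    print(describe(ch))

# Canonical ordering: marks with non-zero classes following one base are sorted
# by class, so a nukta (class 7) typed after a virama (class 9) moves in front of it.
typed = JA + VIRAMA + NUKTA
reordered = unicodedata.normalize("NFD", typed)
print([f"U+{ord(c):04X}" for c in reordered])

# Two nuktas share the same class, so normalization never swaps or merges them.
double = JA + NUKTA + NUKTA
print(unicodedata.normalize("NFD", double) == double)
```

In other words, the class is an ordering key for normalization; it says nothing by itself about where a font draws the dot.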
_____

From: Omi Azad [mailto:[EMAIL PROTECTED]]
Sent: 05 August 2003 19:27
To: Solaiman Karim
Cc: Paul Nelson (TYPOGRAPHY); Kenneth Whistler; [EMAIL PROTECTED]

What will be the result man?

Solaiman Karim wrote:

hello all

I don't know if I misunderstood or not, but someone said it is useless to add in Unicode. Someone is saying something when he doesn't even know what he is talking about. Are you guys saying that what I showed you is just made up? It is not only for Arabic; it is also used in English to translate some other language such as French, and so on. Please let me know if I misunderstood you guys, and it seems to me that Bangla should be limited.

Solaiman

----- Original Message -----
From: "Paul Nelson (TYPOGRAPHY)" <[EMAIL PROTECTED]>
To: "Kenneth Whistler" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, August 04, 2003 7:16 PM
Subject: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Sorry, I guess I totally misunderstood what Omi was stating then. It seems there are no less than 8 different ways to transliterate this stuff.

Paul

-----Original Message-----
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 04, 2003 4:14 PM
To: Paul Nelson (TYPOGRAPHY)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Paul said:

Mike,

Your proposal fails the plain text case and can only be obtained with higher level markup to set this notion of a "spacing Nukta". Why not simply encode a BENGALI SPACING NUKTA character? That removes any requirement for a higher level markup and allows for unambiguous use in plain text.

Paul Nelson

I don't understand this contention.
What Mike was reporting is that there are two schools of practice for placing these nuktas on Bengali letters required for transcribing Arabic letters:

a. place the nukta (generally) under the base letter, in a typographically pleasing location for that particular base letter's shape.

b. place the nukta (roughly) on the head line, often implemented simply with a full stop as an ad hoc solution, even if placed oddly.

That is what Abu Jar M Akkas reported as part of the detailed analysis of usage.

Particularly if (a) is the preferred approach and (b) is considered an unpleasing typographical hack, then surely we are simply talking about the already existing Bengali nukta, which, after all, was encoded precisely for such usage in Bengali.

Nobody is claiming that in *plain* text a distinction has to be maintained between a nukta which displays appropriately underneath a consonant and a nukta which displays up to the right on a head line. So I don't see any need to *encode* a new character distinction here. Plain text should be focussed on representing the text *content*, not on the fine details of its display. And in this case, it seems appropriate to me that style or font markup should be the means for indicating any such distinctions in placements of a nukta.

--Ken

--
Regards
Omi Azad
Altruists International
http://www.altruists.org
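To make the plain-text argument above concrete, here is a small Python sketch of what the stored text looks like. It assumes the dotted Ja is entered as JA followed by the existing nukta, and that a double-dotted Ja is entered as the same mark twice; that second choice is my assumption for illustration, not something this thread has settled. The helper `code_points` is likewise hypothetical.

```python
import unicodedata

JA = "\u099c"     # BENGALI LETTER JA
NUKTA = "\u09bc"  # BENGALI SIGN NUKTA

def code_points(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]

# One plain-text sequence for the dotted Ja, wherever a font chooses to draw the dot.
dotted_ja = JA + NUKTA

# Assumed encoding of the double-dotted Ja: the same mark entered twice.
double_dotted_ja = JA + NUKTA + NUKTA

# The ad hoc after-dot set with a full stop is a different string: the dot is
# punctuation (General Category Po), not a combining mark belonging to Ja.
full_stop_ja = JA + "."

for label, text in [("dotted", dotted_ja),
                    ("double-dotted", double_dotted_ja),
                    ("full-stop hack", full_stop_ja)]:
    print(label, code_points(text))
```

Under this sketch, practices (a) and (b) differ only in the font or style applied to `dotted_ja`, whereas the full-stop workaround changes the text content itself and would not match in a search for the dotted letter.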
Subject: transliteration listing from Abu Jar M Akkas [Bengali part only - APS]
Date: Sun, 24 Oct 1999 16:16:48 +0600
From: holiday <[EMAIL PROTECTED]>
To: Dr Anthony P Stone <[EMAIL PROTECTED]>

===========================================================
BENGALI (SCRIPT)
===========================================================

The use of Bengali characters to express the Perso-Arabic script, especially by lexicographers and grammarians, is very old. The religious books, in transcriptions of the Qur'anic text, also employ some devices, and in some cases this has influenced the lexicographers and grammarians to a large extent. But the official attempt at such extension began in 1936. On May 8, 1936, the University of Calcutta adopted a resolution called "The Rules of Bengali Spelling" where Clause 21 says that j{underdot}a may be used for the English sound <z>. And this has influenced Jnanendramohan Das, the compiler of the largest dictionary of the Bengali language, to use under-dot characters in representing Perso-Arabic and other foreign characters. This work was supervised by Suniti Kumar Chatterji. But Chatterji favoured an after-dot in all the cases (and an open single quote before the vowel for the 'ayin) if under-dot characters were not available in the printing press.

While some lexicographers prefer to use the original Devanagari and Assamese characters, especially for <va>, <ba> and l{underdot}a, some others prefer to use diacritised Bengali characters. But there is no standard scheme as such. In cases of representing Perso-Arabic characters with the Bengali script, Jnanendramohan Das and Suniti Kumar Chatterji's schemes are popular; most scholars prefer either of them. There are some other schemes in currency. But either an under-dot or a full-stop (afterdot) after the character is gaining ground.
I have also come across, in a book on Indian languages written by some Russian scholar(s), the use of n{underdash}a, r{underdash}a and l{underdash}a, equated with Tamil characters. But I have never seen these characters anywhere else.

Scheme 1: Characters with underdot
----------------------------------

This scheme uses an underdot with the Bengali characters to show the representation of the Perso-Arabic characters. The largest dictionary of the Bengali language, "A Dictionary of the Bengali Language" <baa;mgaalaa bhaa.saara abhidhaana>, by Jnanendramohan Das (Calcutta, 1st edn. 1916; 2nd edn. 1937 [I have the 1986 reprint of the second edition at my disposal]) uses this scheme.

Scheme 2: Characters with afterdot
----------------------------------

This scheme uses an afterdot with the Bengali characters to show the representation of the Perso-Arabic characters. Suniti Kumar Chatterji, in his books "On the Bengali Language" <baa;mlaa bhaa.saa prasa;nge> (Calcutta, 1975) and "A Grammar of the Bengali Language" <bhaa.saa-prakaa;sa baa;nlaa byaakara.na> (Calcutta, 1939; Rupa edn in 1988), uses this scheme.

This method is, in principle, similar to the one adopted by Raj Shekhar Basu in his handy dictionary "Calantika" <calantikaa> (MC Sarkar and Sons Pvt. Ltd., Calcutta, 13th edn [This is the one I have]). Haricharan Bondyopadhyaya, in his dictionary "Bengali Lexicon" <ba;ngiiya ;sabdako.sa> (Sahitya Akademi, New Delhi, 1st edn. 1932-51 in five volumes [I have the 1966 reprint]) and "Golden Bengali Dictionary" <sonaara baa;mlaa abhidhaana> by Abdur Rahim (National Publishers, 1971) also use a dot after the character down the base line.

Note on Schemes 1 and 2:
------------------------

One thing must be made clear: both the under-dot Bengali characters and the after-dot characters are essentially the same. The only differences that exist between the schemes of J. Das and SK Chatterji are:

-- J.
Das favours an under-dot for the vowels to represent Arabic 'ayin, but Chatterji favours the open single quote before the vowel letters;

-- J. Das uses under-dot <ba> to represent the Devanagari <va> and Perso-Arabic <wa>, while Chatterji uses the Assamese <va>;

-- Das uses under-dot <ga> for Arabic 'ghain (but he says in the preface to his dictionary that under-dot <gha> is also possible) and Chatterji uses under-dot <gha>.

Going beyond Persian and Urdu, Chatterji has also employed four conjunct characters (sb, db, tb, and jb) for Arabic <saad, dhaad, toe and zoe>, which do not have accurate representation in J. Das's scheme.

Category 3: Characters with afterdot aligning the maatraa
---------------------------------------------------------

In the "Islamic Encyclopaedia" <islaamii bi;sbako.sa>, published by the Islamic Foundation Bangladesh, Dhaka, in 1997, a dot on the right of the characters, aligned with the upper horizontal line (maatraa), is used to show the Arabic characters.

Category 4: Characters with apostrophe aligning the maatraa
-----------------------------------------------------------

"Lexicon of Rabbani" <farhang-e-rabbani>, an Urdu-Bengali dictionary by Siraj Rabbani, Rabbani Publications, Calcutta, 1952, uses an apostrophe after the Bengali character for the Perso-Arabic characters. But whenever it comes to long-<ii>, the apostrophe is used after the vowel mark; in other cases, the apostrophe appears before the vowel mark.

Category 5: Characters with graphological imitation of Arabic ones
------------------------------------------------------------------

"Arabic-Bengali Dictionary" <aarabii-baa;mlaa abhidhaana>, compiled by Md. Alauddin al-Azhari, first published by *kendriiya baa;mlaa-unna.yana bor.da (Central Bengali-language Development Board), Dhaka, in 1970 and later reprinted by the Bangla Academy, Dhaka, in 1976, uses a cumbersome way to show the Perso-Arabic script.
It is mainly a graphological imitation of how the Arabic or Persian characters look. I have never seen this scheme in use anywhere else. Probably for this reason, The Dictionary of Dialects of Bangladesh, <baa;mlaade;sera aa~ncalika bhaa.saara abhidhaana>, compiled and edited by Muhammad Shahidullah and published by the Bangla Academy, Dhaka, used Perso-Arabic script in showing the etymology.

Note 1: Buddhadev Bose used two characters: one to represent the Cyrillic character, representing the sound of French <j> as in <je>, or the Persian character "zha-i-farsi"; and the other to represent English <z>. Bose used the characters in the translation of the Pasternak novel Dr. Zhivago. Since then, many scholars have used the forms. Hayat Mamud, in his book "Gerasim Stepanovich Lebedev", published by the Bangla Academy in 1985, used both the characters. The first one is <ja+two under-dots> and the second one is <ja+under-dot>. The characters fit Scheme 1 perfectly.

Note 2: <baa;ngaalaa ;sabdako.sa> by Yogeshchandra Rai, Bangiya Sahitya Parishad, Calcutta, 1913, uses the Devanagari cerebral <.l> and Assamese semi-vowel <va> to show the etymology. Haricharan Bandyopadhyaya uses the Assamese labial <ba> to show where it is labial in the origin.

Note 3: Persian-Urdu pronunciation differs from that of the Arabic letters, and Jnanendramohan Das did not have a provision to show the different characters, sounding alike, in the original scripts. Both Jnanendramohan Das and Raj Shekhar Basu used j{underdot}a for four Perso-Arabic characters and <sa> for three Perso-Arabic characters (thaa' [saa'], sin and saad). But Suniti Kumar Chatterji devised a scheme to represent all these characters differently.

Instances:
==========

Class F (fonts and keyboard layout):
------------------------------------

The use of computers in Bengali printing is fairly recent.
There is no standard, either prescribed by the government or by any private body like the Bangla Academy, prevailing in the print industry. In 1995, the government (Bangladesh Standards and Testing Institution, BSTI) was reported (in newspapers) to have worked out a standard for the fontfile, glyphs and keyboard layout, but I have seen none that go by the norms. The most widely used Bengali add-on on computers, called Bijoy, created by an Apple enthusiast (the program runs both on IBM-compatible machines and Apple Macs), is far from adequate. It works well when it comes to day-to-day matters, but for scholarly purposes it falls flat, not allowing users to create certain conjuncts. Some of the conjuncts formed using the program look messy. This has proved very ineffective for the typesetting of a dictionary.

In the case of Jnanendramohan Das's dictionary, he had a foundry cast types to serve his purpose. Suniti Kumar Chatterji had an advantage: since he used after-dots, it was easy for the typesetters to make the necessary arrangements. I know nothing about the Bangla Academy Arabic-Bengali Dictionary. But it seems the book was printed using a photo-typesetting method in the original version by the Central Bangla-language Development Board, and later, when the Bangla Academy reprinted it, the result was a sort of reduced facsimile print.

Letter-press
------------

It is possible to print books using all the five schemes in the traditional letter-press system, with purpose-cast types in the foundry. There is at least one book published using each of the five schemes described above.

System fonts
------------

The use of the Bengali fonts (fontsets) on computers is still going without a standard. There are more than half a dozen fontsets, which further compounds the matter.
But I have come across at least two fontfiles, one by a company called Southern Software Inc and the other by the Indian Centre for Development of Advanced Computing (C-DAC), that allow users to create extended characters to a limited extent. The C-DAC keyboard layout also provides for the composition of the dotted characters. Both fontfiles use underdots. But afterdots are easy to compose.

Class P (published works):
--------------------------

Scheme 1:
---------

1. "Doctor Zhivago" <.daaktaara j{two-underdots}ibhaago> by Boris Pasternak. In Bengali, translated by Minakshi Datta and Manavendra Bandyopadhyaya; edited and poems translated by Buddhadev Bose; Papyrus, Calcutta; 1st edn. September 1960; 2nd edn. November 1990.

Instances of extended characters: j{underdot}a and j{two-underdots}a

"Editor's Note"
"...j{underdot}a has been used in place of the English 'z' and j{two-underdots}a in place of the French 'j' or the Russian 'zh' throughout this book." [translation mine]

2. "Baudelaire: His Poems" <bodale.yara [:] taa~ra kabitaa>. An anthology of Baudelaire poems in Bengali, translated by Buddhadev Bose; Dey's, Calcutta; 1st edn. January 1961; Dey's 2nd edn. April 1988.

Instances of extended characters: j{underdot}a and j{two-underdots}a

"Translator's Note"
"Two new characters have been used in this book: j{underdot}a and j{two-underdots}a. j{underdot}a is pronounced like the English 'z' and j{two-underdots}a like the French 'j' (zh) or the sound of 's' as in the English word 'pleasure'." [translation mine]

3. "Gerasim Stepanovich Lebedev" <geraasima stepaanobhica li.yebedepha>. A dissertation on a Russian playwright called Lebedev in Bengali, by Hayat Mamud; Bangla Academy, Dhaka, December 1985.
Instances of extended characters: j{underdot}a and j{two-underdots}a

"Preface"
"I have used the character j{two-underdots}a, propounded by Buddhadev Bose, in the Bengali transliteration since 's' in the English word 'pleasure' sounds like the Russian {zhe} or the French 'j'; I have also employed the character j{underdot}a in place of the English z or the Russian {ze} following the examples of Jnanendramohan Das and Buddhadev Bose." [translation mine]

4. "Charles Baudelaire: A Unique Philosopher" <;saarl bodale.yaara [:] ananya dra.s.taa>. In Bengali, by Surabhi Bandyopadhyaya; Dey's Publishing, Calcutta; April 1992.

Instances of extended characters: j{underdot}a and j{two-underdots}a

5. "Ghalib's Poems" <;ser-i-gaaliba>. In Bengali, a collection of Ghalib's ghazals translated by Bimalendu Majumdar; Byatikram Prakashani, Dhaka, 1998.

Instances of extended characters:
k{underdot}a for qaaf in <;sauk{underdot}>
kh{underdot}a for KHaa' in <kh{underdot}aak>
g{underdot}a for ghayin in <g{underdot}am>
ph{underdot}a for faa' in <kaph{underdot}n>
j{underdot}a for dhaal, ze, dhaa' and Zaa' in <maj{underdot}aa>
b{underdot}a for waaw
(The scheme is explained with phonetic examples in the preface to the book)

Scheme 2
--------

1. "On the Bengali Language" <baa;mlaa bhaa.saa prasa;nge> by Suniti Kumar Chatterji; Calcutta, 1975.

Instances of extended characters:
k{afterdot}a
kh{afterdot}a
gh{afterdot}a
j{afterdot}a
jh{afterdot}a
t{afterdot}a for <toe>
th{afterdot}a for <thaa'>
d{afterdot}a for <dhaad>
dh{afterdot}a
ph{afterdot}a
b{afterdot}a (Assamese <va> is preferred)
bh{afterdot}a
l{afterdot}a
h{afterdot}a
s{afterdot}a for <saad>

Chatterji uses this scheme in many of his articles, published from time to time. In the preface to the dictionary "Farhang-e-Rabbani," he says that he holds a brief for diacritised 'ja', such as j{afterdot}a or j{underdot}a, for the sound of 'z' rather than <ya>.
In another article, he says that as long as j{underdot}a, kh{underdot}a and Assamese <va> are not available at all the printers', writing English z and w or German ch remains a difficult task. "I propose not to cast new types like j{underdot}a or kh{underdot}a; rather English full-stop can easily do the work."

2. "A Grammar of the Bengali Language" <bhaa.saa-prakaa;sa baa;nlaa byaakara.na> by Suniti Kumar Chatterji; Calcutta, 1939; Rupa edn. 1988.

Instances of extended characters:
'a, 'aa, 'i, 'ii, 'u, 'uu
th{afterdot}a kh{afterdot}a dh{afterdot}a j{afterdot}a jh{afterdot}a ph{afterdot}a k{afterdot}a
Assamese <va>

3. "An Introduction to the Bengali Linguistics" <baa;ngaalaa bhaa.saatattbera bhuumikaa> by Suniti Kumar Chatterji, Calcutta University, Calcutta, September 1924.

Instances of extended characters:
k{afterdot}a kh{afterdot}a gh{afterdot}a j{afterdot}a ph{afterdot}a l{afterdot}a
Assamese <va> in padumavat (a book in Hindi literature)

4. "Languages of the World: Indo-European Family" <p,rthibiira bhaa.saa [:] indoiuropii.ya prasa;nga> by Pareshchandra Majumdar; Pashchimbangal Bangla Akademi, Calcutta, January 1997.

Instances of extended characters:
k{afterdot}a in k{afterdot}aasidaa
kh{afterdot}a in kh{afterdot}uuba
g{afterdot}a in baag{afterdot}a
j{afterdot}a in cij{afterdot}a
ph{afterdot}a in goph{afterdot}tan
Assamese <va> in gaav-e-nar

5. "The solar eclipse in ancient literature" <praaciina saahitye suuryagraha.nera chaa.yaa> by Jyotibhushan Chaki. This is an article published in Desh, a Bengali-language magazine from Calcutta, in the August 7, 1999 issue (year 66, issue 20).

Instances of extended characters:
<suraata-aal-phalak{afterdot}a> (the name of a verse in the Qur'an)

To my surprise, this is the first instance of Indic character extension I have come across in any popular magazine.

Class D (dictionaries and charts):
----------------------------------

Scheme 1
--------

1.
Dictionary of the Bengali Language by Jnanendramohan Das, Sahitya Samsad, Calcutta, 1st edn. 1916; 2nd edn. 1937 (enlarged).

Instances of extended characters:
a{underdot} aa{underdot} i{underdot} ii{underdot} u{underdot} uu{underdot}
k{underdot}a kh{underdot}a
g{underdot}a (The compiler says that gh{underdot}a is also okay)
j{underdot}a jh{underdot}a t{underdot}a th{underdot}a d{underdot}a dh{underdot}a ph{underdot}a b{underdot}a bh{underdot}a s{underdot}a h{underdot}a

J. Das explains the full scheme in the preface to the dictionary. But there was a problem in composing the dotted characters. He had a foundry cast new under-dotted types for his dictionary. But at a certain stage in the printing of the book, he needed some other new types such as jh{underdot}a, t{underdot}a, th{underdot}a, d{underdot}a, dh{underdot}a, bh{underdot}a, s{underdot}a and h{underdot}a. But at that time he did not have the space to cast these characters afresh. So he continued printing the dictionary with afterdots in many cases.

2. Bangla Academy English-Bengali Dictionary (ed. Zillur Rahman Siddiqui; Bangla Academy, Dhaka, 1st edn. August 1993; 5th rep. January 1995)

Instances of the extended characters:
ph{underdot}a for the English "f" as in "first"
bh{underdot}a for the English "v" as in "vast"
th{underdot}a for the English "th" as in "thirst"
d{underdot}a for the English "th" as in "the"
j{underdot}a for the English "z" as in "zone"
j{two-underdots}a for the English "s" as in "vision"
(The extended characters are explained in the preface with detailed examples. Among the translation dictionaries, this one is very popular in Bangladesh.)

3. Samsad English Bengali Dictionary (ed. Sailendra Biswas; Sahitya Samsad, Calcutta; 1st edn. 1959; 5th edn. (15th rep.) 1995, this is the 40th impression of the reprint)

Instances of the extended characters:
j{underdot}a for the English /z/ in showing the pronunciation

Scheme 2
--------

1. "Calantika" <calantikaa> by Raj Shekhar Basu; MC Sarkar and Sons Pvt Ltd, Calcutta, 13th edn.

Instances of the extended characters:
j{afterdot}a for 'z' (The Calcutta University Spelling Regulation, which is printed at the end of the dictionary, shows j{afterdot}a, which was j{underdot}a in the original documentation.)
Assamese <va> for 'w'

2. "Bengali Lexicon" <ba;ngiiya ;sabdako.sa> by Haricharan Bondyopadhyaya; Sahitya Akademi, New Delhi, 1st edn. 1932-51 in five volumes.

Instances of the extended characters:
j{afterdot}a kh{afterdot}a t{afterdot}a g{afterdot}
Assamese <va>

3. "Golden Bengali Dictionary" <sonaara baa;mlaa abhidhaana> by Abdur Rahim; National Publishers, 1971.

Instances of the extended characters:
j{afterdot}a b{afterdot}a

===========================================================