RE: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Mike Meir Thu, 14 Aug 2003 13:06:33 -0700

I have been trying to track down a definition of Canonical Combining
Class 7: Nuktas (and of the other combining classes): can anyone point
me in its direction? A clear definition of the Canonical Combining
Classes would presumably form the basis of an evaluation of the
viability of a spacing-headline-height nukta as a separate Unicode code
point.
 
The document I posted previously, which I attach again for reference,
lists printed documents in which various placements for (I hardly dare
say) nuktas are used, including more than one use by more than one
author, in both India and Bangladesh, of the double nukta on Ja. The
document was prepared for Dr Anthony P. Stone, Project Leader,
ISO/TC46/SC2/WG12 Transliteration of Indic scripts, by Abu Jar M Akkas.
 
Judging by this document, in Perso-Arabic transcription the dot is found
below the letter, to its right, or aligned with the headline. Only in
the first case is it non-spacing. In one case, both the below and the
lower-right positions are found in the same dictionary, which suggests
fairly strongly that there is no real difference between those two
positions, one a non-spacing, the other a spacing, form of the dot.
 
While the details of the schemes vary slightly, they are united in the
principle that the dot does the trick: in other words, the simplest
representation is of a Bengali Character, with a dot. There are
personal, practical and typographic preferences for where the dot should
be, but these are not basic. 
 
Solaiman, I was not suggesting that the placement of the nukta should be
controlled in any way, nor that it is not useful, placed at
headline/matra height, nor that it has not been used in books, but
merely that there doesn't seem to be much of a case for making a top
nukta an additional letter in Unicode, when you can place the dot which
is represented by the current code point anywhere you want in relation
to graphemes in fancy text by constructing a font with ligatures in that
form.
 
As it is, the Nukta is listed as having General Category Mn, which is
Mark, Non-Spacing. It has the Canonical Combining Class 7: Nuktas. The
Top Nukta you have identified definitely has the appearance of being
General Category Mc, Mark, Spacing Combining. Nevertheless, the
documentation also suggests that the combining classes are not to be
taken literally as applied to fancy text, which is what your scan is: an
example of real-world, fancy text.
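
As a quick check of the two property values quoted above, here is a
minimal Python sketch using the standard unicodedata module. It assumes
the nukta under discussion is U+09BC BENGALI SIGN NUKTA; the values
reported come from whatever Unicode version is bundled with the Python
build, so treat it as an illustration rather than a normative reference.

```python
import unicodedata

# Assumption: the existing Bengali nukta is U+09BC BENGALI SIGN NUKTA.
NUKTA = "\u09bc"

name = unicodedata.name(NUKTA)
general_category = unicodedata.category(NUKTA)
combining_class = unicodedata.combining(NUKTA)

print(name)              # BENGALI SIGN NUKTA
print(general_category)  # Mn  (Mark, Non-Spacing)
print(combining_class)   # 7   (Canonical Combining Class 7: Nuktas)
```

The combining class only governs canonical ordering of marks in plain
text; it says nothing about where a font finally draws the dot.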
 
Michael, when you say that a second nukta should be stacked on top of a
first, do you mean, in principle, in a plain text representation only
- i.e., one in which, symptomatically, no conjunct forms at all would be
found? That would seem fair enough.
 
The only form of the double-nuktaed Ja that I have seen does have the
nuktas side-by-side, and was prepared by Linotype. I presume this was
not done without some research, taking it back to the Bose instance,
probably. However, this reflects fancy text, obviously.
 
Typographically, the priority with nuktas is to place them so that they
remain distinguishable at small sizes when other elements are combined
within the same grapheme. Stacking (I presume this implies one above
the other, both remaining visible) in this instance is a bit
counter-productive, since it inevitably results either in an increase in
line spacing, or the danger that a further stacked element will crash
into an element of the line below, becoming illegible. This would apply
in both plain and fancy text.     
 
 
Mike

  _____  

From: Omi Azad [mailto:[EMAIL PROTECTED] 
Sent: 05 August 2003 19:27
To: Solaiman Karim
Cc: Paul Nelson (TYPOGRAPHY); Kenneth Whistler; [EMAIL PROTECTED]

What will be the result man?

Solaiman Karim wrote:

hello all

   I don't know if I misunderstood or not, but someone said it is useless
to add in Unicode. Someone is saying something when he doesn't even know
what it is he is talking about. Are you guys saying that what I showed
you is just made up? It is not only Arabic; it is also used in English to
translate it into some other language such as French, and so on. Please
let me know if I misunderstood you guys, and it seems to me that Bangla
should be limited.

Solaiman

----- Original Message -----
From: "Paul Nelson (TYPOGRAPHY)" <[EMAIL PROTECTED]>
To: "Kenneth Whistler" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, August 04, 2003 7:16 PM
Subject: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Sorry,

I guess I totally misunderstood what Omi was stating then.

It seems there are no less than 8 different ways to transliterate this
stuff.

Paul

-----Original Message-----
From: Kenneth Whistler [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 04, 2003 4:14 PM
To: Paul Nelson (TYPOGRAPHY)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [indic] Re: Top Nukta... and double nuktas ... and more nuktas

Paul said:

Mike,

Your proposal fails the plain text case and can only be obtained with
higher level markup to set this notion of a "spacing Nukta". Why not
simply encode a BENGALI SPACING NUKTA character? That removes any
requirement for a higher level markup and allows for unambiguous use in
plain text.

Paul Nelson

I don't understand this contention. What Mike was reporting is that
there are two schools of practice for placing these nuktas on Bengali
letters required for transcribing Arabic letters:

  a. place the nukta (generally) under the base letter, in
     a typographically pleasing location for that particular
     base letter's shape.

  b. place the nukta (roughly) on the head line, often
     implemented simply with a full stop as an ad hoc
     solution, even if placed oddly.

That is what Abu Jar M Akkas reported as part of the detailed analysis
of usage.

Particularly if (a) is the preferred approach and (b) is considered an
unpleasing typographical hack, then surely we are simply talking about
the already existing Bengali nukta, which, after all, was encoded
precisely for such usage in Bengali.

Nobody is claiming that in *plain* text a distinction has to be
maintained between a nukta which displays appropriately underneath a
consonant and a nukta which displays up to the right on a head line. So
I don't see any need to *encode* a new character distinction here.

Plain text should be focussed on representing the text *content*, not on
the fine details of its display. And in this case, it seems appropriate
to me that style or font markup should be the means for indicating any
such distinctions in placements of a nukta.

--Ken

-- 

Regards
Omi Azad
Altruists International
http://www.altruists.org

Subject: transliteration listing from Abu Jar M Akkas [Bengali part only -  APS]
Date: Sun, 24 Oct 1999 16:16:48 +0600
From: holiday <[EMAIL PROTECTED]>
To: Dr Anthony P Stone <[EMAIL PROTECTED]>

===========================================================
BENGALI (SCRIPT)
===========================================================

The use of Bengali characters to express the Perso-Arabic script, especially
by lexicographers and grammarians, is very old. The religious books, in
transcriptions of the Qur'anic text, also employ some devices and this has
influenced the lexicographers and grammarians to a large extent, in some
cases.

But the official attempt at such extension began in 1936. On May 8, 1936,
the University of Calcutta adopted a resolution called "The Rules of Bengali
Spelling" where Clause 21 says that j{underdot}a may be used for the English
sound <z>. And this has influenced Jnanendramohan Das, the compiler of the
largest dictionary of the Bengali language, to use under-dot characters in
representing Perso-Arabic and other foreign characters. This work was
supervised by Suniti Kumar Chatterji. But Chatterji favoured an after-dot in
all the cases (and an open single quote before the vowel for the 'ayin) if
under-dot characters were not available in the printing press.

While some lexicographers prefer to use the original Devanagari and Assamese
characters, especially for <va>, <ba> and l{underdot}a, some others prefer
to use diacritised Bengali characters. But there is no standard scheme as
such. In cases of representing Perso-Arabic characters with the Bengali
script, Jnanendramohan Das and Suniti Kumar Chatterji's schemes are popular;
most scholars prefer either of them. There are some other schemes in
currency. But the use of either an under-dot or a full stop (afterdot)
after the character is gaining ground.

I have also come across, in a book on Indian languages written by some
Russian scholar(s), the use of n{underdash}a, r{underdash}a and
l{underdash}a, equated with Tamil characters. But I have never seen these
characters anywhere else.

Scheme 1: Characters with underdot
----------------------------------
This scheme uses an underdot with the Bengali characters to show the
representation of the Perso-Arabic characters. The largest dictionary of the
Bengali language "A Dictionary of the Bengali Language" <baa;mgaalaa
bhaa.saara abhidhaana>, by Jnanendramohan Das (Calcutta, 1st edn. 1916; 2nd
edn. 1937 [I have the 1986 reprint of the second edition at my disposal])
uses this scheme.

Scheme 2: Characters with afterdot
----------------------------------
This scheme uses an afterdot with the Bengali characters to show the
representation of the Perso-Arabic characters. Suniti Kumar Chatterji, in
his books "On the Bengali Language" <baa;mlaa bhaa.saa prasa;nge> (Calcutta,
1975) and "A Grammar of the Bengali Language" <bhaa.saa-prakaa;sa baa;nlaa
byaakara.na> (Calcutta, 1939; Rupa edn in 1988) uses this scheme. This
method is, in principle, similar to the one adopted by Raj Shekhar Basu in
his handy dictionary "Calantika" <calantikaa> (MC Sarkar and Sons Pvt. Ltd.,
Calcutta, 13th edn [This is the one I have]). Haricharan Bondyopadhyaya, in
his dictionary "Bengali Lexicon" <ba;ngiiya ;sabdako.sa> (Sahitya Akademi,
New Delhi, 1st edn. 1932-51 in five volumes [I have the 1966 reprint]) and
"Golden Bengali Dictionary" <sonaara baa;mlaa abhidhaana> by Abdur Rahim
(National Publishers, 1971) also use a dot after the character down the base
line.

Note on Schemes 1 and 2:
------------------------
One thing must be made clear: both the under-dot Bengali characters and the
after-dot characters are essentially the same. The only differences that
exist between the schemes of J. Das and SK Chatterji are:

-- J. Das favours an under-dot for the vowels to represent Arabic 'ayin, but
Chatterji favours the open single quote before the vowel letters;

-- J. Das uses under-dot <ba> to represent the Devanagari <va> and
Perso-Arabic <wa>, while Chatterji uses the Assamese <va>;

-- Das uses under-dot <ga> for Arabic ghain (but he says in the preface to
his dictionary that under-dot <gha> is also possible), while Chatterji uses
under-dot <gha>.

Going beyond Persian and Urdu, Chatterji has also employed four conjunct
characters (sb, db, tb, and jb) for Arabic <saad, dhaad, toe and zoe>, which
do not have accurate representation in J. Das's scheme.

Category 3: Characters with afterdot aligning the maatraa
---------------------------------------------------------
In the "Islamic Encyclopaedia" <islaamii bi;sbako.sa>, published by the
Islamic Foundation Bangladesh, Dhaka, in 1997, a dot on the right of the
characters aligning the upper horizontal line (maatraa) is used to show the
Arabic characters.

Category 4: Characters with apostrophe aligning the maatraa
----------------------------------------------------------
"Lexicon of Rabbani" <farhang-e-rabbani>, an Urdu-Bengali dictionary by
Siraj Rabbani, Rabbani Publications, Calcutta, 1952, uses an apostrophe
after the Bengali character for the Perso-Arabic characters. But whenever it
comes to long-<ii>, the apostrophe is used after the vowel mark; in other
cases, the apostrophe appears before the vowel mark.

Category 5: Characters with graphological imitation of Arabic ones
-----------------------------------------------------------
"Arabic-Bengali Dictionary" <aarabii-baa;mlaa abhidhaana>, compiled by Md.
Alauddin al-Azhari, first published by *kendriiya baa;mlaa-unna.yana bor.da
(Central Bengali-language Development Board), Dhaka, in 1970 and later
reprinted by the Bangla Academy, Dhaka, in 1976, uses a cumbersome way to
show the Perso-Arabic script. It is mainly a graphological imitation of how
the Arabic or Persian characters look.

I have never seen this scheme in use anywhere else. Probably for this
reason, The Dictionary of Dialects of Bangladesh, <baa;mlaade;sera
aa~ncalika bhaa.saara abhidhaana>, compiled and edited by Muhammad
Shahidullah and published by the Bangla Academy, Dhaka, used Perso-Arabic
script in showing the etymology.

Note 1:
Buddhadev Bose used two characters: one to represent the Cyrillic character,
representing the sound of French <j> as in <je>, or the Persian character
"zha-i-farsi"; and the other to represent English <z>. Bose used the
characters in the translation of the Pasternak novel Dr. Zhivago. Since
then, many scholars have used the forms. Hayat Mamud, in his book "Gerasim
Stepanovich Lebedev", published by the Bangla Academy in 1985, used both the
characters. The first one is <ja+two under-dots> and the second one is
<ja+under-dot>. The characters fit perfectly with Scheme 1.
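
In Unicode plain text, the two characters above could be written as the
base letter followed by one or two nuktas. The following is a minimal
Python sketch under that assumption (U+099C BENGALI LETTER JA plus
U+09BC BENGALI SIGN NUKTA); it is not a statement of how any of the
cited books were actually typeset. It also checks that normalization
leaves both sequences untouched, so the single and double forms stay
distinct in plain text.

```python
import unicodedata

JA = "\u099c"     # BENGALI LETTER JA
NUKTA = "\u09bc"  # BENGALI SIGN NUKTA (assumed encoding of the under-dot)

ja_one_dot = JA + NUKTA           # <ja+under-dot>, English 'z'
ja_two_dots = JA + NUKTA + NUKTA  # <ja+two under-dots>, French 'j' / Russian 'zh'

# Both nuktas share one combining class, so canonical ordering has
# nothing to reorder, and no precomposed JA-with-nukta character exists.
for form in ("NFC", "NFD"):
    assert unicodedata.normalize(form, ja_one_dot) == ja_one_dot
    assert unicodedata.normalize(form, ja_two_dots) == ja_two_dots

# The two spellings differ only in the number of nukta code points.
print([f"U+{ord(c):04X}" for c in ja_one_dot])
print([f"U+{ord(c):04X}" for c in ja_two_dots])
```

Whether the second dot is drawn beside or below the first is then a
matter for the font, not for the encoded text.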

Note 2:
<baa;ngaalaa ;sabdako.sa> by Yogeshchandra Rai, Bangiya Sahitya Parishad,
Calcutta, 1913, uses the Devanagari cerebral <.l> and Assamese semi-vowel
<va> to show the etymology. Haricharan Bandyopadhyaya uses the Assamese
labial <ba> to show where it is labial in the origin.

Note 3:
Persian-Urdu pronunciation differs from that of the Arabic letters and
Jnanendramohan Das did not have a provision to show the different
characters, sounding alike, in the original scripts. Both Jnanendramohan Das
and Raj Shekhar Basu used j{underdot}a for four Perso-Arabic characters and
<sa> for three Perso-Arabic characters (thaa' [saa'], sin and saad). But
Suniti Kumar Chatterji devised a scheme to represent all these characters
differently.

Instances:
==========
Class F (fonts and keyboard layout):
------------------------------------
The use of computers in the Bengali printing is fairly recent. There is no
standard, either prescribed by the government or any private body like the
Bangla Academy, prevailing in the print industry.

In 1995, the government (Bangladesh Standards and Testing Institution-BSTI)
was reported (in newspapers) to have worked out a standard for the fontfile,
glyphs and keyboard layout, but I have seen none that goes by the norms. The
most widely used Bengali add-on on computers, called Bijoy, created by an
Apple enthusiast (the program runs both on IBM-compatible machines and
Apple Macs), is far from adequate. It works well when it comes to day-to-day
matters, but for scholarly purposes it falls flat, not allowing users to
create certain conjuncts. Some of the conjuncts formed using the program
look messy. This has proved very ineffective for the typesetting of a
dictionary.

In the case of Jnanendramohan Das's dictionary, he had a foundry cast types
to serve his purpose. Suniti Kumar Chatterji had an advantage, since he used
after-dots, it was easy for the typesetters to make necessary arrangements.
I know nothing about the Bangla Academy Arabic-Bengali Dictionary. But it
seems the book was printed using the photo-typesetting method in the
original version by the Central Bangla-language Development Board and later,
when the Bangla Academy reprinted it, the result was a sort of reduced
facsimile print.

Letter-press
------------
It is possible to print books using all the five schemes in the traditional
letter-press system, with purpose-cast types in the foundry. There is at
least one book published using each of the five schemes described above.

System fonts
------------
The use of the Bengali fonts (fontsets) on computers is still going without
a standard. There are more than half a dozen fontsets which further
compounds the matter. But, I have come across at least two fontfiles, one by
a company called Southern Software Inc and the other by the Indian Centre
for Development of Advanced Computing (C-DAC), that allow users to create
extended characters to a limited extent. The C-DAC keyboard layout also
provides for the composition of the dotted characters. Both fontfiles use
underdots. But afterdots are easy to compose.

Class P (published works):
--------------------------
Scheme 1:
---------
1. "Doctor Zhivago" <.daaktaara j{two-underdots}ibhaago> by Boris Pasternak.
In Bengali, translated by Minakshi Datta and Manavendra Bandyopadhyaya;
edited and poems translated by Buddhadev Bose; Papyrus, Calcutta; 1st edn.
September 1960; 2nd edn. November 1990.

Instances of extended characters:

j{underdot}a and j{two-underdots}a

"Editor's Note"
"...j{underdot}a has been used in place of the English 'z' and
j{two-underdots}a in place of the French 'j' or the Russian 'zh' throughout
this book." [translation mine]

2. "Baudelaire: His Poems" <bodale.yara [:] taa~ra kabitaa>. An anthology of
Baudelaire poems in Bengali, translated by Buddhadev Bose; Dey's, Calcutta;
1st edn. January 1961; Dey's 2nd edn. April 1988.

Instances of extended characters:

j{underdot}a and j{two-underdots}a

"Translator's Note"
"Two new characters have been used in this book: j{underdot}a and
j{two-underdots}a. j{underdot}a is pronounced like the English 'z' and
j{two-underdots}a like the French 'j' (zh) or the sound of 's' as in the
English word 'pleasure'." [translation mine]

3. "Gerasim Stepanovich Lebedev" <geraasima stepaanobhica li.yebedepha>. A
dissertation on a Russian playwright called Lebedev in Bengali, by Hayat
Mamud; Bangla Academy, Dhaka, December 1985.

Instances of extended characters:

j{underdot}a and j{two-underdots}a

"Preface"
"I have used the character j{two-underdots}a, propounded by Buddhadev Bose,
in the Bengali transliteration since 's' in the English word 'pleasure'
sounds like the Russian {zhe} or the French 'j'; I have also employed the
character j{underdot}a in place of the English z or the Russian {ze}
following the examples of Jnanendramohan Das and Buddhadev Bose."
[translation mine]

4. "Charles Baudelaire: A Unique Philosopher" <;saarl bodale.yaara [:] ananya
dra.s.taa>. In Bengali, by Surabhi Bandyopadhyaya; Dey's Publishing,
Calcutta; April 1992.

Instances of extended characters:

j{underdot}a and j{two-underdots}a

5. "Ghalib's Poems" <;ser-i-gaaliba>. In Bengali, a collection of Ghalib's
ghazals translated by Bimalendu Majumdar; Byatikram Prakashani, Dhaka, 1998.

Instances of extended characters:

k{underdot}a for qaaf in <;sauk{underdot}>
kh{underdot}a for KHaa' in <kh{underdot}aak>
g{underdot}a for ghayin in <g{underdot}am>
ph{underdot}a for faa' in <kaph{underdot}n>
j{underdot}a for dhaal, ze, dhaa' and Zaa' in <maj{underdot}aa>
b{underdot}a for waaw

(The scheme is explained with phonetic examples in the preface to the book)

Scheme 2
--------

1. "On the Bengali Language" <baa;mlaa bhaa.saa prasa;nge> by Suniti Kumar
Chatterji; Calcutta, 1975.

Instances of extended characters:

k{afterdot}a
kh{afterdot}a
gh{afterdot}a
j{afterdot}a
jh{afterdot}a
t{afterdot}a for <toe>
th{afterdot}a for <thaa'>
d{afterdot}a for <dhaad>
dh{afterdot}a
ph{afterdot}a
b{afterdot}a (Assamese <va> is preferred)
bh{afterdot}a
l{afterdot}a
h{afterdot}a
s{afterdot}a for <saad>

Chatterji uses this scheme in many of his articles, published from time to
time. In the preface to the dictionary "Farhang-e-Rabbani," he says that he
holds a brief for diacritised 'ja', such as j{afterdot}a or j{underdot}a, for
the sound of 'z' rather than <ya>.

In another article, he says that as long as j{underdot}a, kh{underdot}a and
Assamese <va> are not available at all the printers', writing English z and
w or German ch remains a difficult task. "I propose not to cast new types
like j{underdot}a or kh{underdot}a; rather English full-stop can easily do
the work."

2. "A Grammar of the Bengali Language" <bhaa.saa-prakaa;sa baa;nlaa
byaakara.na> by Suniti Kumar Chatterji; Calcutta, 1939; Rupa edn. 1988.

Instances of extended characters:

'a, 'aa, 'i, 'ii, 'u, 'uu
th{afterdot}a
kh{afterdot}a
dh{afterdot}a
j{afterdot}a
jh{afterdot}a
ph{afterdot}a
k{afterdot}a
Assamese <va>

3. "An Introduction to the Bengali Linguistics" <baa;ngaalaa
bhaa.saatattbera bhuumikaa> by Suniti Kumar Chatterji, Calcutta University,
Calcutta, September 1924.

Instances of extended characters:

k{afterdot}a
kh{afterdot}a
gh{afterdot}a
j{afterdot}a
ph{afterdot}a
l{afterdot}a
Assamese <va> in padumavat (a book in Hindi literature)

4. "Languages of the World: Indo-European Family" <p,rthibiira bhaa.saa [:]
indoiuropii.ya prasa;nga> by Pareshchandra Majumdar; Pashchimbangal Bangla
Akademi, Calcutta, January 1997.

Instances of extended characters:

k{afterdot}a in k{afterdot}aasidaa
kh{afterdot}a in kh{afterdot}uuba
g{afterdot}a in baag{afterdot}a
j{afterdot}a in cij{afterdot}a
ph{afterdot}a in goph{afterdot}tan
Assamese <va> in gaav-e-nar

5. "The solar eclipse in ancient literature" <praaciina saahitye
suuryagraha.nera chaa.yaa> by Jyotibhushan Chaki. This is an article which
was published in Desh, a Bengali-language magazine from Calcutta; it was in
the August 7, 1999 issue (year 66, issue 20).

Instances of extended characters:

<suraata-aal-phalak{afterdot}a> (the name of a verse in the Qur'an)

To my surprise, this is the first instance of Indic character extension I
have come across in any popular magazine.

Class D (dictionaries and charts):
----------------------------------
Scheme 1
--------

1. Dictionary of the Bengali Language by Jnanendramohan Das, Sahitya Samsad,
Calcutta, 1st edn. 1916; 2nd edn. 1937 (enlarged).

Instances of extended characters:

a{underdot}
aa{underdot}
i{underdot}
ii{underdot}
u{underdot}
uu{underdot}
k{underdot}a
kh{underdot}a
g{underdot}a (The compiler says that gh{underdot}a is also okay)
j{underdot}a
jh{underdot}a
t{underdot}a
th{underdot}a
d{underdot}a
dh{underdot}a
ph{underdot}a
b{underdot}a
bh{underdot}a
s{underdot}a
h{underdot}a

J. Das explains the full scheme in the preface to the dictionary. But there
was a problem in composing the dotted characters. He had a foundry cast new
under-dotted types for his dictionary. But at a certain stage in the
printing of the book, he needed some other new types such as jh{underdot}a,
t{underdot}a, th{underdot}a, d{underdot}a, dh{underdot}a, bh{underdot}a,
s{underdot}a and h{underdot}a. But at that time he did not have the space to
cast these characters afresh. So he continued printing the dictionary with
afterdots in many cases.

2. Bangla Academy English-Bengali Dictionary (ed. Zillur Rahman Siddiqui;
Bangla Academy, Dhaka, 1st edn. August 1993; 5th rep. January 1995)

Instances of the extended characters:

ph{underdot}a for the English "f" as in "first"
bh{underdot}a for the English "v" as in "vast"
th{underdot}a for the English "th" as in "thirst"
d{underdot}a for the English "th" as in "the"
j{underdot}a for the English "z" as in "zone"
j{two-underdots}a for the English "s" as in "vision"

(The extended characters are explained in the preface with detailed
examples. Among the translation dictionaries, this one is very popular in
Bangladesh).
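
The notation used in this listing maps fairly directly onto a base
letter followed by nuktas. A minimal Python sketch of that mapping
follows. The table BASE and the function encode are hypothetical helpers
written for this illustration, covering only the six entries listed
above, and the choice of U+09BC for both the single and the double
under-dot is an assumption, not something the dictionary itself
specifies.

```python
import re

NUKTA = "\u09bc"  # BENGALI SIGN NUKTA (assumed encoding of each under-dot)

# Hypothetical table: the listing's Latin notation -> Bengali consonant.
BASE = {
    "ph": "\u09ab",  # BENGALI LETTER PHA
    "bh": "\u09ad",  # BENGALI LETTER BHA
    "th": "\u09a5",  # BENGALI LETTER THA
    "d": "\u09a6",   # BENGALI LETTER DA
    "j": "\u099c",   # BENGALI LETTER JA
}

# Number of nuktas implied by each diacritic tag in the listing.
DOTS = {"underdot": 1, "two-underdots": 2}

# Matches tokens such as "j{underdot}a"; the trailing "a" is the inherent
# vowel, which is not written separately in Bengali script.
TOKEN = re.compile(r"([a-z]+)\{(underdot|two-underdots)\}a")


def encode(token: str) -> str:
    """Turn one notation token into base letter + nukta(s)."""
    match = TOKEN.fullmatch(token)
    if match is None or match.group(1) not in BASE:
        raise ValueError(f"unsupported token: {token!r}")
    base, tag = match.groups()
    return BASE[base] + NUKTA * DOTS[tag]


for token in ("ph{underdot}a", "j{underdot}a", "j{two-underdots}a"):
    text = encode(token)
    print(token, [f"U+{ord(c):04X}" for c in text])
```

Afterdot forms are deliberately left out: whether they should be encoded
with the same nukta is exactly the question under discussion in this
thread.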

3. Samsad English Bengali Dictionary (ed. Sailendra Biswas; Sahitya Samsad,
Calcutta; 1st edn. 1959; 5th edn. (15th rep.) 1995, this is the 40th
impression of the reprint)

Instances of the extended characters:

j{underdot}a for the English /z/ in showing the pronunciation

Scheme 2
--------
1. "Calantika" <calantikaa> by Raj Shekhar Basu; MC Sarkar and Sons Pvt Ltd,
Calcutta, 13th edn.

Instances of the extended characters:

j{afterdot}a for 'z'
(The Calcutta University Spelling Regulation, that is printed at the end of
the dictionary, shows j{afterdot}a, which was j{underdot}a in the original
documentation.)

Assamese <va> for 'w'

2. "Bengali Lexicon" <ba;ngiiya ;sabdako.sa> by Haricharan Bondyopadhyaya;
Sahitya Akademi, New Delhi, 1st edn. 1932-51 in five volumes.

Instances of the extended characters:
j{afterdot}a
kh{afterdot}a
t{afterdot}a
g{afterdot}
Assamese <va>

3. "Golden Bengali Dictionary" <sonaara baa;mlaa abhidhaana> by Abdur Rahim;
National Publishers, 1971.

Instances of the extended characters:
j{afterdot}a
b{afterdot}a

===========================================================
