RE: Indic editing (was: RE: The real solution)

pani Wed, 28 Nov 2001 09:18:17 -0800

Hello,

I agree with Mr. Hudson and Mr. Banerjee. The discussion would be more interesting, 
and the arguments more convincing, if there were an actual implementation available 
that edits at the rendering (glyph) level. Glyph based editing would certainly be 
amusing and fruitful if there were users who had used such a system and found it 
more useful and easier to understand. I would gladly second Mr. Cimarosti if he could 
give me such an application to try out, or a link from where I can get such a system.


Typists who have been using conventional typewriters are certainly used to a system 
that was developed under technical limitations which the computer overcomes. In 
Hudson's words, "Most of the time, all you have to do is type the phonetic 
representation of the words, and let the shaping engine and font get on with the job 
of displaying it properly." No extra learning is needed to start typing with this 
kind of system. Children are never taught "repha" and "rakaara" as characters when 
they learn the script. They are taught that a halanta removes the implicit vowel 
from a consonant, and that such consonants take certain visual forms when joined 
with (followed by) other consonants. So "repha" and "rakaara" are simply such forms 
of the consonant "ra". With this knowledge, even a child can understand the system 
and start typing the language as he uses it. Indic scripts are phonetic, and that is 
how anyone would type. The visual representation is left for the computer to do.

Words using Indic scripts are made of syllables. A word like 
"prajnyenaatmanaasmaallokaadutkramyamusminsvarge" is made of syllables like "pra + 
jnye + naa + tma + naa + smaa + llo + kaa + du + tkra + mya + mu + smi + nsva + rge". 
As agreed by all, when one makes a mistake in a relatively small word like arjun, one 
would delete the whole word and retype. And if one makes a mistake in any syllable of 
a word like the above, one could as well move to the syllable and delete it and 
retype. The rule holds true for all, be it Indic or Latin. The difference is that in 
linear scripts, each character is a unit and in complex scripts, a syllable is a unit.
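
To make "a syllable is a unit" concrete, here is a minimal Python sketch that splits a Devanagari string into such units. It is only an illustration under simplifying assumptions: the function names are mine, the character ranges cover only the basic Devanagari block, and a real editor would need many more rules (ZWJ/ZWNJ, Vedic signs, other scripts).

```python
VIRAMA = "\u094D"   # halant, the inherent-vowel killer
NUKTA = "\u093C"

def is_consonant(ch):
    # Basic Devanagari consonants KA..HA plus the precomposed nukta forms.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"

def is_dependent(ch):
    # Vowel signs (matras) and candrabindu / anusvara / visarga.
    return "\u093E" <= ch <= "\u094C" or "\u0901" <= ch <= "\u0903"

def syllables(text):
    """Split text into units: consonant cluster + dependent signs."""
    out = []
    i, n = 0, len(text)
    while i < n:
        start = i
        ch = text[i]
        i += 1
        if is_consonant(ch):
            if i < n and text[i] == NUKTA:
                i += 1
            # Absorb every "halant + consonant" join into the same unit.
            while i + 1 < n and text[i] == VIRAMA and is_consonant(text[i + 1]):
                i += 2
                if i < n and text[i] == NUKTA:
                    i += 1
            # A trailing halant (no consonant after it) stays with its consonant.
            if i < n and text[i] == VIRAMA:
                i += 1
        # Dependent signs belong to the unit they follow.
        while i < n and is_dependent(text[i]):
            i += 1
        out.append(text[start:i])
    return out

# "arjun" = a, ra, halant, ja, -u, na  ->  three units: a | rju | na
print(syllables("\u0905\u0930\u094D\u091C\u0941\u0928"))
```

With units like these, "move to the syllable and delete it" becomes a single operation on one list element.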

Vivekananda Pani

[EMAIL PROTECTED] wrote
Hi,
  I am sorry for the delay.

On Tue, 27 Nov 2001 Marco Cimarosti wrote :

> > Or, in terms of backing store:
> >    a  ra  virama  ja  -u  |  na
> >    a  ra  virama  ja  |  na
> >    a  ra  virama  |  na
> >    a  ra |  na
> >    a  |  na
> >    a  la |  na
> >    a  la  virama |  na
> >    a  la  virama  ja |  na
> >    a  la  virama  ja  -u  |  na
>
>But it is at the graphic level that your solution shows all its weirdness.

It should not show any weirdness.

>Have you tried "rendering" these nine steps?
>
>I have done this for you (see it also in the attached ALJUN.GIF):
>
>1.1:   a ja -u repha | na
>1.2:   a ja repha | na
>1.3:   a | repha na
>1.4:   a ra | na
>1.5:   a | na
>1.6:   a la | na
>1.7:   a l- | na
>1.8:   a l- ja | na
>1.9:   a l- ja -u | na

Which should be,
1.1:    a ja repha -u | na
1.2:    a ja repha | na
1.3:    a ra virama | na
1.4:    a ra | na
1.5:    a | na
1.6:    a la | na
1.7:    a la virama | na
1.8:    a l- ja | na
1.9:    a l- ja -u | na

Very similar to the character sequence shown by Kenneth.
This is more intuitive, follows Unicode/ISCII and adheres exactly to the 
thinking pattern of the Indian user. Contrary to expectations, Indian 
users who already know how to write Hindi/Marathi will also find 'thinking 
abstractly' quite intuitive, because that is how they have learnt it. For 
them, for example, 'ja' is always a full character and 'j- + danda' is quite 
alien. Only a halant (which Unicode mistakenly identifies as virama; more 
about this later) can make it change into a half character.
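
The nine backing-store steps quoted above can be reproduced with a small Python sketch. It assumes the simplest possible model, which is my own reading and not any product's code: backspace removes exactly one stored character to the left of the cursor, and typing appends one. The names in NAMES are just the transliterations used in this thread.

```python
# Transliterated names for the few Devanagari characters in "arjun"/"aljun".
NAMES = {
    "\u0905": "a", "\u0930": "ra", "\u0932": "la", "\u091C": "ja",
    "\u0928": "na", "\u094D": "virama", "\u0941": "-u",
}

def show(before, after):
    """Render the backing store with '|' marking the cursor position."""
    left = "  ".join(NAMES[c] for c in before)
    right = "  ".join(NAMES[c] for c in after)
    return left + "  |  " + right

def retype(before, after, backspaces, typed):
    """Press backspace N times, then type new characters; log each state."""
    states = [show(before, after)]
    for _ in range(backspaces):
        before = before[:-1]          # drop one stored character
        states.append(show(before, after))
    for ch in typed:
        before += ch                  # insert one stored character
        states.append(show(before, after))
    return states

# Cursor sits before "na" in "arjun"; replace "rju" with "lju".
steps = retype("\u0905\u0930\u094D\u091C\u0941", "\u0928",
               backspaces=4, typed="\u0932\u094D\u091C\u0941")
for line in steps:
    print(line)
# first line: a  ra  virama  ja  -u  |  na
# last line:  a  la  virama  ja  -u  |  na
```

Every keystroke changes the stored text by exactly one character, which is the property that makes this model easy to explain to a user.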

So this is totally dependent on how your software application handles it.
I have checked 2 ISCII based products for this. The older DOS based product 
called ALP Personal handles it absolutely correctly. It is available at
http://www.cdacindia.com/html/gist/down.asp
and is free.
The other product (iLeap) from the same vendor does it as you have said, and 
the resulting conjunct formation ('na' with a repha on top) looks quite 
strange and amusing to me.

>By the point of view of the user, many things look totally puzzling here:
>
>- In step 1.2, your backspace deletes the ja, but the repha survives,

It should not. The software should put a ZW(N)J/INV character to display the 
cursor after ra virama and before na.

>- In step 1.3, you have to backspace (=delete to left) in order to delete
>the "repha", which is now at the extreme right of the word, and to make a
>new "ra" appear where you just deleted.

Rather you would delete the virama and ra which have appeared.

>- In step 1.7, you have to enter something (virama) to make something
>disappear (the danda of the la).

Since la has an implicit vowel, the halant (virama) is there to delete the 
vowel from it. The halant is thus the vowel omission sign required for 
characters to combine. Indian home users may not understand the technical 
details but they can easily understand that
la + halant + ja gives you l-ja.
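
As a rough illustration of the rule "consonant + halant + consonant joins, otherwise the halant stays visible", here is a toy shaping function in Python. It is a sketch under my own assumptions, not a real shaping engine: it works on the transliterated names used in this thread, knows only four consonants, and uses a hypothetical "zwnj" token to stand for the ZW(N)J/INV character that keeps the halant from joining across the cursor.

```python
CONSONANTS = {"ra", "la", "ja", "na"}

def shape(chars):
    """Map stored characters to a list of glyph names (toy rules only)."""
    glyphs = []
    pending_repha = False
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch == "zwnj":
            i += 1                      # invisible: only blocks joining
            continue
        joins = (ch in CONSONANTS and i + 2 < len(chars)
                 and chars[i + 1] == "virama"
                 and chars[i + 2] in CONSONANTS)
        if joins:
            if ch == "ra":
                pending_repha = True    # ra becomes a repha on the next base
            else:
                glyphs.append(ch[:-1] + "-")   # half form, e.g. "la" -> "l-"
            i += 2                      # the halant is consumed by the join
            continue
        glyphs.append(ch)
        if ch in CONSONANTS and pending_repha:
            glyphs.append("repha")
            pending_repha = False
        i += 1
    return glyphs

print(shape(["a", "la", "virama", "ja", "-u", "na"]))
# ['a', 'l-', 'ja', '-u', 'na']
print(shape(["a", "ra", "virama", "ja", "-u", "na"]))
# ['a', 'ja', 'repha', '-u', 'na']
print(shape(["a", "ra", "virama", "zwnj", "na"]))
# ['a', 'ra', 'virama', 'na']   (halant visible, no repha on na)
```

The last call corresponds to step 1.3 in my list: with the joiner in place, the ra and its halant stay where the user typed them instead of jumping onto the na.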

>The glyphic method would require these keys: backspace, left, left, left,
>half la.
>
>It is not only much shorter, it also looks more consistent on screen (see it
>also in the attached ALJUN.GIF):
>
>2.1:   a ja -u repha | na
>2.2:   a ja -u | na
>2.3:   a ja | -u na
>2.4:   a j- | -danda -u na
>2.5:   a | ja -u na
>2.6:   a l- | ja -u na
>
>The only counterintuitive thing I see is that, in step 2.1, it is not clear
>whether backspace will delete repha (over ja) or -u (under ja). But this is
>a general ambiguity, when you allow to delete non-spacing marks.

This is a big ambiguity, not a small one, which many Indian language 
developers would have faced. When your cursor is after 'na' it is fine.
When it shifts over 'na' what are you about to remove with the backspace, 
the reph or the u?
2.1:    a ja -u repha | na

Suppose you do not want to delete anything, and are just shifting over 
characters to delete the initial 'a'. Then in your case you need to move 
over na, -u, reph, danda, j-. 5 keys.
If it was handled correctly characterwise the j- danda reph -u would be one 
single syllable for the software and one shift would do. 1 key.
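
The key counts can be checked with a short Python sketch. The assumptions are mine: a caret "stop" is an index into the stored sequence, the syllable rule is the simplified one (marks and halant-joined consonants stick to the preceding unit), and the glyph list is the one from the quoted step 2.1 with ja split into j- and danda.

```python
CONSONANTS = {"ra", "la", "ja", "na"}
MARKS = {"virama", "-u"}

def caret_stops(chars):
    """Positions where the caret may rest when moving by syllable."""
    stops = [0]
    i = 0
    while i < len(chars):
        i += 1
        # Keep marks and halant-joined consonants inside the current unit.
        while i < len(chars) and (
            chars[i] in MARKS
            or (chars[i] in CONSONANTS and chars[i - 1] == "virama")
        ):
            i += 1
        stops.append(i)
    return stops

def presses(stops, start, target):
    """Left-arrow presses needed to go from stop `start` back to `target`."""
    return sum(1 for s in stops if target <= s < start)

stored = ["a", "ra", "virama", "ja", "-u", "na"]
glyphs = ["a", "j-", "danda", "-u", "repha", "na"]

syllable_stops = caret_stops(stored)            # [0, 1, 5, 6]
glyph_stops = list(range(len(glyphs) + 1))      # a stop after every glyph

# Move from the end of the word to just after the initial "a".
print(presses(syllable_stops, 6, 1))   # 2: one for na, one for the whole rju
print(presses(glyph_stops, 6, 1))      # 5: na, repha, -u, danda, j-
```

So the cluster itself costs one press in the syllable model, against four in the glyph model, and two of those four cross zero-width glyphs.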

Another ambiguity in the above case when you are here
2.1:    a ja -u repha | na
is that when you press the left key, visibly nothing will happen
as both -u and repha are zero width glyphs.
How do you make sure that the user understands that the cursor has moved 
over one glyph?

> > And I'm done. 8 keystrokes after the cursor down, but more efficient
> > than trying to mess with selecting the repha.
>
>More efficient!?

I think more efficient software-wise.

>By the way, your method too requires deleting a non-spacing mark (the -u
>after step 1), and even deleting an invisible mark (the virama after step
>3).

The virama is not supposed to be invisible when the joining consonant has 
been deleted.

The character based editing system that we have been discussing has been in 
existence for more than a decade now and is used in a lot of Indian 
language software. It already has quite a user base in India and abroad. 
Complaints against the current system are usually only against fonts and 
software handling, not against the system itself. Before proposing a new 
'glyph based editing system' I think a feasibility study of user acceptance 
should be done by creating a small application using such a system.

>I am talking again about REPHA IN ISOLATION: ISCII has a way of representing
>it, but Unicode does not. This is needed, even only for encoding didactic
>texts, and a solution to encode it (with ZWJ, probably) should be found.

I think the same way it is done in ISCII would be quite okay.
In ISCII you get it by typing the INV character after ra virama.
A similar solution may be provided for, in Unicode, by using ZW(N)J.

By the way just to make an old point once more. I am sorry if I keep 
confusing you with the halant/virama but for me halant and virama are two 
completely different characters.
Halant is the inherent vowel suppressor while
Virama is the equivalent of the full stop in Devanagari.
What joins with j- to make a full 'ja' is commonly called 'kana' or Danda. 
This only exists as a glyph.
Sorry if I have missed any discussions on this.

regards,
Dhrubajyoti Banerjee

