RE: Suggestions in Unicode Indic FAQ

Kent Karlsson Mon, 03 Feb 2003 10:57:38 -0800

> > No, with proper reordering (and "normal" display mode), the e-matra at
> > the beginning of the second word would appear to be last glyph of the
> > first "word".  Similarly, for the second case, the e-matra glyph would
> > have come to the left of the pa.  The fluent reader (ok, not me...)
> > would then see those errors anyway, just like I can find spelling
> > errors in Swedish, most often without any kind of special marking. (I'm
> > assuming throughout that reordrant combining characters
> > are reordered.)
> 
> Illegal sequences

There are no illegal sequences.

> are not reordered as you indicated.

Then that is a problem with the display software you are using.

> Also, as far as I
> know there is no mention of reordering of illegal input sequence (or
> invalid combining mark) in Unicode standard.

Again, there are no "illegal input sequences".

> Consider the last set of glyphs (left-to-right, top-to-bottom) in the
> attached image. It is the rendering effect of illegal input sequence

See above.

> "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka" 
> [U+0915] and without any dotted circle.

Let's see if I understand you. <093F, 0915> is the input.  Since
093F is a combining character, one should (not must, but should)
treat this *as if* the input were <0020, 093F, 0915>.  Since 093F
is also reordrant, one must reorder it before the preceding base
character (at the least; before the whole cluster in the case of
consonant clusters), so the output glyphs would be
<<glyph for 093F, space, glyph for 0915>>.
(But your image does not show that.)
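
To make the steps above concrete, here is a minimal Python sketch. It is
only an illustration, not how any particular rendering engine works: the
REORDRANT set is a hand-picked assumption covering a few prefix matras,
the function names are hypothetical, and the reordering moves the matra
only before the single preceding character rather than before a whole
consonant cluster.

```python
import unicodedata

# Assumption for this sketch: a small hand-picked set of prefix matras
# (Devanagari I, Bengali I, Bengali E, Bengali AI). Not a complete list.
REORDRANT = {"\u093F", "\u09BF", "\u09C7", "\u09C8"}
SPACE = "\u0020"


def is_combining(ch):
    """True if the character has a Mark general category (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def add_implicit_base(text):
    """Treat a leading combining character as if a space preceded it."""
    if text and is_combining(text[0]):
        return SPACE + text
    return text


def visual_order(text):
    """Return characters in simplified left-to-right glyph order.

    Simplification: a reordrant matra is moved before the immediately
    preceding character only, not before a full consonant cluster.
    """
    out = []
    for ch in add_implicit_base(text):
        if ch in REORDRANT and out:
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out
```

With this sketch, visual_order("\u093F\u0915") yields the matra, then the
space, then KA, while visual_order("\u0915\u093F") yields the matra then
KA with no space in between, which is the visible difference under
discussion.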

> As you might be knowing the correct input
> sequence should be U+0915 followed by U+093F.

That would be a different input (whether that is correct or
not depends on the author's intent).

> In that case the result would
> have been similar to what appears right now. 

Similar ONLY if you disregard the space "glyph" that should
have been there.

> (Though some more sophisticated font/application may want to replace
> the appearing glyph for U+093F to be substituted by some other glyph
> with proper attachment point).

That may be.

> Now there is no way that user can identify this illegal input sequence
> without dotted circle.

Yes, there is.  Don't disregard the space "glyph".

> In the worst case even this rendered glyph is attached to the
> character from a class (for example, consonant cluster of
> "Ka" "Virama" "Ma") for which the glyph has been designed to
> render with.
> In such case even a fluent reader can not identify the error.
> 
> > 
> > There are spelling errors, yes.  But there are other ways of
> > indicating spelling errors, that are (by now) fairly conventional
> > for any language (as long as there is an appropriate dictionary
> > installed), and that also are more general (in catching more
> > spelling errors) and less obtrusive (the author really wants to
> > write it that way, for some reason).
> > 
> > > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > > he embedded the fonts into the page.  As long as you are looking
> > > at the page on-line, with the embedded fonts, these errors are
> > > invisible.  
> > > 
> > > It may be typographically horrible.  It *should* be
> > > typographically horrible in order to illustrate bad sequences
> > > clearly.
> > 
> > I'd prefer little red wiggly lines under the word, or yellow
> > background or some such (just for screen display, not for
> > printing; screen grabs not counted).  And that for any spelling
> > "error".
> 
> Spelling mistakes can be categorized into two different classes.

???

> One
> arising from illegal input sequence (e.g., Vowel Sign E as the first
> character in a word)

There are no illegal input sequences.

> and the other one is legal input sequence with no
> contextual meaning in the dictionary.

A simple spell checker just checks if the word is in the 
dictionary or not (without worrying about the context).
That would catch what you call "illegal input sequences" too.
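
As a rough illustration of such a context-free check, here is a minimal
Python sketch. The function name, the punctuation set, and the tiny
dictionary in the usage note are all made up for the example; a real
spell checker would also need normalization and language-aware word
segmentation, which this deliberately leaves out.

```python
# Assumption for this sketch: words are separated by whitespace and may
# carry this small set of surrounding ASCII punctuation.
PUNCT = ".,;:!?\"'()"


def misspelled(text, dictionary):
    """Return the words of text that are not in dictionary.

    No context is considered: a word is flagged purely because it is
    absent from the dictionary, whatever the reason for its absence.
    """
    flagged = []
    for token in text.split():
        word = token.strip(PUNCT)
        if word and word not in dictionary:
            flagged.append(word)
    return flagged
```

For instance, with a hypothetical dictionary {"the", "\u0915\u093F"},
the text "the \u093F\u0915 nnnn" has two flagged words: the one that
begins with a vowel sign and the run of n's. Both are caught by the same
lookup, with no special case for combining marks.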

> While indication of the second type of mistake is generally used
> only in sophisticated applications like word processor,

Why?  There is nothing in principle preventing a spell checker
from being used in a "plain text" editor.

> everyone wants to know the first kind of mistake.

Without a spell checker, but with proper rendering, spelling
errors can be detected by a fluent reader, since they look
different even without any dotted circles. For some ambiguous
Indic cases, like a prefix matra, consonant, postfix matra, all
possible character sequences for them are misspellings (as far
as I know).

> With your explanation it seems that even plain text editor is not
> useful at all to identify such common typing mistakes!

Consider English.  If I write "nnnn", that may well be a spelling
error.  Do I deserve to have the rendering of that string littered
with dotted circles just because a sequence of four n's "has to" be
a spelling error?

                /Kent K

> - Keyur