Re: Transcoding Tamil in the presence of markup

Jungshik Shin Sun, 07 Dec 2003 08:08:28 -0800

On Sun, 7 Dec 2003, Peter Jacobi wrote:

> > In Unicode U+0BBE, U+0BC6 and U+0BCA are all dependent vowel signs
>
> Yes, but just this fact doesn't meet user's expectations. It is inherited
> from the ISCII unification.


  I don't think there's any rule that dependent vowel signs cannot be
styled independently from consonants preceding them.

> But the core problem is not on the theoretical, but on the practical side.
>
> As it was possible to style individual characters in legacy encodings
> (heck, it was possible using a mechanical Tamil typewriter!), what is to
> be done in migration to Unicode?
>
> So, I'm still wondering whether Unicode and HTML4 will consider
>   <span style='color:#00f'>&#x0BB2;</span>&#x0BBE;
> valid and it is the task of the user agent to make the best out of it.

  I think this is valid. A more interesting case has to do with
W3C CHARMOD, in which NFC is required/recommended (it's not yet complete
and the W3C I18N-WG has been discussing it).  Consider the following case.

  &#x0BB2;<span class="left_part">&#x0BC7;</span>
 <span class="right_part">&#x0BBE;</span>

Because <U+0BC7, U+0BBE> is canonically equivalent to U+0BCB, we couldn't
use the above if NFC is required, even though it's possible in the legacy
TSCII encoding. The same is true of Korean syllables (see below), as
Philippe pointed out.

  &#x1100;<span class="vowel">&#x1161;</span>&#x11a8;
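  To make the equivalence concrete, here is a minimal sketch using
Python's standard unicodedata module (my choice for illustration, not
something CHARMOD prescribes). It normalizes the two sequences above with
the markup stripped and shows that NFC leaves no separate code points for
the spans to wrap. The variable and helper names are made up for this
example.

```python
import unicodedata

def codepoints(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Tamil: consonant LA followed by the two halves of vowel sign OO.
tamil_split = "\u0bb2\u0bc7\u0bbe"
tamil_nfc = unicodedata.normalize("NFC", tamil_split)

# Korean: conjoining jamo for initial, vowel and final consonant.
korean_jamo = "\u1100\u1161\u11a8"
korean_nfc = unicodedata.normalize("NFC", korean_jamo)

print(codepoints(tamil_split), "->", codepoints(tamil_nfc))
print(codepoints(korean_jamo), "->", codepoints(korean_nfc))
```

After NFC the Tamil vowel sign is the single code point U+0BCB and the
Korean jamo collapse into one precomposed syllable, so there is nothing
left to put inside <span class="left_part"> or <span class="vowel">.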


> > In Mozilla you may be completely breaking the font lookups by separately
> > formatting the different parts of a conjunct.
>
> As I've understood Mozilla (i.e. Jungshik Shin) internally transcodes to
> TSCII before display. Or is this only done on Linux?

  It's only done on Linux and on Win 9x/ME. On Win 2k/XP, it relies on
TextOut (the exact name of the Win32 API escapes me at the moment)
'implicitly invoking the Uniscribe API' on our behalf. Mozilla needs to
invoke Uniscribe APIs directly on Windows, as MS IE does (and Pango APIs
on Linux and ATSUI on Mac OS X).  Either way, what's currently happening
is that enclosing U+0BB2 in <span> breaks it apart from U+0BBE, so the
'context' is lost and U+0BB2 and U+0BBE are rendered separately (so that
reordering doesn't happen).
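
  As a rough illustration of why the lost context matters, here is a toy
model in Python. It is not Mozilla's, Uniscribe's or Pango's actual logic;
the function names and the simplified rule (a pre-base vowel sign moves in
front of the consonant preceding it, but only within one text run) are
assumptions made for this sketch. It uses U+0BC7, which is drawn to the
left of its consonant, because that is where reordering is visible.

```python
# Tamil vowel signs that are drawn to the LEFT of the consonant they follow.
PREBASE_VOWELS = {"\u0bc6", "\u0bc7", "\u0bc8"}

def visual_order(run):
    """Toy shaper: reorder pre-base vowel signs within a single text run."""
    out = []
    for ch in run:
        if ch in PREBASE_VOWELS and out:
            # Move the vowel sign in front of the character before it.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)

def render(runs):
    """Shape each run on its own, as happens when markup splits the text."""
    return "".join(visual_order(run) for run in runs)

# One run: the vowel sign sees its consonant and is reordered.
whole = render(["\u0bb2\u0bc7"])
# Two runs (consonant inside <span>, vowel sign outside): no reordering.
split = render(["\u0bb2", "\u0bc7"])
```

In this toy model, whole comes out in visual order (vowel sign first)
while split stays in logical order, which is roughly the symptom described
above.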

  Jungshik
