Re: Alternative encodings for Malayalam “nta”

梁海 Liang Hai via Unicode Mon, 07 Oct 2019 13:08:52 -0700

[Putting the public mailing list back to the recipient list.]

Cibu,

Thanks for your L2/19-348 
<https://www.unicode.org/L2/L2019/19348-malayalam-response.pdf> (Response to 
L2/19-345). My comments:

> I am curious to know the reference for the phonetic analysis described in 
> section A chillu-less analysis in the proposal L2/19-345. How can a phonetic 
> analysis be the basis for an important double encoding decision?

The basis is not the phonetic analysis (the phonetic analysis is only provided 
in the document as an FYI, so readers understand why many people use that 
encoding), but the fact of a widespread alternative encoding.

Basically we need to properly recognize the failure to ensure a single, ideal 
encoding. It’s not helpful to keep the Core Spec detached from reality.

> Anycase, the sequence implied by this particular analysis is an artifact of 
> the evolution of Unicode for Malayalam; it is not grounded in any prior 
> writing traditions or academic literature.

We’re not talking about the legitimacy of the phonetic encoding.

> In Malayalam, dental /n̪/ and alveolar /n/ are not allophones as implied in 
> the proposal.

I actually didn’t suggest any allophone relationship, on purpose. If it’s 
helpful, I can change the “~” notation in “[n̪a ~ na]” (and [ra ~ ta]) to “/” 
or “,” in a revision.

> So using <NA, VIRAMA> for CHILLU N is not phonetically accurate.

This is not a valid argument (see the next paragraph), although accuracy is not 
relevant anyway (as I said, I was trying to explain why people use <NA, 
VIRAMA, RRA>, not trying to legitimize it).

The written form ൻ is the syllable-coda specific form of the written form ന, 
and the pronunciation of ൻ being limited to [n] is a result of Malayalam’s 
phonology ([n̪] not usually appearing in a syllable-coda position, unless 
preceding another dental sound).

ന് is used in the phonetic encoding mostly because ൻ is not considered 
eligible for conjunct forming, and ന് is the natural fallback. Again, I’m not 
trying to legitimize the encoding, only explaining my observation of the 
widespread encoding.

> Moreover, if you show the visual ന്‌റ (<<NA, visual VIRAMA, RRA>>) to a 
> native user (who is unaware of Unicode particulars), they will not identify 
> it as (<<chillu N, subscript RRA>> /ntʌ/); instead, they would read it as 
> /nərʌ/.

Not relevant. I avoided “ന്‌റ” specifically because of this kind of argument. 
The ് was only there to mark an inherent-vowel-suppressed ന. I almost avoided ് 
altogether because of its ambiguity, but didn’t, because that would have made 
the document too obscure. The point is that an inherent-vowel-suppressed ന is 
used in the phonetic encoding, and ് just happens to be used there.

> This proposal does not address the remaining chillu conjuncts described in 
> L2/19-086R.

The document doesn’t propose any productive encoding rule. Why does it need to 
address other cases?

> It also does not address the legacy sequence supported by MS Windows <NA, 
> VIRAMA, ZWJ, RRA> for (<<chillu N, subscript RRA>>).

I can make it clearer in Section 4, Real-world encodings, that <NA, VIRAMA, 
ZWJ, RRA> is plainly unacceptable, as it clashes with our general rule that a 
chillu does not automatically form a conjunct with its following letter 
(without a conjoiner).

> I am not sure how this proposal is going to solve the issue of inadequate 
> support for <CHILLU N, VIRAMA, RRA>, without explicitly rescinding this 
> sequence. Double encoding for (<<chillu N, subscript RRA>>) is not going to 
> solve any issue, if not, making the issue more acute. Double encoding is 
> never a desirable quality for Unicode. So the decision should not be taken 
> lightly or hastly. It needs to be clearly thought through, probably through a 
> PRI.

Double encoding will not be solved. The proposal is about recognizing the 
reality of failure. With Windows on the loose for so many years, we’ve already 
missed the opportunity of ensuring a single encoding for the written form.

Now the standard needs to first recognize the widespread encoding that won’t go 
away, so implementers are informed. Then we can see in which direction we 
should push Microsoft and Apple to converge.
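
To make the sequences under discussion concrete for implementers, here is a 
minimal Python sketch. The dictionary labels and the helper names `describe` 
and `unchanged_by_normalization` are illustrative assumptions, not anything 
taken from L2/19-345. It lists the three encodings mentioned in this thread and 
checks whether any of the standard normalization forms alters them:

```python
import unicodedata

# The three encodings of Malayalam "nta" mentioned in this thread.
# The labels are informal descriptions, not terms from the standard.
NTA_SEQUENCES = {
    "<CHILLU N, VIRAMA, RRA>": "\u0D7B\u0D4D\u0D31",
    "<NA, VIRAMA, RRA>": "\u0D28\u0D4D\u0D31",
    "<NA, VIRAMA, ZWJ, RRA>": "\u0D28\u0D4D\u200D\u0D31",
}


def describe(seq):
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def unchanged_by_normalization(seq):
    """True if none of NFC, NFD, NFKC, NFKD alters seq."""
    return all(
        unicodedata.normalize(form, seq) == seq
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


for label, seq in NTA_SEQUENCES.items():
    print(label, describe(seq), unchanged_by_normalization(seq))
```

None of these code points has a canonical or compatibility decomposition, so 
the three strings stay distinct after normalization; any matching between them 
would have to be handled explicitly by the implementation.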

I agree that the Unicode Standard might need to state a clear 
disposition/preference between the graphic and phonetic encodings, so that the 
two are not considered simply equal and we have a direction for pushing the 
implementations to converge.

> Prior to Unicode 5.2, the encoding of the cluster [glyph] (<<chillu N, 
> subscript RRA>> /ntʌ/) was not clearly defined. …

You mean 5.1, right? The encoding has been specified since 5.1.

> … and <NA, VIRAMA, ZWJ, RRA> …

How can implementations support this encoding without breaking the side-by-side 
form ൻറ though?
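
To illustrate the clash behind this question, here is a small Python sketch. It 
assumes the pre-5.1 representation of chillu n as <NA, VIRAMA, ZWJ>; the 
variable names are hypothetical and chosen only for this example. Under that 
legacy representation, the side-by-side form is the very same code point 
sequence as the Windows sequence for the conjunct:

```python
NA, VIRAMA, ZWJ, RRA = "\u0D28", "\u0D4D", "\u200D", "\u0D31"
CHILLU_N = "\u0D7B"  # atomic chillu n, encoded since Unicode 5.1

# Assumption for illustration: the pre-5.1 representation of chillu n.
LEGACY_CHILLU_N = NA + VIRAMA + ZWJ

side_by_side_legacy = LEGACY_CHILLU_N + RRA   # chillu n next to RRA
side_by_side_atomic = CHILLU_N + RRA          # same written form, atomic chillu
windows_nta = NA + VIRAMA + ZWJ + RRA         # Windows sequence for the conjunct
standard_nta = CHILLU_N + VIRAMA + RRA        # sequence specified since 5.1

# Legacy side-by-side text and the Windows "nta" sequence are the same string.
print(side_by_side_legacy == windows_nta)
# With atomic chillus, the two written forms have distinct sequences.
print(side_by_side_atomic != standard_nta)
```

A renderer that maps <NA, VIRAMA, ZWJ, RRA> to the conjunct therefore has no 
way to tell it apart from legacy text that intends the side-by-side form.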

Best,
梁海 Liang Hai
https://lianghai.github.io/

>> On Oct 6, 2019, at 15:10, Cibu <[email protected]> wrote:
>> 
>> Yes; it is now available as L2/19-348 
>> <http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/19-348>.
>> 
>> On Sun, Oct 6, 2019 at 11:03 PM Asmus Freytag (c) <[email protected]> wrote:
>> Have you submitted that response as a UTC document?
>> A./
>> 
>> On 10/6/2019 2:08 PM, Cibu wrote:
>>> Thanks for addressing this. Here is my response: 
>>> https://docs.google.com/document/d/1K6L82VRmCGc9Fb4AOitNk4MT7Nu4V8aKUJo_1mW5X1o/
>>> 
>>> In summary, my take is:
>>> 
>>> The sequence <NA, VIRAMA, RRA> for ൻ്റ (<<chillu N, subscript RRA>>) should 
>>> not be legitimized as an alternate encoding; but should be recognized as a 
>>> prevailing non-standard legacy encoding.
>>> 
>>> 
>>> On Sun, Oct 6, 2019 at 7:57 PM 梁海 Liang Hai <[email protected]> wrote:
>>> Folks,
>>> 
>>> (Microsoft Peter and Andrew, search for “Windows” in the document.)
>>> 
>>> (Asmus, in the document there’s a section 5, ICANN RZ-LGR situation—let me 
>>> know if there’s some news.)
>>> 
>>> This is a pretty straightforward document about the notoriously problematic 
>>> encoding of Malayalam <chillu n, bottom-side sign of rra>. I always wanted 
>>> to properly document this, so finally here it is:
>>> 
>>> L2/19-345 <http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/19-345>
>>> Alternative encodings for Malayalam "nta"
>>> Liang Hai
>>> 2019-10-06
>>> 
>>> Unfortunately, as <NA, VIRAMA, RRA> has already become the de facto 
>>> standard encoding, now we have to recognize it in the Core Spec. It’s a bit 
>>> like another Tamil srī situation.
>>> 
>>> An excerpt of the proposal:
>>> 
>>> Document the following widely used encoding in the Core Specification as an 
>>> alternative representation for Malayalam [glyph] (<chillu n, bottom-side 
>>> sign of rra>) that is a special case and does not suggest any productive 
>>> rule in the encoding model:
>>> 
>>> <U+0D28 ന MALAYALAM LETTER NA, U+0D4D ◌് MALAYALAM SIGN VIRAMA, U+0D31 റ 
>>> MALAYALAM LETTER RRA>
>>> 
>>> Best,
>>> 梁海 Liang Hai
>>> https://lianghai.github.io/
>> 
>
