Gautam asked:

> I stand corrected. Long syllabic /r l/ as well as
> Assamese /r v/ are indeed additions beyond the ISCII
> code chart. My objection, however, was not against
> their inclusion but against their placement. I
> understand why long syllabic /r l/ could not be placed
> with the vowels, but why were Assamese /r v/ assigned
> U+09F0 and U+09F1 instead of U+09B1 and U+09B5
> respectively?

Because the 7th and 8th rows in each of these Indic scripts
were where additions beyond the ISCII repertoire were placed.

> > In the case of the Assamese letters, these
> > additions separate out the *distinct* forms for
> > Assamese /r/ and /v/ from the Bangla forms, and
> > *enable* correct sorting, rather than inhibiting it.
>
> I fail to understand why Assamese /r v/ wouldn't be
> correctly sorted if placed in U+09F0 and U+09F1.

I presume you mean U+09B1 and U+09B5.

The answer is that no Indic script is correctly sorted simply
by using code point order, anyway. You need a more sophisticated
algorithm. And since such an algorithm will have weight tables,
it doesn't *matter* where a particular character is in the
code chart. See:

http://www.unicode.org/notes/tn1/

for a discussion of these issues.

> Why
> do they need to be separated out from the Bangla forms
> in order to enable correct sorting?

So that a tailored sorting for Assamese can be based on Assamese
letters, and a tailored sorting for Bangla can be based on
Bangla letters.

>
> > The addition of the long syllabic /r/ and /l/
> > *enables* the representation of Sanskrit
> > material in the Bengali script, and the code
> > position in the charts is immaterial.
>
> As stated earlier, my objection is not against their
> inclusion, but against their positioning on the code
> chart. Why is their relative position in the chart
> immaterial for sorting?

See the above technical note.

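To make the point about weight tables concrete, here is a minimal
sketch in Python. It is not the Unicode Collation Algorithm itself,
which has several weight levels and many more rules; it only models
a single primary level for six Bengali letters. The primary weights
are the Version 4.0 default table values for those letters. The names
PRIMARY_WEIGHT, primary_key, and show are purely illustrative.

```python
# Primary weights for a few Bengali letters, as listed in the default
# collation element table for UCA Version 4.0. Only the primary level
# is modelled here; real collation uses more levels.
PRIMARY_WEIGHT = {
    "\u09AF": 0x15C9,  # BENGALI LETTER YA
    "\u09B0": 0x15CA,  # BENGALI LETTER RA
    "\u09F0": 0x15CB,  # BENGALI LETTER RA WITH MIDDLE DIAGONAL
    "\u09B2": 0x15CC,  # BENGALI LETTER LA
    "\u09F1": 0x15CD,  # BENGALI LETTER RA WITH LOWER DIAGONAL
    "\u09B6": 0x15CE,  # BENGALI LETTER SHA
}


def primary_key(text):
    """Illustrative helper: map each character to its primary weight."""
    return [PRIMARY_WEIGHT[ch] for ch in text]


def show(chars):
    """Format a sequence of characters as U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in chars)


letters = list(PRIMARY_WEIGHT)

# Python compares strings by code point, so this is code chart order.
by_code_point = sorted(letters)

# Sorting through the weight table ignores code chart position.
by_weight = sorted(letters, key=primary_key)

print("code point order:", show(by_code_point))
print("weight order:    ", show(by_weight))
```

Code point order puts U+09F0 and U+09F1 last, after U+09B6. Weight
order puts U+09F0 right after U+09B0 and U+09F1 right after U+09B2,
wherever they happen to sit in the code chart.
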
If it will help you visualize the answer in some way, here is
an excerpt from the Default Unicode Collation Element Table
for the Unicode Collation Algorithm (Version 4.0), showing the
default weight assignments for the relevant portion of the
Bengali script:

09AA ; [.15C4.0020.0002.09AA] # BENGALI LETTER PA
09AB ; [.15C5.0020.0002.09AB] # BENGALI LETTER PHA
09AC ; [.15C6.0020.0002.09AC] # BENGALI LETTER BA
09AD ; [.15C7.0020.0002.09AD] # BENGALI LETTER BHA
09AE ; [.15C8.0020.0002.09AE] # BENGALI LETTER MA
09AF ; [.15C9.0020.0002.09AF] # BENGALI LETTER YA
09DF ; [.15C9.0020.0002.09AF][.0000.00FD.0002.09BC] # BENGALI LETTER YYA; QQCM
09B0 ; [.15CA.0020.0002.09B0] # BENGALI LETTER RA
09F0 ; [.15CB.0020.0002.09F0] # BENGALI LETTER RA WITH MIDDLE DIAGONAL <---
09B2 ; [.15CC.0020.0002.09B2] # BENGALI LETTER LA
09F1 ; [.15CD.0020.0002.09F1] # BENGALI LETTER RA WITH LOWER DIAGONAL <---
09B6 ; [.15CE.0020.0002.09B6] # BENGALI LETTER SHA
09B7 ; [.15CF.0020.0002.09B7] # BENGALI LETTER SSA
09B8 ; [.15D0.0020.0002.09B8] # BENGALI LETTER SA
        ^^^^ primary weights, in sorted order

As you can see, the two additional letters in question, in the
default table, sort in exactly the order you are suggesting, and
as I said, the position in the *code chart* doesn't matter.

> If it is merely because there
> are script-specific sorting mechanisms already in
> place, then it's just a bad excuse for a sloppy job. I
> sincerely hope there is more to it than just that.

It truly does not matter. *No* script in the Unicode Standard
is encoded completely in a collation order. *All* scripts must
be handled via weight tables in order to produce desired
sorting behavior. That is true for Latin, Greek, Cyrillic, ...,
as well as Devanagari, Bengali, Gujarati, ..., so there is nothing
particularly different about the encoding of Bengali.

>
> > But be that as it may, they (TDIL) have nothing to
> > do with the code point choices in the range
> > U+09E0..U+09FF ...
>
> If this is indeed the case, then I must say it's
> rather unfortunate. As a full corporate member
> representing the Republic of India, the Ministry of
> Information Technology should have had a BIG say in
> the matter. Were they ever consulted on the issue?

Of course, once they got involved. And they have been making
suggestions ever since.

But you need to recognize that the particular characters you are
concerned about were standardized and published by ISO in 1993
(based, it is true, on charts published by Unicode even earlier,
which in turn were based on the ISCII standard), well before
the Government of India became a member of the Unicode Consortium.

--Ken

> Did
> they try to intervene suo moto? Will a Unicode
> official kindly let us know?
>
> Best,
> -Gautam.