I'm curious about the issue you are discussing. Is it similar to the long-standing bug that affects the Bengali, Assamese, and Bishnupriya Manipuri Wikipedias? https://bugzilla.wikimedia.org/show_bug.cgi?id=5948

Ragib
User:Ragib on en and bn

--
Ragib Hasan, Ph.D
NSF Computing Innovation Fellow and
Assistant Research Scientist
Dept of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, MD 21218
Website: http://www.ragibhasan.com

On Mon, Dec 27, 2010 at 1:29 AM, BalaSundaraRaman <sundarbe...@yahoo.com> wrote:
>> Unicode's decision to bring the second encoding in
>> standard was widely debated and opposed mainly by FOSS developer
>> community from Malayalam. Unicode announced the dual encoding scheme
>> without canonical equivalence definition in 2005 and reverted it when
>> scholars and developers opposed it.
>
> Sadly, you're not alone in this, Santhosh.
> We have had canonical non-equivalence issues and many more (similar to the
> atomic chillu issue) in Tamil too. :(
> Part of it was inherited from the umbrellaish ISCII model (done with good
> intentions, I believe).
> They put the abugidas of the Indo-Aryan languages and other systems like Tamil
> (haven't studied other writing systems enough to comment upon) into one bucket
> and we're still suffering for that. They cite stability when legitimate
> changes are sought, but allow such breaking changes.
>
> I'm sure you'll be working with the search engines to map the equivalent glyph
> sequences. Also, please explore mediawiki tech solutions to add redirects or
> hidden texts (though not ideal).
>
> - Sundar
>
> "That language is an instrument of human reason, and not merely a medium for
> the expression of thought, is a truth generally admitted."
> - George Boole, quoted in Iverson's Turing Award Lecture
>
>
> ----- Original Message ----
>> From: Santhosh Thottingal <santhosh.thottin...@gmail.com>
>> To: Discussion list on Indian language projects of Wikimedia.
>> <wikimediaindia-l@lists.wikimedia.org>
>> Sent: Sun, December 26, 2010 10:28:17 PM
>> Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
>>
>> On Sun, Dec 26, 2010 at 7:43 PM, CherianTinu Abraham
>> <tinucher...@gmail.com> wrote:
>> > Hi all,
>> > Happened to see Gerard's blog post on issues with Malayalam Wikipedia
>> > & Unicode upgrade to
>> > 5.1 http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html
>>
>>
>> The issue is very complex. There were heated debates around this topic
>> in Unicode Indic Mailing list for years. In short the issue is about
>> dual encoding: representing a letter using two types of unicode
>> character codes. Unicode's decision to bring the second encoding in
>> standard was widely debated and opposed mainly by FOSS developer
>> community from Malayalam. Unicode announced the dual encoding scheme
>> without canonical equivalence definition in 2005 and reverted it when
>> scholars and developers opposed it.
>> The same proposal was introduced again. FOSS community and language
>> scholars protested the proposal. The SMC community submitted a document
>> with 17 reasons why dual encoding should not be introduced; see
>> http://wiki.smc.org.in/images/2/23/SMC_Unicode_5.1.pdf
>> Similarly a seminar conducted to discuss the issue by University of
>> Kerala opposed the proposal; see
>> http://images2.wikia.nocookie.net/__cb20080131071131/fci/images/1/19/Report_of_Workshop.pdf
>> But Unicode technical consortium did not bother to answer both of
>> these reports and went ahead with the decision in Unicode 5.1. The
>> dual encoding scheme is without any canonical equivalence definition.
>> Since it is not there in standard I doubt whether Operating systems
>> will implement it, not to mention about search engines.
>>
>> Since the new encoding scheme is defined without backward
>> compatibility, or against unicode's stability policy, Malayalam FOSS
>> community decided not to implement it until issues are resolved and
>> continuing with unicode 5.0 encoding. Malayalam news portals also
>> follow unicode 5.0.
>> Most of the tools from Google also continue with
>> unicode 5.0 based encoding. Malayalam wikipedia decided to go ahead
>> with latest version of unicode. I had resisted this move in the
>> discussion pages of Malayalam wikipedia. The decision was taken based
>> on voting by a small community of editors and not based on proper
>> technical analysis.
>>
>>
>> Believe it or not, this is how Malayalam wiki is rendered in Windows XP
>> IE 8 box with OS default font:
>> http://thottingal.in/tmp/ml-wiki-winxp-IE8.png
>> I hope it gives some clue about the issue that Gerard mentioned.
>>
>> Most of the discussions that happened around the encoding issue were in
>> Malayalam (in Malayalam wiki or in blogs), but this English blog post
>> might summarize it
>> http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
>>
>>
>> Discussions happened in Malayalam wikipedia (content in Malayalam
>> language)
>> http://ml.wikipedia.org/wiki/വിക്കിപീഡിയ:പഞ്ചായത്ത്_(സാങ്കേതികം)/യൂണികോഡ്_5.1.0/ചർച്ച_(പഴയവ)
>>
>>
>> Thanks
>> Santhosh Thottingal
>> http://thottingal.in
>>
>> _______________________________________________
>> Wikimediaindia-l mailing list
>> Wikimediaindia-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
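
For readers who want to see what "no canonical equivalence" means in practice, here is a minimal Python sketch. It is not taken from the thread. The mapping table is an assumption based on the chillu forms described for Unicode 5.1 (atomic code points U+0D7A to U+0D7E versus the older consonant + VIRAMA + ZERO WIDTH JOINER sequences), and the names `CHILLU_TO_SEQUENCE`, `fold_chillus` and `same_word` are hypothetical. It shows that standard normalization leaves the two spellings distinct, and how a search tool might fold them together for comparison only.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Assumed mapping: atomic chillu (Unicode 5.1) -> older 5.0-style sequence.
# This table is illustrative and is not defined by Unicode normalization.
CHILLU_TO_SEQUENCE = {
    "\u0d7a": "\u0d23" + VIRAMA + ZWJ,  # CHILLU NN <- NNA
    "\u0d7b": "\u0d28" + VIRAMA + ZWJ,  # CHILLU N  <- NA
    "\u0d7c": "\u0d30" + VIRAMA + ZWJ,  # CHILLU RR <- RA
    "\u0d7d": "\u0d32" + VIRAMA + ZWJ,  # CHILLU L  <- LA
    "\u0d7e": "\u0d33" + VIRAMA + ZWJ,  # CHILLU LL <- LLA
}
_FOLD_TABLE = str.maketrans(CHILLU_TO_SEQUENCE)


def fold_chillus(text: str) -> str:
    """Return a comparison key with atomic chillus expanded to sequences.

    The direction of folding is arbitrary; the key is meant for matching,
    not for storing or displaying text.
    """
    return unicodedata.normalize("NFC", text).translate(_FOLD_TABLE)


def same_word(a: str, b: str) -> bool:
    """Compare two strings while ignoring the chillu encoding difference."""
    return fold_chillus(a) == fold_chillus(b)


# The same three-letter word, spelled with each encoding of chillu N.
atomic = "\u0d05\u0d35\u0d7b"
legacy = "\u0d05\u0d35\u0d28" + VIRAMA + ZWJ

# NFC does not unify them, because no canonical equivalence is defined.
print(unicodedata.normalize("NFC", atomic) == unicodedata.normalize("NFC", legacy))
# The hypothetical folding step treats them as the same word.
print(same_word(atomic, legacy))
```

A wiki search index or redirect generator could apply such a key to both page titles and queries, which is one way to read Sundar's suggestion about mapping equivalent sequences.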