Ragib, (copied Tamil Wiki list)
We've faced an issue similar to Bug #5948. Due to non-canonicalisation, there are two articles on the same title in Tamil Wikipedia!
http://ta.wikipedia.org/wiki/%E0%AE%AA%E0%AF%87%E0%AE%9A%E0%AF%8D%E0%AE%9A%E0%AF%81:%E0%AE%AE%E2%80%8C%E0%AE%9E%E0%AF%8D%E0%AE%9A%E2%80%8C%E0%AE%B3%E0%AF%8D_%E0%AE%95%E0%AE%BE%E0%AE%AE%E0%AE%BE%E0%AE%B2%E0%AF%88 (Tamil discussion)

- Sundar

"That language is an instrument of human reason, and not merely a medium for the
expression of thought, is a truth generally admitted."
- George Boole, quoted in Iverson's Turing Award Lecture

----- Original Message ----
> From: Ragib Hasan <ragibha...@gmail.com>
> To: Discussion list on Indian language projects of Wikimedia. <wikimediaindia-l@lists.wikimedia.org>
> Sent: Wed, December 29, 2010 10:23:06 AM
> Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
>
> I'm curious about the issue you are discussing ... is this similar to
> a long-standing bug that affects Bengali, Assamese, and Bishnupriya
> Manipuri wikipedias?
> https://bugzilla.wikimedia.org/show_bug.cgi?id=5948
>
> Ragib
>
> User:Ragib on en and bn
>
> --
> Ragib Hasan, Ph.D
> NSF Computing Innovation Fellow and
> Assistant Research Scientist
>
> Dept of Computer Science
> Johns Hopkins University
> 3400 N Charles Street
> Baltimore, MD 21218
>
> Website:
> http://www.ragibhasan.com
>
> On Mon, Dec 27, 2010 at 1:29 AM, BalaSundaraRaman <sundarbe...@yahoo.com> wrote:
> >> Unicode's decision to bring the second encoding in
> >> standard was widely debated and opposed mainly by FOSS developer
> >> community from Malayalam. Unicode announced the dual encoding scheme
> >> without canonical equivalence definition in 2005 and reverted it when
> >> scholars and developers opposed it.
> >
> > Sadly, you're not alone in this, Santhosh.
> > We have had canonical non-equivalence issues and many more (similar to the
> > atomic chillu issue) in Tamil too. :(
> >
> > Part of it was inherited from the umbrellaish ISCII model (done with good
> > intentions, I believe).
> > They put the abugidas of the Indo-Aryan languages and other systems like Tamil
> > (haven't studied other writing systems enough to comment upon) into one bucket
> > and we're still suffering for that. They cite stability when legitimate changes
> > are sought, but allow such breaking changes.
> >
> > I'm sure you'll be working with the search engines to map the equivalent glyph
> > sequences. Also, please explore mediawiki tech solutions to add redirects or
> > hidden texts (though not ideal).
> >
> > - Sundar
> >
> > "That language is an instrument of human reason, and not merely a medium for the
> > expression of thought, is a truth generally admitted."
> > - George Boole, quoted in Iverson's Turing Award Lecture
> >
> > ----- Original Message ----
> >> From: Santhosh Thottingal <santhosh.thottin...@gmail.com>
> >> To: Discussion list on Indian language projects of Wikimedia. <wikimediaindia-l@lists.wikimedia.org>
> >> Sent: Sun, December 26, 2010 10:28:17 PM
> >> Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
> >>
> >> On Sun, Dec 26, 2010 at 7:43 PM, CherianTinu Abraham
> >> <tinucher...@gmail.com> wrote:
> >> > Hi all,
> >> > Happened to see Gerard's blog post on issues with Malayalam Wikipedia
> >> > & Unicode upgrade to
> >> > 5.1 http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html
> >>
> >> The issue is very complex. There were heated debates around this topic
> >> in Unicode Indic Mailing list for years. In short the issue is about
> >> dual encoding- representing a letter using two types of unicode
> >> character codes. Unicode's decision to bring the second encoding in
> >> standard was widely debated and opposed mainly by FOSS developer
> >> community from Malayalam.
> >> Unicode announced the dual encoding scheme
> >> without canonical equivalence definition in 2005 and reverted it when
> >> scholars and developers opposed it.
> >> The same proposal was introduced again. The FOSS community and language
> >> scholars protested the proposal. The SMC community submitted a document with 17
> >> reasons why dual encoding should not be introduced; see
> >> http://wiki.smc.org.in/images/2/23/SMC_Unicode_5.1.pdf
> >> Similarly, a seminar conducted by the University of Kerala to discuss the
> >> issue opposed the proposal; see
> >> http://images2.wikia.nocookie.net/__cb20080131071131/fci/images/1/19/Report_of_Workshop.pdf
> >> But Unicode technical consortium did not bother to answer either of
> >> these reports and went ahead with the decision in Unicode 5.1. The
> >> dual encoding scheme is without any canonical equivalence definition.
> >> Since it is not there in the standard, I doubt whether operating systems
> >> will implement it, not to mention search engines.
> >>
> >> Since the new encoding scheme is defined without backward
> >> compatibility, or against unicode's stability policy, the Malayalam FOSS
> >> community decided not to implement it until the issues are resolved and
> >> is continuing with the unicode 5.0 encoding. Malayalam news portals also
> >> follow unicode 5.0. Most of the tools from Google also continue with
> >> unicode 5.0 based encoding. Malayalam wikipedia decided to go ahead
> >> with the latest version of unicode. I had resisted this move in the
> >> discussion pages of Malayalam wikipedia. The decision was taken based
> >> on voting by a small community of editors and not based on proper
> >> technical analysis.
> >>
> >> Believe it or not, this is how Malayalam wiki is rendered in a Windows XP
> >> IE 8 box with the OS default font:
> >> http://thottingal.in/tmp/ml-wiki-winxp-IE8.png
> >> I hope it gives some clue about the issue that Gerard mentioned.
> >>
> >> Most of the discussions around the encoding issue happened in
> >> Malayalam (in Malayalam wiki or in blogs), but this English blog post
> >> might summarize it:
> >> http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
> >>
> >> Discussions that happened in Malayalam wikipedia (content in Malayalam
> >> language):
> >> http://ml.wikipedia.org/wiki/വിക്കിപീഡിയ:പഞ്ചായത്ത്_(സാങ്കേതികം)/യൂണികോഡ്_5.1.0/ചർച്ച_(പഴയവ)
> >>
> >> Thanks
> >> Santhosh Thottingal
> >> http://thottingal.in
> >>
> >> _______________________________________________
> >> Wikimediaindia-l mailing list
> >> Wikimediaindia-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
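
A rough Python sketch of the two problems raised in this thread, under stated assumptions; it is not what MediaWiki or any search engine actually does. The percent-encoded title is copied from the Tamil Wikipedia link in the top message. Decoding it shows U+200C (ZERO WIDTH NON-JOINER) inside the title, which is presumably how two visually identical titles can coexist. The table pairs the six atomic chillu code points added in Unicode 5.1 with the older consonant + virama + ZWJ sequences, and the check confirms that none of the standard normalization forms unifies them. The names `fold_title` and `CHILLU_TO_SEQUENCE` are hypothetical, and dropping ZWNJ is an assumption that only makes sense for a comparison key, never for stored text.

```python
import unicodedata
from urllib.parse import unquote

# Percent-encoded page title copied from the Tamil Wikipedia link in this thread.
TALK_URL_TITLE = (
    "%E0%AE%AA%E0%AF%87%E0%AE%9A%E0%AF%8D%E0%AE%9A%E0%AF%81:"
    "%E0%AE%AE%E2%80%8C%E0%AE%9E%E0%AF%8D%E0%AE%9A%E2%80%8C%E0%AE%B3%E0%AF%8D"
    "_%E0%AE%95%E0%AE%BE%E0%AE%AE%E0%AE%BE%E0%AE%B2%E0%AF%88"
)

ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"     # ZERO WIDTH JOINER
VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA

# Unicode 5.1 atomic chillu -> pre-5.1 sequence (consonant + virama + ZWJ).
CHILLU_TO_SEQUENCE = {
    "\u0d7a": "\u0d23" + VIRAMA + ZWJ,  # CHILLU NN
    "\u0d7b": "\u0d28" + VIRAMA + ZWJ,  # CHILLU N
    "\u0d7c": "\u0d30" + VIRAMA + ZWJ,  # CHILLU RR
    "\u0d7d": "\u0d32" + VIRAMA + ZWJ,  # CHILLU L
    "\u0d7e": "\u0d33" + VIRAMA + ZWJ,  # CHILLU LL
    "\u0d7f": "\u0d15" + VIRAMA + ZWJ,  # CHILLU K
}

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalization_unifies(a: str, b: str) -> bool:
    """True if any standard normalization form makes the two strings equal."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in FORMS
    )


def fold_title(text: str) -> str:
    """Hypothetical comparison key (an assumption, not a standard algorithm).

    Applies NFC, rewrites atomic chillus as their older sequences, and drops
    ZWNJ. The result is lossy and is meant only for matching titles.
    """
    text = unicodedata.normalize("NFC", text)
    for atomic, sequence in CHILLU_TO_SEQUENCE.items():
        text = text.replace(atomic, sequence)
    return text.replace(ZWNJ, "")


title = unquote(TALK_URL_TITLE)        # title as stored, ZWNJ included
plain_title = title.replace(ZWNJ, "")  # same visible text without ZWNJ

if __name__ == "__main__":
    print("ZWNJ occurrences in title:", title.count(ZWNJ))
    print("Normalization unifies the two titles:",
          normalization_unifies(title, plain_title))
    print("Folded keys match:", fold_title(title) == fold_title(plain_title))
    for atomic, sequence in CHILLU_TO_SEQUENCE.items():
        print(unicodedata.name(atomic),
              "unified by normalization:",
              normalization_unifies(atomic, sequence))
```

A redirect generator or search index could compare `fold_title(a) == fold_title(b)` to catch such duplicates, which is in the spirit of the "map the equivalent glyph sequences" suggestion quoted above. Folding towards the older sequences is an arbitrary choice here; folding towards the atomic code points would serve equally well as a key.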