I'm curious about the issue you are discussing ... is this similar to
a long-standing bug that affects Bengali, Assamese, and Bishnupriya
Manipuri wikipedias?
https://bugzilla.wikimedia.org/show_bug.cgi?id=5948


Ragib


User:Ragib on en and bn


--
Ragib Hasan, Ph.D
NSF Computing Innovation Fellow and
Assistant Research Scientist

Dept of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, MD 21218

Website:
http://www.ragibhasan.com



On Mon, Dec 27, 2010 at 1:29 AM, BalaSundaraRaman <sundarbe...@yahoo.com> wrote:
>> Unicode's decision to bring the second encoding in
>> standard was widely debated and opposed mainly by FOSS developer
>> community from Malayalam.  Unicode announced the dual encoding scheme
>> without canonical equivalence  definition in 2005 and reverted it when
>> scholars and developers opposed  it.
>
> Sadly, you're not alone in this, Santhosh.
> We have had canonical non-equivalence issues and many more (similar to the
> atomic chillu issue) in Tamil too. :(
> Part of it was inherited from the umbrellaish ISCII model (done with good
> intentions, I believe).
> They put the abugidas of the Indo-Aryan languages and other systems like Tamil
> (haven't studied other writing systems enough to comment upon) into one bucket
> and we're still suffering for that. They cite stability when legitimate changes
> are sought, but allow such breaking changes.
>
> I'm sure you'll be working with the search engines to map the equivalent glyph
> sequences. Also, please explore mediawiki tech solutions to add redirects or
> hidden texts (though not ideal).
>
> - Sundar
>
> "That language is an instrument of human reason, and not merely a medium for the
> expression of thought, is a truth generally admitted."
> - George Boole, quoted in Iverson's Turing Award Lecture
>
>
>
> ----- Original Message ----
>> From: Santhosh Thottingal <santhosh.thottin...@gmail.com>
>> To: Discussion list on Indian language projects of Wikimedia.
>> <wikimediaindia-l@lists.wikimedia.org>
>> Sent: Sun, December 26, 2010 10:28:17 PM
>> Subject: Re: [Wikimediaindia-l] Indic languages & unicode issues.
>>
>> On Sun, Dec 26, 2010 at 7:43 PM, CherianTinu Abraham
>> <tinucher...@gmail.com> wrote:
>> > Hi all,
>> > Happened to see Gerard's blog post on issues with Malayalam Wikipedia
>> > & Unicode upgrade to 5.1:
>> > http://ultimategerardm.blogspot.com/2010/12/malayalam-enigma.html
>>
>>
>> The issue is very complex. There were heated debates on this topic
>> on the Unicode Indic mailing list for years. In short, the issue is
>> dual encoding: the same letter can be represented by two different
>> Unicode character sequences. Unicode's decision to bring the second
>> encoding into the standard was widely debated and opposed, mainly by
>> the FOSS developer community for Malayalam. Unicode announced the dual
>> encoding scheme without a canonical equivalence definition in 2005 and
>> reverted it when scholars and developers opposed it.
>> The same proposal was then introduced again. The FOSS community and
>> language scholars protested it. The SMC community submitted a document
>> with 17 reasons why dual encoding should not be introduced; see
>> http://wiki.smc.org.in/images/2/23/SMC_Unicode_5.1.pdf
>> Similarly, a seminar conducted by the University of Kerala to discuss
>> the issue opposed the proposal; see
>> http://images2.wikia.nocookie.net/__cb20080131071131/fci/images/1/19/Report_of_Workshop.pdf
>> But the Unicode Consortium did not bother to answer either of these
>> reports and went ahead with the decision in Unicode 5.1. The dual
>> encoding scheme has no canonical equivalence definition. Since that
>> is not in the standard, I doubt whether operating systems will
>> implement it, not to mention search engines.
>>
>> Since the new encoding scheme is defined without backward
>> compatibility, and against Unicode's stability policy, the Malayalam
>> FOSS community decided not to implement it until the issues are
>> resolved, and is continuing with the Unicode 5.0 encoding. Malayalam
>> news portals also follow Unicode 5.0. Most of Google's tools also
>> continue with the Unicode 5.0 based encoding. Malayalam Wikipedia
>> decided to go ahead with the latest version of Unicode. I had resisted
>> this move on the discussion pages of Malayalam Wikipedia. The decision
>> was taken based on voting by a small community of editors and not on
>> proper technical analysis.
>>
>>
>> Believe it or not, this is how Malayalam wiki is rendered on a
>> Windows XP IE 8 box with the OS default font:
>> http://thottingal.in/tmp/ml-wiki-winxp-IE8.png
>> I hope it gives some clue about the issue that Gerard mentioned.
>>
>> Most of the discussion around the encoding issue happened in
>> Malayalam (on Malayalam wiki or in blogs), but this English blog post
>> might summarize it:
>> http://www.j4v4m4n.in/2009/11/07/unicode-or-malayalam/
>>
>>
>> Discussions that happened on Malayalam Wikipedia (content in the
>> Malayalam language):
>> http://ml.wikipedia.org/wiki/വിക്കിപീഡിയ:പഞ്ചായത്ത്_(സാങ്കേതികം)/യൂണികോഡ്_5.1.0/ചർച്ച_(പഴയവ)
>>
>>
>> Thanks
>> Santhosh Thottingal
>> http://thottingal.in
>>
>> _______________________________________________
>> Wikimediaindia-l mailing list
>> Wikimediaindia-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
>>
>
> _______________________________________________
> Wikimediaindia-l mailing list
> Wikimediaindia-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
>
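
For anyone who wants to check the canonical equivalence point raised in
the quoted thread, here is a minimal Python sketch. It assumes the older
chillu spelling is base consonant + virama (U+0D4D) + ZWJ (U+200D) and
that the atomic chillus added in Unicode 5.1 are U+0D7A to U+0D7E. The
table LEGACY_TO_ATOMIC and the helper names are hypothetical, and the
mapping is my reading of the code charts, so please verify it before
relying on it. The sketch checks whether NFC normalization unifies the
two spellings, and shows one way a search layer might fold the older
sequences into the atomic form, along the lines Sundar suggested.

```python
import unicodedata

# Assumed mapping: legacy sequence (consonant + virama + ZWJ) -> atomic chillu.
# Hypothetical table for illustration; verify against the Unicode code charts.
LEGACY_TO_ATOMIC = {
    "\u0d23\u0d4d\u200d": "\u0d7a",  # chillu NN
    "\u0d28\u0d4d\u200d": "\u0d7b",  # chillu N
    "\u0d30\u0d4d\u200d": "\u0d7c",  # chillu RR
    "\u0d32\u0d4d\u200d": "\u0d7d",  # chillu L
    "\u0d33\u0d4d\u200d": "\u0d7e",  # chillu LL
}


def same_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Return True if the two strings are equal after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def fold_chillus(text: str) -> str:
    """Replace each legacy chillu sequence with its assumed atomic code point."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text


legacy_n = "\u0d28\u0d4d\u200d"  # older spelling of chillu N
atomic_n = "\u0d7b"              # atomic chillu N

# Does normalization treat the two spellings as the same text?
print(same_after_normalization(legacy_n, atomic_n))

# After folding, do the two spellings compare equal?
print(fold_chillus(legacy_n) == atomic_n)
```

Folding toward the atomic form is an arbitrary choice here; a site that
stays on the Unicode 5.0 spelling could invert the table instead. Either
way, the folding would have to be applied to both the indexed text and
the search query.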

_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
