[Wikimediaindia-l] [Press]: Medianama - Wikipedians Digitizing Out-Of-Copyright Texts In Eight Indian Languages
Dear all,

Here's an article by Medianama on Wikisource in Indic languages (http://www.medianama.com/2012/05/223-wikipedians-digitizing-out-of-copyright-text-in-eight-indian-languages/). There are a couple of minor misses in the article, but it does refer to two important aspects of Wikisource:

a) It is a door through which many have entered our projects and communities (i.e., they start with Wikisource, and indeed Wiktionary, because it's relatively easier to contribute to, and then they move on to contribute to other projects too, such as Wikipedia), especially in Indic languages.

b) The initiative run by the Malayalam community (written about in the article) to encourage school children to contribute to Wikisource is something that could be of interest to many other communities.

If anyone wants any help to start conversations with schools in their states, Nitika and I would be happy to help out. Please reach out to us at noo...@wikimedia.org or nit...@wikimedia.org

--

*Wikipedians Digitizing Out-Of-Copyright Texts In Eight Indian Languages*
*- Nikhil Pahwa

In what is a painstaking process, Wikipedians are digitizing out-of-copyright Indian language texts, trying to address the comparative paucity of Indic language texts online. Wikisource is a repository of documents and archived material that serves as a reference source for Wikipedia, and a means of improving access to information sources.
Of the 64 languages Wikisource is available in, 8 are Indian:

Tamil: http://stats.wikimedia.org/wikisource/EN/TablesWikipediaTA.htm (stats: http://stats.wikimedia.org/wikisource/EN/SummaryTA.htm)
Malayalam: http://ml.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummaryML.htm)
Telugu: http://te.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummaryTE.htm)
Kannada: http://kn.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummaryKN.htm)
Sanskrit: http://sa.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummarySA.htm)
Marathi: http://mr.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummaryMR.htm)
Bengali: http://bn.wikisource.org/ (stats: http://stats.wikimedia.org/wikisource/EN/SummaryBN.htm)
Gujarati: http://stats.wikimedia.org/wikisource/EN/TablesWikipediaGU.htm (stats: http://stats.wikimedia.org/wikisource/EN/SummaryGU.htm)

What’s particularly notable about this digitization is that the texts are being typed out by volunteers on their own time, one word at a time.*

*How It Began*

*Users were adding bhajans of Mirabai to Wikipedia, but according to Wikipedia’s policies, recipes, poems and song lyrics belong to Wikibooks or Wikisource, Noopur Raval, Communications Consultant (India Program) at the Wikimedia Foundation, told MediaNama. One user raised this issue, and following discussions, it was decided to create a Wikisource for Gujarati. The first text to be digitized, though, was Rachnatmak Karyakram, a book by Mahatma Gandhi. The project, involving the digitization of 60 pages, took six volunteers a week. This was followed by another project, the digitization of Gandhi’s autobiography, with a group of 13 people typing out the book over a month.*

*Identification & Prioritization Of Texts For Digitization*

*Selection of text for digitization is entirely community driven: they decide what is important. Editors put up a notice for the project, and user participation is sought.
For example, the Gujarati Wikisource editors chose a text by Mahatma Gandhi. The community has an intensive process for checking whether a book is out of copyright, using the publication date, and there are mailing lists which discuss when books go out of copyright. “It’s not as if there is a shortage of texts that are out of copyright,” Hisham Mundol, Consultant (India Program) at the Wikimedia Foundation, said, adding that “The kind of projects that the community is undertaking (at present) involves iconic books, where you know the author and the publisher.”*

*Overcoming Technological Challenges*

*Mundol points out that the process of digitization is brutal, compounded by the fact that there is no reasonably functional OCR (Optical Character Recognition) in Indic languages. Texts are thus manually typed out, followed by a phase of correction and proofreading. In comparison, English texts can be scanned, uploaded and OCR’ed. The lack of tools points towards an issue which Wikipedia faces with Indic languages. “If a MediaWiki tool comes to an English language project, the possibility of implementing it, the kind of people using it, all of that happens very quickly, because most of this is written in English. It takes time to localize it. For a bug to be filed for a local language project takes a lot more time. That gap makes for a
Re: [Wikimediaindia-l] [Wikipatrika]: Announcing the third issue of Wikipatrika!
Thank you to everyone (around 25 at the last count!) who helped out on Patrika. I was going through this again and 5 particular stories caught my attention from the HUGE number of stories in Patrika [0]:

a) The 10th anniversary plans of the Assamese community [1]. For a small community, you guys are breathtakingly ambitious and are going about realising your dreams in a wonderfully collaborative manner.

b) The "Why I do it" section by User:BPositive [2], regarding the Collaboration of the Month, at the bottom of the English page [3] is what IT is ALL about. Rock On, Pratik!

c) The Wikiproject Film [4], quietly started by the Hindi community [5], is fantastic. Check out the articles that they have been editing and join them on more!

d) To be honest, the Malayalam page [6] just blew my mind... Phew. So much has happened, but the best part for me was the community interaction section. That is the surest sign that there is so much more to come.

e) Isn't it amazing that there is actually an Experiments section on the Tamil page [7]? How cool is that! The trial of site notices, and the use of that information for a planned redesign of the main page to try and attract new users, is really interesting - and tips from the Tamil community would be useful across India, and indeed the world.

I did also want to (re)extend the invite from Kannada [8] Wikimedians for fellow lovers of Kannada to join in their Articles Enhancement Project [9]. The planning stage is complete and the actual work has started, but with about 1800 articles to work on, the workload is huge for a very small set of volunteers. If anyone can help them out, I'm sure they'd be most appreciative. (Some tasks can be done quite comfortably by new editors - and some can even be done by non-Kannada speakers! Everyone is welcome!)
Best,
hisham

[0] http://wiki.wikimedia.in/WikiPatrika/2012-05
[1] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/as
[2] http://en.wikipedia.org/wiki/User:BPositive
[3] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/en
[4] http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:Film
[5] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/hi
[6] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/ml
[7] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/ta
[8] http://wiki.wikimedia.in/WikiPatrika/2012-05/Community_News/kn
[9] http://kn.wikipedia.org/wiki/%E0%B2%B5%E0%B2%BF%E0%B2%95%E0%B2%BF%E0%B2%AA%E0%B3%80%E0%B2%A1%E0%B2%BF%E0%B2%AF:%E0%B2%85%E0%B2%A8%E0%B3%81%E0%B2%B5%E0%B2%BE%E0%B2%A6%E0%B2%97%E0%B3%8A%E0%B2%82%E0%B2%A1_%E0%B2%B2%E0%B3%87%E0%B2%96%E0%B2%A8%E0%B2%97%E0%B2%B3_%E0%B2%B8%E0%B2%82%E0%B2%B5%E0%B2%B0%E0%B3%8D%E0%B2%A7%E0%B2%A8%E0%B2%BE_%E0%B2%AF%E0%B3%8B%E0%B2%9C%E0%B2%A8%E0%B3%86

On May 10, 2012, at 10:03 PM, Noopur Raval wrote:

> Dear all,
>
> I am very happy to tell you all that the third issue of Wikipatrika - the community newsletter - is finally out! You can check it here: http://wiki.wikimedia.in/WikiPatrika/2012-05
>
> First things first, a humble acknowledgement to all the editors who helped out with this issue of the newsletter. A special thanks to Gitartha (as), Debanjan (bn), Karthik (en and mr), Pratik (en), DS Vyas and Sushant (gu), Siddhartha (hi), Omshivaprakash (kn), Anoop, Kannan, Sreejith, Viswaprabha and Shiju (ml), Abhishek (mr), Rajesh and Saroj (ne), Mkar and Srikant Kedia (or), Abhiram (sa), Logicwiki and Shanmugam (ta), Arjuna and Rahimanuddin (te). In other contributions, thanks to Srikanth R for Commons, Logic, Santhosh, Achal for Free culture news and Tinu for Press news. This entire venture could not have been possible without Tanvir's constant support with templates and tweaks. Forgive me if I have missed someone.
> So, what has changed from last time? Since all the Wikimeetups are listed on Wikipedia here and on the respective community pages, we've omitted the section on events and meetups. Instead, we've added a section on Featured Interviews which has a GLAM interview this time and the EPOV column (Editor's point of view: INCOTM report). The idea is to make Wikipatrika a supplement to the mailing lists and village pumps and not replicate the same information in all places. Hopefully, by the next issue, with more featured interviews, insightful pieces from editors and free culture persons we will be able to achieve that.
>
> The way forward: It's a proud moment to see the amount of activity that all our language communities are involved in. We should definitely strive to make Wikipatrika regular and hopefully get more editors to contribute. This is one space where editors can express the dreams, visions and challenges that their Wikipedia community faces. If you have projects you want to publicize or successful projects that could be started in other communities, this is
[Wikimediaindia-l] [Blog post]: Numerals in Indic languages
Hey folks,

Shiju has published an interesting post on numerals in an Indic context. It's a really fascinating overview - including how numerals are depicted across Indic languages, their use across languages, the policies adopted by the various Indic communities, the need for some community decisions to take things forward, a bit of Shiju's personal grumble on the fact that Hindi film posters are no longer in Hindi :-) and a picture of a Northern Railways bed sheet!

Intrigued? Read more either on meta [1] or on the Chapter blog [2].

[1] http://meta.wikimedia.org/wiki/India_Program/Indic_Languages/Numerals_in_Indic_Languages_%26_Indic_language_Wikipedias
[2] http://blog.wikimedia.in/2012/06/01/numerals-in-indic-languages-indic-wikipedias-2/

Regards,
Noopur

--
Noopur Raval
___
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
To unsubscribe from the list / change mailing preferences visit https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
Re: [Wikimediaindia-l] [Blog post]: Numerals in Indic languages
On Fri, Jun 1, 2012 at 4:09 PM, Noopur Raval nra...@wikimedia.org wrote:

> Hey folks,
>
> Shiju has published an interesting post on numerals in an Indic context. It's a really fascinating overview - including how numerals are depicted across Indic languages, their use across languages, the policies adopted by the various Indic communities, the need for some community decisions to take things forward, a bit of Shiju's personal grumble on the fact that Hindi film posters are no longer in Hindi :-) and a picture of a Northern Railways bed sheet!
>
> Intrigued? Read more either on meta [1] or on the Chapter blog [2].
>
> [1] http://meta.wikimedia.org/wiki/India_Program/Indic_Languages/Numerals_in_Indic_Languages_%26_Indic_language_Wikipedias
> [2] http://blog.wikimedia.in/2012/06/01/numerals-in-indic-languages-indic-wikipedias-2/

Good read. Thanks for the post, Shiju. Coincidentally, I came across Kaplan's blog [1] about digits and numbers just yesterday. It was a good read, and it also covers implementation (technical + usage) issues.

I would also like to ask Shiju whether the Wikipedia community has the right to adopt, say, its own numeral standard without considering the fact that the whole world does not use it. Can Wikipedia be used as a medium to introduce language changes, or should the task be just documenting things? I ask this because we are also having similar debates about language style [2] (not to be confused with grantha, which is planned to be discussed later, as grantha will have similar factors plus additional factors for consideration) and about whether a Wikipedia can introduce a new language style (not sentence constructs, but new forms of words; again, not technical words, but new words for nouns which are in popular use) on its own without an external guideline. There are different thoughts.
Thanks!

[1] http://blogs.msdn.com/b/michkap/archive/2005/01/24/359347.aspx
[2] http://tawp.in/r/3n4

--
Regards
Srikanth.L

PS: May I also please ask you to add a copyright notice on the blog (similar to the one for copied text/images on the WMF blog), since the same is pasted from meta.
[Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
This article in today's Times of India left me puzzled. This is not related to the Urdu Wikipedia, is it?

Excerpt:

> Mooted by the National Council for Promotion of Urdu Language (NCPUL), instituted by the human resource development ministry, the project is likely to be up and running in June. "We want the young generation to be interested in Urdu literature and culture," Khwaja Mohammed Ekramuddin, NCPUL director, said. The council will initially upload nine volumes of the Urdu encyclopedia that covers subjects ranging from arts, science, politics and culture. It also plans to place a team of people to edit and update comments posted on the Urdu Wiki.

Article: http://timesofindia.indiatimes.com/home/education/news/Urdu-Wikipedia-to-be-up-and-running-in-June/articleshow/13695535.cms

--
http://j.mp/ArunGanesh
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
2012/6/1 Arun Ganesh arun.plane...@gmail.com:

> This article in today's Times of India left me puzzled. This is not related to the Urdu wikipedia is it?

Hmm, it's a curious article, but I'm as puzzled as you are. Are they uploading an existing encyclopedia to Wikipedia? Are they creating a new site? Why is Pakistan ahead of India if it's the same language?

It's possible, of course, that the journalist who wrote the article didn't quite understand it himself. It happens very often...

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
Amir, I didn't understand what you mean by Pakistan being ahead? Has Pakistan also taken up any govt-backed initiative for Urdu development?

Arun, this could be like what the K'taka govt tried with Kanaja [1]. Curiously, when it was released, the press said "Kannada Wikipedia launched" [2].

1: http://kanaja.in/
2a: http://news.oneindia.in/2009/12/06/kannada-wikipedia-launched.html
2b: http://www.youtube.com/watch?v=rcz-nkx5vrU

On Fri, Jun 1, 2012 at 7:19 PM, Amir E. Aharoni amir.ahar...@mail.huji.ac.il wrote:

> 2012/6/1 Arun Ganesh arun.plane...@gmail.com:
>
>> This article in today's Times of India left me puzzled. This is not related to the Urdu wikipedia is it?
>
> Hmm, it's a curious article, but I'm as puzzled as you are. Are they uploading an existing encyclopedia to Wikipedia? Are they creating a new site? Why is Pakistan ahead of India if it's the same language?
>
> It's possible, of course, that the journalist who wrote the article didn't quite understand it himself. It happens very often...
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> “We're living in pieces, I want to live in peace.” – T. Moore

--
Regards,
Srikanth Ramakrishnan.
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
2012/6/1 Srikanth Ramakrishnan parakara.gh...@gmail.com:

> Amir, I didn't understand what you mean by Pakistan being ahead? Has Pakistan also taken up any govt-backed initiative for Urdu development?

Well, the article says that India has so far been left behind by Pakistan and other countries, and I really don't understand what it means in this context. Quite likely it just doesn't mean anything :)

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
On Jun 1, 2012, at 7:13 PM, Arun Ganesh wrote:

> This article in today's Times of India left me puzzled. This is not related to the Urdu wikipedia is it?
>
> [snip]
>
> Mooted by the National Council for Promotion of Urdu Language (NCPUL), instituted by the human resource development ministry, the project is likely to be up and running in June.
>
> Article: http://timesofindia.indiatimes.com/home/education/news/Urdu-Wikipedia-to-be-up-and-running-in-June/articleshow/13695535.cms

Shiju and I are meeting up with the National Council for Promotion of Urdu Language on Monday to understand this better and explore any opportunities that might be there. There have been attempts in the past to build a similar-ish model to Wikipedia (such as bharatpedia), but we need to learn more about this particular initiative.

hisham
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
//The Urdu Wiki, its current tentative nomenclature, will work along the lines of Wikipedia allowing readers to edit and add comments.//

This makes it clear that this is an independent encyclopaedia initiative. "Wikipedia" and "Wiki" have long since become common names for anything like an encyclopaedia.

Ravi
Re: [Wikimediaindia-l] TOI:Urdu Wikipedia to be up and running in June
On Jun 1, 2012, at 9:11 PM, Ravishankar wrote:

> //The Urdu Wiki, its current tentative nomenclature, will work along the lines of Wikipedia allowing readers to edit and add comments.//
>
> This makes it clear that this is an independent encyclopaedia initiative. Wikipedia, Wiki have long become common names to indicate anything like encyclopaedia.

I suspect as much. ...but let's see how the meeting goes and if we can work out some form of mutually beneficial partnership.

hisham
Re: [Wikimediaindia-l] [Blog post]: Numerals in Indic languages
> I would also like to ask Shiju whether the Wikipedia community has the right to adopt, say, its own numeral standard without considering the fact that the whole world does not use it.

This is my personal opinion as an Indic language Wikimedian. In my view, the Wikimedia community should follow what the speakers of that language use. We cannot force numerals or a script on the speakers of a language through Wikimedia projects if the speakers are not even aware of them. That is the reason why a few Indic language communities, like Tamil and Malayalam, completely moved to Arabic numerals: the respective language speakers do not know the language's own numerals. But take the case of Kannada. Kannada speakers are aware of Kannada numerals and use both numeral systems in their daily life, so it was easy for the Kannada community to stick to Kannada numerals. This will not be possible in Malayalam or Tamil, since the majority of the speakers do not know the respective numerals. So, in short, a wiki community cannot forcefully adopt a numeral system without considering the speakers of the language. That is why it was so easy for Assamese and Bangla to adopt their respective numerals.

> Can Wikipedia be used as a medium to introduce language changes, or should the task be just documenting things?

If we think from the perspective of English or European language Wikipedias, the answer might be No. Here I am answering from the perspective of Indic language Wikipedias, so the answer (according to me) is, *to some extent*, Yes. Remember that for most Indic languages, Wikipedia is the first Unicode website in that language. Even though the primary mission of Wikimedia is to document things through its various projects, for Indic languages we are, knowingly or unknowingly, bringing many revolutionary things to the language in the cyber world. So for an Indic language, Wikimedia projects are not just another website in that language.
Which means that some sort of language intervention is happening through the work done by Indic Wikimedians. Over the past 6 years I have seen many of us asking (only we Indians will ask like that) about the relevance of Indic Wikipedias when English Wikipedia is available. I have personally met and heard Wikipedians themselves speaking against it and lobbying for it. But nowadays I notice that the same people who had criticized the existence of Indic Wikipedias have now started speaking for them and have even slowly started editing them :) I have seen users who never used an Indic language for their studies now start studying their mother language just to contribute to the wiki project in that language. This trend is going to continue and increase, and we will see more speakers of a language speak and work for their language's wiki projects. That is why I said Indic language wiki projects are doing more than just documenting things.

> May I also please ask you to add a Copyright notice on blog(similar to copied text/images wmf blog) since the same is pasted from meta.

I have asked the chapter blog administrators to add a copyright notice at the footer of the blog. I do not have access to that.

Shiju

On Fri, Jun 1, 2012 at 4:58 PM, Srikanth Lakshmanan srik@gmail.com wrote:

> On Fri, Jun 1, 2012 at 4:09 PM, Noopur Raval nra...@wikimedia.org wrote:
>
>> Hey folks,
>>
>> Shiju has published an interesting post on numerals in an Indic context. It's a really fascinating overview - including how numerals are depicted across Indic languages, their use across languages, the policies adopted by the various Indic communities, the need for some community decisions to take things forward, a bit of Shiju's personal grumble on the fact that Hindi film posters are no longer in Hindi :-) and a picture of a Northern Railways bed sheet! Intrigued?
>> Read more either on meta [1] or on the Chapter blog [2].
>>
>> [1] http://meta.wikimedia.org/wiki/India_Program/Indic_Languages/Numerals_in_Indic_Languages_%26_Indic_language_Wikipedias
>> [2] http://blog.wikimedia.in/2012/06/01/numerals-in-indic-languages-indic-wikipedias-2/
>
> Good read. Thanks for the post, Shiju. Coincidentally, I came across Kaplan's blog [1] about digits and numbers just yesterday. It was a good read, and it also covers implementation (technical + usage) issues.
>
> I would also like to ask Shiju whether the Wikipedia community has the right to adopt, say, its own numeral standard without considering the fact that the whole world does not use it. Can Wikipedia be used as a medium to introduce language changes, or should the task be just documenting things? I ask this because we are also having similar debates about language style [2] (not to be confused with grantha, which is planned to be discussed