Hi,

For a long time, the Indic-language Wikisource projects depended
entirely on manual proofreading, which cost a great deal of time and
energy. Recently Google released OCR software for more than 20 Indic
languages, along with other Asian languages. This software is far more
accurate than the previous OCRs, but it has many limitations. Uploading
the same large file twice (once for Google OCR and once to Commons) is
not practical for most contributors, as Internet connections are very
slow in India. If we develop a tool that feeds the PDF or DjVu files
already uploaded to Commons directly to Google OCR, the second upload
can be avoided.

This was proposed in the 2015 Community Wishlist Survey. Now that
voting on the wishlist has started, the proposal needs your support.
Please follow this link:

https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Wikisource#Tool_to_use_Google_OCRs_in_Indic_language_Wikisource

FYI, this proposal was also accepted as a highest-priority need at the
2015 Wikisource Conference in Vienna:
https://etherpad.wikimedia.org/p/wscon2015needs

Regards
-- 
Bodhisattwa Mandal
Administrator, Bengali Wikipedia

''Imagine a world in which every single person on the planet is given
free access to the sum of all human knowledge.''

_______________________________________________
Wikimediaindia-l mailing list
Wikimediaindia-l@lists.wikimedia.org
To unsubscribe from the list / change mailing preferences visit 
https://lists.wikimedia.org/mailman/listinfo/wikimediaindia-l
