I explored the Distributed Proofreaders website as a user, to pick up ideas about proofreading. It was a very productive and enlightening experience, even though the whole DP philosophy of proofreading/formatting is completely different from, and incompatible with, the wiki approach. One of its tools is an excellent customizable, JS-based spelling dictionary. How much I would like something like that in Wikisource! Obviously we need an excellent tool that is very simple to customize; ideally a "book-specific spelling tool". I have tried to think about it, but there are lots of difficulties. The first is that it is difficult to highlight words inside a textarea with JS. It may be that VisualEditor will make things easier.
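To make the "book-specific spelling tool" idea a bit more concrete, here is a minimal sketch in Python. It is not the DP tool and not an existing Wikisource gadget; the function name `unknown_words`, the tiny word lists and the sample sentence are all made up for illustration. It only lists the words of a page that are in neither a general dictionary nor a per-book word list, so it sidesteps the textarea highlighting problem entirely.

```python
import re
from collections import Counter

# A "word" is a run of letters, optionally with internal apostrophes (l'uomo, don't).
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def unknown_words(page_text, base_dictionary, book_words=frozenset()):
    """Count the words found in neither the general dictionary nor the book list.

    All names here are hypothetical; a real tool would load a dictionary
    suited to the language and period of the book.
    """
    known = {w.lower() for w in base_dictionary} | {w.lower() for w in book_words}
    counts = Counter()
    for match in WORD_RE.finditer(page_text):
        word = match.group(0)
        if word.lower() not in known:
            counts[word] += 1
    return counts


# Made-up sample data: "larnp" stands in for an OCR misreading of "lamp".
base = {"the", "lamp", "did", "not", "burn", "in", "old", "town", "of"}
book = {"Runeberg"}  # a proper name accepted for this book only
page = "The larnp did not burn in the old town of Runeberg."

suspects = unknown_words(page, base, book)
print(suspects.most_common())
```

The per-book list is what keeps correctly OCRed names from being flagged again on every page. An OCR error that happens to be a valid word (burn -> bum) is not caught by this kind of check, so it would only complement manual proofreading.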
Alex

2013/5/24 Andrea Zanni <zanni.andre...@gmail.com>

> I completely agree with Lars.
> I remember, for example, an awesome tool from Alex Brollo, postOCR,
> a js script which corrects automatically most common OCR errors and
> converts apostrophes.
> The tool is very useful and very used, and it would improve a lot from
> a given list of common OCR errors per book.
>
> Moreover, a set of stats per books
> (list of words used, counting those words, etc.)
> could be very interesting for a tiny range of readers, but skilled ones,
> as digital humanists and philologists.
>
> As an example, we are collaborating right now with a philologist
> (a digital humanist)
> who put text on Wikisource, proofread them with the community,
> and then works on them.
>
> Aubrey
>
>
> On Fri, May 24, 2013 at 1:54 AM, Lars Aronsson <l...@aronsson.se> wrote:
>
>> It should be possible, in any language of Wikisource, to
>> check all existing text against a known dictionary valid
>> for that year, and to find words that are outside the
>> dictionary. These words could be proofread in some tool
>> similar to a CAPTCHA. They might be uncommon place names
>> that are correctly OCRed but not in the dictionary, or
>> they could be OCR errors, or both.
>>
>> Has anybody tried this?
>>
>> Such finds are not necessarily the only OCR errors.
>> Some OCR errors result in correctly spelled words, that
>> are found in the dictionary, e.g. burn -> bum.
>> So full manual proofreading and validation will still be
>> needed. But a statistics based approach could fill gaps
>> and quickly improve full text searchability.
>>
>>
>> --
>> Lars Aronsson (l...@aronsson.se)
>> Aronsson Datateknik - http://aronsson.se
>>
>> Project Runeberg - free Nordic literature - http://runeberg.org/
_______________________________________________ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l