On 05/24/2013 09:11 AM, Andrea Zanni wrote:

> I remember, for example, an awesome tool from Alex Brollo, postOCR,
> a js script which corrects automatically most common OCR errors and
> converts apostrophes.


Where is this? Is it documented in English?

Andrea mentioned two different tools merged into one.
1. The postOCR code comes mainly from Pathoschild's RegexMenuFramework
<http://meta.wikimedia.org/wiki/User:Pathoschild/Scripts/Regex_menu_framework>,
with minor changes for Italian OCR errors.
2. The apostrophe conversion (from the keyboard/typewriter apostrophe ' into
the real apostrophe character ’) comes from an original it.source script (in
Python, to be used by a bot, and in JS, to be merged into postOCR). It's very
complex, since conversions inside templates, links, HTML tags, math tags and
wiki markup must be avoided. This is far from simple, since regex doesn't
help to manage nested templates or nested code structures.

No, we don't document this stuff. We simply use it... a lot.
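
To give an idea of why a plain regex is not enough, here is a minimal sketch in Python. It is not the actual it.source script. The function name `convert_apostrophes`, the `PROTECTED` pattern and the exact set of protected regions are assumptions made for illustration. It walks the text with a depth counter so that nested {{templates}} and [[links]] are skipped, and copies <math> blocks, HTML tags and runs of '' / ''' (wiki italic/bold) through unchanged.

```python
import re

# Regions copied through unchanged (assumed set, for illustration only):
# whole <math>...</math> blocks, any single HTML-like tag, and runs of two or
# more apostrophes, which are wiki italic/bold markup.
PROTECTED = re.compile(
    r"<math\b.*?</math>|</?[A-Za-z][^>]*>|'{2,}",
    re.DOTALL | re.IGNORECASE,
)


def convert_apostrophes(text: str) -> str:
    """Replace typewriter apostrophes with U+2019 outside protected regions."""
    out = []
    depth = 0  # shared nesting depth of {{...}} and [[...]] (a simplification)
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            # Entering a template or link: nothing is converted until it closes.
            depth += 1
            out.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            out.append(pair)
            i += 2
        else:
            match = PROTECTED.match(text, i)
            if match:
                # Tag, math block or '' markup: copy verbatim.
                out.append(match.group())
                i = match.end()
            elif text[i] == "'" and depth == 0:
                out.append("\u2019")
                i += 1
            else:
                out.append(text[i])
                i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = "L'amico {{Tl|l'uno {{x|d'oro}} l'altro}} dell'arte [[L'Aquila]]"
    print(convert_apostrophes(sample))
```

The depth counter is what a single regex cannot give you: the apostrophes in the inner and outer template of the sample are both left alone, however deep the nesting goes. The sketch ignores harder cases, such as an apostrophe directly next to '' markup or link labels that should be converted.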

Alex



2013/5/25 Lars Aronsson <l...@aronsson.se>

> On 05/24/2013 09:11 AM, Andrea Zanni wrote:
>
>> I remember, for example, an awesome tool from Alex Brollo, postOCR,
>> a js script which corrects automatically most common OCR errors and
>> converts apostrophes.
>>
>
> Where is this? Is it documented in English?
>
>
>> As an example, we are collaborating right now with a philologist (a
>> digital humanist)
>> who put text on Wikisource, proofread them with the community,
>> and then works on them.
>>
>
> Do you document and distribute your experience?
>
>
>
> --
>   Lars Aronsson (l...@aronsson.se)
>   Aronsson Datateknik - http://aronsson.se
>
>   Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
>
> _______________________________________________
> Wikisource-l mailing list
> Wikisource-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>
