Perhaps there's a misunderstanding: I mentioned abbyy.xml, but I have no
plan to import it as is. abbyy.xml is only a surprising data container
from which to extract anything useful to speed up proofreading (and
formatting), nothing more than this.

Just an example: the vertical djvu coordinates of lines can be used to
estimate font size; the horizontal coordinates of lines can be used to
recognize centered text; paragraph splitting is valuable; columns can be
recognized, and margins too; with some effort, poems can probably be
detected as well.
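
To make the idea concrete, here is a minimal sketch in Python. It is only
an illustration, not an existing tool: it assumes the line boxes have
already been pulled out of the OCR data as (left, top, right, bottom)
tuples in pixels, and the function names, the sample boxes and the 5%
tolerance are all hypothetical. Line height is used as a rough proxy for
font size, and balanced left/right gaps as a hint of centered text.

```python
from statistics import median


def line_height(box):
    """Height of a line box; used here as a rough proxy for font size."""
    left, top, right, bottom = box
    return bottom - top


def is_centered(box, page_left, page_right, tolerance=0.05):
    """Guess whether a line is centered within the text block.

    A line counts as centered when it is clearly inset on both sides
    and the two side gaps are about equal. `tolerance` is a fraction
    of the text block width (an arbitrary, hypothetical default).
    """
    left, top, right, bottom = box
    width = page_right - page_left
    gap_left = left - page_left
    gap_right = page_right - right
    inset = min(gap_left, gap_right) > tolerance * width
    balanced = abs(gap_left - gap_right) <= tolerance * width
    return inset and balanced


def describe_lines(boxes, tolerance=0.05):
    """Return per-line hints: size relative to body text, and centering."""
    # The text block is approximated by the widest extent of all lines.
    page_left = min(b[0] for b in boxes)
    page_right = max(b[2] for b in boxes)
    # The median height stands in for the body text size.
    body_height = median(line_height(b) for b in boxes)
    return [
        {
            "relative_size": line_height(b) / body_height,
            "centered": is_centered(b, page_left, page_right, tolerance),
        }
        for b in boxes
    ]


# Made-up sample page: a tall, short heading followed by three body lines.
boxes = [
    (300, 100, 700, 160),  # heading: inset on both sides, twice as tall
    (100, 200, 900, 230),  # full-width body line
    (100, 240, 900, 270),  # full-width body line
    (100, 280, 640, 310),  # last line of a paragraph: short, left-aligned
]

for hint in describe_lines(boxes):
    print(hint)
```

Hints like these would only pre-fill formatting for the proofreader, who
still works on plain wiki text; the coordinates themselves never reach
the page.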

Far from simply importing coordinates, it's a matter of using them as
best we can; without the data, we lose the information it carries.

Alex


2013/7/17 Lars Aronsson <[email protected]>

> On 07/17/2013 12:57 PM, Alex Brollo wrote:
>
>> FineReader OCR stores incredibly detailed information in [...]
>> abbyy.xml
>>
>
> At the other end, Wikisource is a wiki that edits wiki text.
> Sure, you could insert the XML there and let users
> edit the XML, but that would scare more users away
> and allow for more mistakes.
>
> For example, if proofreading Hamlet,
>
>   To be or not to bc, that is the question,
>
> anybody can easily spot "bc" and correct that.
> In the XML version,
>
>  <word x=1 y=1>To</word>
>  <word x=5 y=1>be</word>
>  <word x=8 y=1>or</word>
>
> someone might think that "or" should be a little more
> to the right, so one user inserts a space between the
> tag "<word x=8 y=1>" and "or", while another user
> adjusts the tag to "<word x=9 y=1>". All the tags
> make it harder to spot the OCR error "bc".
>
> Even if you replace manual XML editing with some
> graphic tool, you get the same ambiguity between
> adding whitespace and adjusting coordinates.
>
> This is a nightmare that we avoid by throwing away
> all the coordinates and just proofreading the plain text.
> It is not the perfect system, it's a compromise, in
> order to get some useful work done.
>
>
> --
>   Lars Aronsson ([email protected])
>   Project Runeberg - free Nordic literature - http://runeberg.org/
>
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
