I apologize for using Italian; my aim was to send a personal reply to Nemo.

Being a personal comment, it doesn't deserve an English translation, so
please ignore it.

Alex

2015-10-05 15:05 GMT+02:00 Alex Brollo <alex.bro...@gmail.com>:

> Interesting; a confirmation of my old idea that the "heart of
> wikisource" is nsIndice (the Index namespace), and that the unit of
> transcription is the page in nsPagina (the Page namespace). But it is an
> isolated opinion; I have been contradicted by people (including
> wikisourcians of the highest international standing) who are convinced that
> nsIndice and nsPagina are merely "proofreading tools".
>
> Obviously the XML structuring of the content, from the little I have seen,
> recalls (is it the evolution of?) the TEI structure, but living inside
> wikisource I see that the "original sin" of not valuing nsPagina
> risks making things complex, or impossible, besides having wasted
> incredible amounts of energy on "transclusion".
>
> My energy and my enthusiasm are waning....
>
> Alex
>
>
> 2015-10-05 13:04 GMT+02:00 Federico Leva (Nemo) <nemow...@gmail.com>:
>
>> I'm finding this document quite useful:
>> http://www.succeed-project.eu/sites/default/files/deliverables/Succeed_600555_WP4_D4.1_RecommendationsOnFormatsAndStandards_v1.1.pdf
>>
>> See description of ALTO pasted below, which is a followup to
>> https://lists.wikimedia.org/pipermail/wikisource-l/2014-September/002081.html
>> . We should find a way to convert the transcribed books' HTML to ALTO
>> format. :)
>>
>> Some libraries are apparently using
>> http://www.primaresearch.org/tools/Aletheia which seems an augmented
>> (but unfree?!) version of ScanTailor with some different purpose.
>>
>> Nemo
>>
>> Principles
>> ALTO stores layout information and OCR recognized text of pages of any
>> kind of printed documents like books, journals and newspapers. ALTO can
>> detail technical metadata for describing the layout and content of
>> physical resources (text, illustrations, graphics).
>> ALTO describes a content page with different views:
>> - The Description section helps to describe some general settings and
>>   information of the ALTO file (measurement units, file name, etc.), and
>>   the production process itself (processing steps, software used, dates
>>   and actors, etc.)
>> - The Layout section contains what's on the page. A page is divided into
>>   several regions (print space; left, right, top and bottom margins). For
>>   each region, all objects are listed which have been detected inside:
>>   text blocks, illustrations, graphical elements, composed blocks. Each
>>   object previously identified is defined by generic attributes: width,
>>   height, text content (for the String element). Besides, the reading
>>   order of all the elements can be managed.
>> - Each ALTO file may also contain a style section where different styles
>>   (for paragraphs and fonts) are listed.
>>
>> Use cases
>> ALTO is one of the most common formats used by libraries for converting
>> text from images. It's used both to deliver digitized contents and to
>> preserve these contents.
>> In a delivery perspective, the ability of ALTO to store the text content
>> coordinates in a page allows the overlay of image and text (multilayer
>> PDF) and highlight search words in a query.
>>
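
To make the Layout section described above more concrete, here is a minimal
Python sketch (standard library only) that reads the String elements of an
ALTO page and returns the boxes of a searched word, which is roughly what
the "highlight search words" use case needs. The sample XML is hand-written
for illustration: the file name, coordinates and text are invented, and the
ALTO v2 namespace URI is an assumption about which schema version a real
file would use. The helper names (local_name, iter_strings, find_word_boxes,
page_text, measurement_unit) are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT taken from a real scan; all values are invented.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Description>
    <MeasurementUnit>pixel</MeasurementUnit>
    <sourceImageInformation>
      <fileName>page_0001.tif</fileName>
    </sourceImageInformation>
  </Description>
  <Layout>
    <Page ID="P1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace HPOS="100" VPOS="100" WIDTH="800" HEIGHT="1300">
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="800" HEIGHT="60">
          <TextLine HPOS="100" VPOS="100" WIDTH="800" HEIGHT="60">
            <String CONTENT="Heart" HPOS="100" VPOS="100" WIDTH="120" HEIGHT="60"/>
            <SP HPOS="220" VPOS="100" WIDTH="20"/>
            <String CONTENT="of" HPOS="240" VPOS="100" WIDTH="50" HEIGHT="60"/>
            <SP HPOS="290" VPOS="100" WIDTH="20"/>
            <String CONTENT="Wikisource" HPOS="310" VPOS="100" WIDTH="260" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def measurement_unit(root):
    """Return the unit declared in the Description section, or None."""
    for el in root.iter():
        if local_name(el.tag) == "MeasurementUnit":
            return (el.text or "").strip()
    return None


def iter_strings(root):
    """Yield each String element as a dict holding its text and its box."""
    for el in root.iter():
        if local_name(el.tag) == "String":
            yield {
                "text": el.get("CONTENT", ""),
                "hpos": float(el.get("HPOS", 0)),
                "vpos": float(el.get("VPOS", 0)),
                "width": float(el.get("WIDTH", 0)),
                "height": float(el.get("HEIGHT", 0)),
            }


def find_word_boxes(root, word):
    """Return (hpos, vpos, width, height) for every case-insensitive match."""
    needle = word.casefold()
    return [
        (s["hpos"], s["vpos"], s["width"], s["height"])
        for s in iter_strings(root)
        if s["text"].casefold() == needle
    ]


def page_text(root):
    """Join the words of each TextLine; document order stands in for reading order."""
    lines = []
    for el in root.iter():
        if local_name(el.tag) == "TextLine":
            words = [c.get("CONTENT", "") for c in el if local_name(c.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)


root = ET.fromstring(SAMPLE_ALTO)
print(measurement_unit(root))
print(page_text(root))
print(find_word_boxes(root, "wikisource"))
```

Going the other way (from a transcribed page to ALTO) would additionally
need the coordinates of each word, which this sketch does not attempt to
produce.
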
>> _______________________________________________
>> Wikisource-l mailing list
>> Wikisource-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>>
>
>