I apologize for using Italian; my aim was to send a personal reply to Nemo.
Being a personal comment, it doesn't deserve an English translation, so please ignore it.

Alex

2015-10-05 15:05 GMT+02:00 Alex Brollo <alex.bro...@gmail.com>:

> Interesting; it confirms my old idea that the "heart of Wikisource" is
> nsIndice, and that the unit of transcription is the page in nsPagina.
> But it is an isolated opinion: I have been contradicted by people
> (including Wikisourcians of the highest international standing) who are
> convinced that nsIndice and nsPagina are merely "proofreading tools".
>
> Obviously the XML structuring of the content, from the little I have
> seen, recalls the TEI structure (is it an evolution of it?), but living
> inside Wikisource I see that the "original sin" of not giving nsPagina
> its due risks making things complex, or impossible, besides having
> wasted incredible amounts of energy on "transclusion".
>
> My energy and my enthusiasm are waning....
>
> Alex
>
>
> 2015-10-05 13:04 GMT+02:00 Federico Leva (Nemo) <nemow...@gmail.com>:
>
>> I'm finding this document quite useful:
>> http://www.succeed-project.eu/sites/default/files/deliverables/Succeed_600555_WP4_D4.1_RecommendationsOnFormatsAndStandards_v1.1.pdf
>>
>> See description of ALTO pasted below, which is a followup to
>> https://lists.wikimedia.org/pipermail/wikisource-l/2014-September/002081.html
>> . We should find a way to convert the transcribed books' HTML to ALTO
>> format. :)
>>
>> Some libraries are apparently using
>> http://www.primaresearch.org/tools/Aletheia which seems an augmented
>> (but unfree?!) version of ScanTailor with some different purpose.
>>
>> Nemo
>>
>> Principles
>> ALTO stores layout information and OCR recognized text of pages of any
>> kind of printed documents like books, journals and newspapers. ALTO can
>> detail technical metadata for describing the layout and content of
>> physical resources (text, illustrations, graphics).
>> ALTO describes a content page with different views:
>>
>> The Description section helps to describe some general settings and
>> information of the ALTO file (measurement units, file name, etc.), and
>> the production process itself (processing steps, software used, dates
>> and actors, etc.)
>>
>> The Layout section contains what's on the page. A page is divided into
>> several regions (print space; left, right, top and bottom margins). For
>> each region, all objects are listed which have been detected inside:
>> text blocks, illustrations, graphical elements, composed blocks. Each
>> object previously identified is defined by generic attributes: width,
>> height, text content (for the String element). Besides, the reading
>> order of all the elements can be managed.
>>
>> Each ALTO file may also contain a style section where different styles
>> (for paragraphs and fonts) are listed.
>>
>> Use cases
>> ALTO is one of the most common formats used by libraries for converting
>> text from images. It's used both to deliver digitized contents and to
>> preserve these contents.
>> In a delivery perspective, the ability of ALTO to store the text content
>> coordinates in a page allows the overlay of image and text (multilayer
>> PDF) and highlight search words in a query.
>>
>> _______________________________________________
>> Wikisource-l mailing list
>> Wikisource-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>>
>
>
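To make the quoted description more concrete, here is a rough Python sketch of the three sections it mentions (Description, Styles, Layout), with one String element carrying its text and coordinates. The element and attribute names follow my reading of the ALTO schema, but the sketch omits the XML namespace and has not been validated against any ALTO version; the helper build_alto_page, its parameters, and all sample values are hypothetical and are not an existing Wikisource or MediaWiki tool.

```python
import xml.etree.ElementTree as ET


def build_alto_page(words, page_width, page_height, file_name="page.jpg"):
    """Build a minimal ALTO-like tree for one page.

    `words` is a list of (text, hpos, vpos, width, height) tuples in pixels.
    Hypothetical helper for illustration; no namespace, no schema validation.
    """
    alto = ET.Element("alto")

    # Description: general settings of the file (units, source file name).
    description = ET.SubElement(alto, "Description")
    ET.SubElement(description, "MeasurementUnit").text = "pixel"
    source = ET.SubElement(description, "sourceImageInformation")
    ET.SubElement(source, "fileName").text = file_name

    # Styles: optional list of font and paragraph styles (values assumed).
    styles = ET.SubElement(alto, "Styles")
    ET.SubElement(styles, "TextStyle", ID="TS1", FONTSIZE="10")

    # Layout: the page, its print space, and the detected text objects.
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(
        layout, "Page", ID="P1",
        WIDTH=str(page_width), HEIGHT=str(page_height),
    )
    print_space = ET.SubElement(page, "PrintSpace")
    block = ET.SubElement(print_space, "TextBlock", ID="TB1")
    line = ET.SubElement(block, "TextLine")
    for text, hpos, vpos, width, height in words:
        # Each String stores its text content plus its position and size.
        ET.SubElement(
            line, "String", CONTENT=text,
            HPOS=str(hpos), VPOS=str(vpos),
            WIDTH=str(width), HEIGHT=str(height),
        )
    return alto


if __name__ == "__main__":
    root = build_alto_page([("Wikisource", 120, 200, 310, 42)], 2480, 3508)
    print(ET.tostring(root, encoding="unicode"))
```

The catch for converting transcribed pages is that the coordinates have to come from somewhere, presumably the OCR layer of the scan, since the proofread HTML itself does not carry word positions.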