Re: [Wikisource-l] About texts without supporting files and Index: pages

2013-06-13 Thread Andrea Zanni
On Wed, Jun 12, 2013 at 4:47 PM, Aarti K. Dwivedi ellydwivedi2...@gmail.com
 wrote:

 If I am not wrong, as of today, most books that were born digital are
 still under copyright. Of course, they are freely available on the
 internet, but we can't use the pirated copies. How would we go about
 procuring these books?
 If we procure these copyrighted books, then the only thing we would have
 to do is check for proper formatting. Is that right?


You are thinking of *books*, which are not the only documents Wikisource
can host.
I am thinking, for example, of Open Access literature, which includes
hundreds of thousands of CC-BY licensed articles.
Just look in DOAJ: http://www.doaj.org/

One of the Wikimedians most involved in Open Access - Wiki collaboration is
Daniel Mietchen (cc'ed).
He's working on a bot that could grab the XML/HTML of an online article,
format it as wikicode, and post it wherever he wants (maybe on Wikisource).
The bot also aims to automatically download all images within the articles
and post them on Commons.
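
To make the idea concrete, here is a minimal sketch of the "format it as
wikicode" step. It is not Daniel's bot: the class and function names are
hypothetical, it uses only Python's standard html.parser, and it handles just
a handful of tags (h2, p, i/em, b/strong, a). Hyperlinks come out as external
links; turning them into wikilinks is exactly the open issue mentioned in
this message.

```python
from html.parser import HTMLParser


class WikitextConverter(HTMLParser):
    """Convert a small subset of HTML to wikitext (hypothetical helper)."""

    # Inline tags mapped to the wikitext quote markup for italics and bold.
    INLINE = {"i": "''", "em": "''", "b": "'''", "strong": "'''"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "h2":
            self.parts.append("\n== ")
        elif tag == "a":
            # External-link syntax: [URL label]
            href = dict(attrs).get("href") or ""
            self.parts.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "h2":
            self.parts.append(" ==\n")
        elif tag == "a":
            self.parts.append("]")
        elif tag == "p":
            # A blank line separates paragraphs in wikitext.
            self.parts.append("\n\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_wikitext(html):
    """Return wikitext for the supported subset of HTML tags."""
    parser = WikitextConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


sample = ('<h2>Intro</h2><p>An <i>open</i> '
          '<a href="http://www.doaj.org/">journal</a> list.</p>')
print(html_to_wikitext(sample))
```

A real importer would also need tables, references, and images, which this
sketch ignores.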

I personally think that this project is beyond awesome,
IF we manage to solve some specific issues (such as converting
hyperlinks to other articles into wikilinks to those articles once they are
posted on Wikisource...)

As I said before, I see Wikisource as a broad, international, connected,
hypertextual digital library
that has something no other digital library in the world has: a dedicated
community[*].

This is my personal opinion; I know some people don't see it that way (like
Alex :-D)


Aubrey

[*] there is Project Gutenberg, but I would argue they are not a digital
library...
___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l


Re: [Wikisource-l] Converting pdf files into wiki markup

2013-06-13 Thread Andrea Zanni
If you are interested in working with PDFs,
study this blog :-)
http://blogs.ch.cam.ac.uk/pmr/

(these fellows are open access activists, btw)

Aubrey


On Wed, Jun 12, 2013 at 7:04 PM, David Cuenca dacu...@gmail.com wrote:

 It is not a trivial matter. The best bet would be to take an existing PDF
 import tool for a word processor and try to write a similar tool for
 wikitext.

 There is the Oracle PDF Import Extension for OpenOffice. Its code can be
 browsed, and maybe it can give you some ideas:
 http://extensions.services.openoffice.org/project/pdfimport

 Micru

 On Wed, Jun 12, 2013 at 12:38 PM, Alex Brollo alex.bro...@gmail.com wrote:

 When we tried to convert a PDF file, which is an opaque, closed format,
 into wiki code (a necessary step for adding links and turning files into
 wiki hypertext), the work turned into a nightmare. If we simply upload
 free PDF books as they are, I don't see any advantage other than feeding
 Wikisource numbers/statistics, and this is presently far from my personal
 interest.

 As you can guess, I'm one of the users who don't share Aubrey's enthusiasm
 about born-digital texts, even free ones. :-)

 Alex


 2013/6/12 David Cuenca dacu...@gmail.com

 Nobody is saying anything about using copyrighted works. There are many
 books with an open license that would allow us to include them in
 Wikisource.

 For instance in ca-ws we have this translation from 2009:

 http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%282009%29.djvu

 The original is in the PD, and the translator gave away his rights. It
 would have been much easier to work directly with the pdf, instead of
 converting to djvu.

 Micru


 On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi 
 ellydwivedi2...@gmail.com wrote:

 If I am not wrong, as of today, most books that were born digital are
 still under copyright. Of course, they are freely available on the
 internet, but we can't use the pirated copies. How would we go about
 procuring these books?
 If we procure these copyrighted books, then the only thing we would have
 to do is check for proper formatting. Is that right?


 On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson l...@aronsson.se wrote:

 On 06/12/2013 02:48 PM, Andrea Zanni wrote:

 We could define some tasks as
 * corrected the page
 * OPTIONAL added optional templates/links/annotations
 *...


 Geotagged all the photos, ...

 The list doesn't end. You need a generic mechanism
 for any new feature you can invent. But aren't our
 existing templates and categories the best way to
 do this? You could just add to each page:
 {{done|proofread=user1|validated=user2|geotagged=user4|...}}
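
A rough sketch of how a script could read such a status template back into
a task-to-user mapping. The function name is hypothetical, the {{done}}
template is only the one proposed in this message, and the parser assumes a
flat call with no nested templates or pipes inside values.

```python
def parse_template_params(wikitext):
    """Return (name, params) for a flat template call such as {{done|a=b}}."""
    inner = wikitext.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    # Drop the surrounding braces and split on the parameter separator.
    name, *fields = inner[2:-2].split("|")
    params = {}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # keep named parameters only, skip positional ones
            params[key.strip()] = value.strip()
    return name.strip(), params


name, tasks = parse_template_params(
    "{{done|proofread=user1|validated=user2|geotagged=user4}}"
)
for task, user in tasks.items():
    print(task, "by", user)
```

New task types then need no code change: any extra named parameter simply
shows up as another key.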


 --
   Lars Aronsson (l...@aronsson.se)
   Project Runeberg - free Nordic literature - http://runeberg.org/








 --
 Aarti K. Dwivedi






 --
 Etiamsi omnes, ego non







 --
 Etiamsi omnes, ego non




[Wikisource-l] Use of public book scanners

2013-06-13 Thread Lars Aronsson

Some research libraries in Stockholm (at archives and
museums) have put up book scanners that the public
can use. They have the same function as a public
copier, but you get your copies on a USB stick rather
than on paper.

This opens an interesting opportunity for Wikisource and
similar volunteer book scanning projects. Instead of
buying expensive equipment, experimenting with
cameras and lighting, or building your own scanner,
you can just visit such a library. I guess you can even
bring your own book and scan it there, instead of just
using the library's books. (Of course you still need to
consider copyright. That goes without saying.)

Wikimedia Sverige, the Swedish chapter of the WMF,
started a wiki page to document some experience
from this kind of use, in Swedish of course,
https://se.wikimedia.org/wiki/Allm%C3%A4nhetens_bokscanner

Here is an example of a book that was scanned this way,
http://runeberg.org/nordmuseet/1897/0001.html
(Ironically, it is the annual report for 1897 of the museum
where it was scanned. They have the scanner standing in
their own library, but they have not scanned their own
reports.)

Are you familiar with anything similar? Any other pages
that we should link to?


--
  Lars Aronsson (l...@aronsson.se)

  Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/

  Project Runeberg - free Nordic literature - http://runeberg.org/





Re: [Wikisource-l] Use of public book scanners

2013-06-13 Thread Alex Brollo
Scan quality is excellent.

Yes, it is a very promising approach. My suggestion is to always get the
scans as TIFF if possible (they are large, but USB sticks are large too),
to convert them into an image-only PDF (which is the simplest tool for
this?), and to upload a copy to the Internet Archive, crediting both the
library where the book was scanned AND the Wikisource contribution of
scanning, merging the TIFFs, and uploading to IA.

Then the excellent OCR'd DjVu produced by IA can be downloaded and
uploaded to Commons. A good way to share everything, IMHO.

In the meantime: IA also produces an extremely interesting ABBYY.gz output.
It is an XML file that records an incredible set of interesting data for
every scanned character. Here is an example for a random character from a
random IA book:

<charParams l="1356" t="680" r="1544" b="884" wordStart="false"
wordFromDictionary="true" wordNormal="true" wordNumeric="false"
wordIdentifier="false" charConfidence="25" serifProbability="100"
wordPenalty="0" meanStrokeWidth="347">G</charParams>

Something to explore deeply, IMHO; I presume that less than 1% of the usable
ABBYY scan data is wrapped into the DjVu as its OCR layer.
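
As a starting point for that exploration, here is a small sketch that reads
one charParams element with Python's standard xml.etree.ElementTree. It
assumes the attribute values are quoted as in well-formed XML and works on an
isolated fragment; the function name is hypothetical, and a full ABBYY.gz
file would first need to be decompressed and will likely carry XML
namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The sample character from this message, written as well-formed XML.
SAMPLE = (
    '<charParams l="1356" t="680" r="1544" b="884" wordStart="false" '
    'wordFromDictionary="true" wordNormal="true" wordNumeric="false" '
    'wordIdentifier="false" charConfidence="25" serifProbability="100" '
    'wordPenalty="0" meanStrokeWidth="347">G</charParams>'
)


def parse_char_params(xml_fragment):
    """Extract the character, its bounding box and confidence (hypothetical helper)."""
    elem = ET.fromstring(xml_fragment)
    left, top, right, bottom = (int(elem.get(k)) for k in ("l", "t", "r", "b"))
    return {
        "char": elem.text,
        "box": (left, top, right, bottom),
        "width": right - left,
        "height": bottom - top,
        "confidence": int(elem.get("charConfidence")),
        "from_dictionary": elem.get("wordFromDictionary") == "true",
    }


info = parse_char_params(SAMPLE)
print(info["char"], info["box"], info["confidence"])
```

Low-confidence characters found this way could, for instance, be flagged
for proofreaders.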

Alex




2013/6/13 Lars Aronsson l...@aronsson.se

 Some research libraries in Stockholm (at archives and
 museums) have put up book scanners that the public
 can use. They have the same function as a public
 copier, but you get your copies on a USB stick rather
 than on paper.

 This opens an interesting opportunity for Wikisource and
 similar volunteer book scanning projects. Instead of
 buying expensive equipment, experimenting with
 cameras and lighting, or building your own scanner,
 you can just visit such a library. I guess you can even
 bring your own book and scan it there, instead of just
 using the library's books. (Of course you still need to
 consider copyright. That goes without saying.)

 Wikimedia Sverige, the Swedish chapter of the WMF,
 started a wiki page to document some experience
 from this kind of use, in Swedish of course,
 https://se.wikimedia.org/wiki/Allm%C3%A4nhetens_bokscanner

 Here is an example of a book that was scanned this way,
 http://runeberg.org/nordmuseet/1897/0001.html
 (Ironically, it is the annual report for 1897 of the museum
 where it was scanned. They have the scanner standing in
 their own library, but they have not scanned their own
 reports.)

 Are you familiar with anything similar? Any other pages
 that we should link to?


 --
   Lars Aronsson (l...@aronsson.se)

   Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/

   Project Runeberg - free Nordic literature - http://runeberg.org/



