Re: [Wikisource-l] About texts without supporting files and Index: pages
On Wed, Jun 12, 2013 at 4:47 PM, Aarti K. Dwivedi ellydwivedi2...@gmail.com wrote:

If I am not wrong, as of today, most books that were born digital are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only thing we would have to do is to check for proper formatting. Isn't it?

You are thinking of *books*, which are not the only documents Wikisource can host. For example, I am thinking about Open Access literature, which includes hundreds of thousands of CC-BY licensed articles. Just look at DOAJ: http://www.doaj.org/

One of the Wikimedians most involved in Open Access - Wiki collaboration is Daniel Mietchen (cc'ed). He's working on a bot that could grab the XML/HTML of an online article, format it in wikicode, and post it wherever he wants (maybe on Wikisource). The bot aims to automatically download all images within the articles and post them on Commons.

I personally think that this project is beyond awesome, IF we manage to solve some specific issues (such as converting hyperlinks to other articles into wikilinks to those articles posted on Wikisource...).

As I said before, I see Wikisource as a broad, international, connected, hypertextual digital library, which has a thing no other digital library in the world has: a dedicated community[*]. This is my personal opinion; I know some people don't see it that way (like Alex :-D)

Aubrey

[*] there is Project Gutenberg, but I would argue they are not a digital library...

___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
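The hyperlink issue raised in the message above (turning links to other articles into wikilinks when those articles are already on Wikisource) could be sketched roughly as follows. This is only an illustration, not the bot's actual code: the `KNOWN_PAGES` mapping, the example URLs, and the function name are all assumptions, and the regular expression only handles simple anchors without nested markup.

```python
import re

# Hypothetical mapping from article URLs to Wikisource page titles.
# A real bot would have to build this from the pages it has already posted.
KNOWN_PAGES = {
    "http://example.org/articles/123": "Example Article",
}

# Matches simple anchors such as <a href="URL">label</a>.
ANCHOR_RE = re.compile(
    r'<a\s+[^>]*?href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def html_links_to_wikitext(html, known_pages):
    """Replace HTML anchors with wikilinks or external wiki links."""

    def replace(match):
        url = match.group(1)
        label = match.group(2).strip()
        title = known_pages.get(url)
        if title is None:
            # Article not hosted on the wiki: keep it as an external link.
            return f"[{url} {label}]"
        if title == label:
            return f"[[{title}]]"
        return f"[[{title}|{label}]]"

    return ANCHOR_RE.sub(replace, html)
```

Anchors whose label contains further tags (for example `<i>`) would need a real HTML parser; the regex approach was chosen here only to keep the sketch short.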
Re: [Wikisource-l] Converting pdf files into wiki markup
If you are interested in working with PDFs, study this blog :-)
http://blogs.ch.cam.ac.uk/pmr/
(these fellows are open access activists, btw)

Aubrey

On Wed, Jun 12, 2013 at 7:04 PM, David Cuenca dacu...@gmail.com wrote:

It is not a trivial matter. The best bet would be to take an existing PDF import tool for a word processor, and try to write a similar tool for wikitext. There is the Oracle PDF Import Extension for OpenOffice; the code can be browsed, maybe it can give you some ideas:
http://extensions.services.openoffice.org/project/pdfimport

Micru

On Wed, Jun 12, 2013 at 12:38 PM, Alex Brollo alex.bro...@gmail.com wrote:

When we tried to convert a PDF file, which is an opaque, closed format, into wiki code (a needed step to add links and turn files into a wiki hypertext), the work turned into a nightmare. If we simply load free PDF books as they are, I don't see any advantage other than feeding Wikisource numbers/statistics, and this is presently far from my personal interest. As you can guess, I'm one of the users who don't share Aubrey's enthusiasm about born-digital texts, even if free. :-)

Alex

2013/6/12 David Cuenca dacu...@gmail.com

Nobody is saying anything about using copyrighted works; there are many books with an open license that would allow including them in Wikisource. For instance, in ca-ws we have this translation from 2009:
http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%282009%29.djvu
The original is in the PD, and the translator gave away his rights. It would have been much easier to work directly with the PDF, instead of converting to DjVu.

Micru

On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi ellydwivedi2...@gmail.com wrote:

If I am not wrong, as of today, most books that were born digital are still under copyright. Of course, they are available freely on the internet. But we can't use the pirated copies. How would we go about the procurement of these books? If we procure these copyrighted books, then the only thing we would have to do is to check for proper formatting. Isn't it?

On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson l...@aronsson.se wrote:

On 06/12/2013 02:48 PM, Andrea Zanni wrote:

We could define some tasks as
* corrected the page
* OPTIONAL added optional templates/links/annotations
* ...

Geotagged all the photos, ... The list doesn't end. You need a generic mechanism for any new feature you can invent. But aren't our existing templates and categories the best way to do this? You could just add to each page:

{{done|proofread=user1|validated=user2|geotagged=user4|...}}

--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/

--
Aarti K. Dwivedi

--
Etiamsi omnes, ego non
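To make the `{{done|...}}` idea from the quoted message a little more concrete, here is a minimal sketch of how a script might read such a tag back out of a page's wikitext. The `done` template and its parameter names come from Lars's suggestion, not from an existing Wikisource template, and the function name is hypothetical. The sketch assumes the template contains no nested templates or links.

```python
import re

# Matches {{done|...}} as long as the body has no nested braces.
DONE_RE = re.compile(r"\{\{\s*done\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_done_template(wikitext):
    """Return a dict mapping each completed task to the user who did it."""
    match = DONE_RE.search(wikitext)
    if match is None:
        return {}
    tasks = {}
    for part in match.group(1).split("|"):
        name, sep, user = part.partition("=")
        # Skip positional or malformed parameters (anything without "name=").
        if sep and name.strip():
            tasks[name.strip()] = user.strip()
    return tasks
```

With this shape, adding a new kind of task (such as `geotagged`) requires no change to the parser, which is the "generic mechanism" the message asks for.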
[Wikisource-l] Use of public book scanners
Some research libraries in Stockholm (at archives and museums) have put up book scanners that the public can use. They have the same function as a public copier, but you get your copies on a USB stick rather than on paper.

This opens an interesting opportunity for Wikisource and similar volunteer book scanning projects. Instead of buying expensive equipment, experimenting with cameras and lighting, or building your own scanner, you can just visit such a library. I guess you can even bring your own book and scan it there, instead of just using the library's books. (Of course you still need to consider copyright. That goes without saying.)

Wikimedia Sverige, the Swedish chapter of the WMF, started a wiki page to document some experience from this kind of use, in Swedish of course:
https://se.wikimedia.org/wiki/Allm%C3%A4nhetens_bokscanner

Here is an example of a book that was scanned this way:
http://runeberg.org/nordmuseet/1897/0001.html
(Ironically, it is the annual report for 1897 of the museum where it was scanned. They have the scanner standing in their own library, but they have not scanned their own reports.)

Are you familiar with anything similar? Any other pages that we should link to?

--
Lars Aronsson (l...@aronsson.se)
Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/
Project Runeberg - free Nordic literature - http://runeberg.org/
Re: [Wikisource-l] Use of public book scanners
Scan quality is excellent. Yes, this is a very promising way. My suggestion is to always get the scans as TIFF (if possible; they are large, but USB sticks are large too...), to transform them into an image-only PDF (which is the simplest tool to do this?), and to upload a copy to Internet Archive, specifying both the library where the book was scanned AND the Wikisource contribution (scanning, merging the TIFFs, uploading to IA). Then the excellent OCR DjVu produced by IA can be downloaded and uploaded to Commons. A good way to share anything, IMHO.

In the meantime: IA also produces an extremely interesting ABBYY.gz output; it's an XML file where an incredible set of interesting data is recorded for every scanned character. Here is an example for a random character of a random IA book:

<charParams l="1356" t="680" r="1544" b="884" wordStart="false" wordFromDictionary="true" wordNormal="true" wordNumeric="false" wordIdentifier="false" charConfidence="25" serifProbability="100" wordPenalty="0" meanStrokeWidth="347">G</charParams>

Something to explore deeply IMHO; I presume that less than 1% of the usable ABBYY scan data is wrapped into the DjVu as an OCR layer.

Alex

2013/6/13 Lars Aronsson l...@aronsson.se

Some research libraries in Stockholm (at archives and museums) have put up book scanners that the public can use. They have the same function as a public copier, but you get your copies on a USB stick rather than on paper.

This opens an interesting opportunity for Wikisource and similar volunteer book scanning projects. Instead of buying expensive equipment, experimenting with cameras and lighting, or building your own scanner, you can just visit such a library. I guess you can even bring your own book and scan it there, instead of just using the library's books. (Of course you still need to consider copyright. That goes without saying.)

Wikimedia Sverige, the Swedish chapter of the WMF, started a wiki page to document some experience from this kind of use, in Swedish of course:
https://se.wikimedia.org/wiki/Allm%C3%A4nhetens_bokscanner

Here is an example of a book that was scanned this way:
http://runeberg.org/nordmuseet/1897/0001.html
(Ironically, it is the annual report for 1897 of the museum where it was scanned. They have the scanner standing in their own library, but they have not scanned their own reports.)

Are you familiar with anything similar? Any other pages that we should link to?

--
Lars Aronsson (l...@aronsson.se)
Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/
Project Runeberg - free Nordic literature - http://runeberg.org/
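As a starting point for exploring the per-character ABBYY data mentioned in Alex's message, the sketch below lists characters whose `charConfidence` falls below a threshold, together with their bounding boxes. It uses only attribute names visible in the example from the message; the function name, the threshold of 50, and the `<formatting>` wrapper in the sample are assumptions made for illustration, and real files may nest `charParams` differently or use an XML namespace (which the code strips).

```python
import xml.etree.ElementTree as ET

# The character from the message, wrapped in a hypothetical parent element.
SAMPLE = (
    '<formatting>'
    '<charParams l="1356" t="680" r="1544" b="884" charConfidence="25" '
    'serifProbability="100" meanStrokeWidth="347">G</charParams>'
    '</formatting>'
)


def low_confidence_chars(xml_text, threshold=50):
    """Return (char, confidence, (l, t, r, b)) for uncertain characters."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter():
        # Strip any XML namespace: "{ns}charParams" -> "charParams".
        if elem.tag.rsplit("}", 1)[-1] != "charParams":
            continue
        confidence = elem.get("charConfidence")
        coords = [elem.get(key) for key in ("l", "t", "r", "b")]
        if confidence is None or None in coords:
            continue  # Skip elements missing the attributes we need.
        if int(confidence) < threshold:
            box = tuple(int(value) for value in coords)
            results.append((elem.text, int(confidence), box))
    return results
```

A downloaded file could be decompressed first with `gzip.open(path, "rt", encoding="utf-8").read()` from the standard library. For whole books, a streaming parser such as `ET.iterparse` would probably be a better fit than loading everything at once.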