[Wikisource-l] Re: Phetools statistics for ProofreadPage

2024-04-08 Thread Lars Aronsson
hich was my personal hobby project. Now it stopped working, and it will not be continued. -- Lars Aronsson (l...@aronsson.se) Linköping ___ Wikisource-l mailing list -- wikisource-l@lists.wikimedia.org To unsubscribe send an email to wikisource-l-le...@lists.wikimedia.org

[Wikisource-l] Re: Phetools statistics for ProofreadPage

2024-04-08 Thread Lars Aronsson
to https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics but I don't seem to be able to do that any more. -- Lars Aronsson (l...@aronsson.se) Linköping

[Wikisource-l] Phetools statistics for ProofreadPage

2024-04-08 Thread Lars Aronsson
g/wiki/Extension:Proofread_Page https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Tool_sweep/Lists/5#phetools https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics * * -- Lars Aronsson (l...@aronsson.se) Linköping

[Wikisource-l] Re: Google not indexing Wikisource for last few years now.

2023-08-02 Thread Lars Aronsson
with duplicate material? -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

[Wikisource-l] Re: OCR in 2023

2023-01-14 Thread Lars Aronsson
re-trained languages? As proofreading progresses, year after year, we should be able to retrain the OCR software and improve its performance. But I don't hear about any such progress. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://ru

[Wikisource-l] OCR in 2023

2023-01-14 Thread Lars Aronsson
of scanned books? When I ask researchers in image processing / computer vision, they say that plain text (book) OCR "is a solved problem" that nobody researches, and all research goes into self-driving cars reading street signs. Is this true, or are there any exceptions? -- Lars A

[Wikisource-l] Annual April snapshot of ProofreadPage statistics graphs uploaded

2021-04-05 Thread Lars Aronsson
Each year in April, I take a snapshot of Phe's statistics graphs for ProofreadPage, and upload them to Commons, in https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics This year's graphs are named Wikisource_20210405_*.svg (In previous years, the format was png.) -- Lars

[Wikisource-l] Systems for proofreading scanned books

2020-12-26 Thread Lars Aronsson
ed up, if it were started today, instead of developing its own extension.) -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] Fwd: Can You Help us Make the 19th Century Searchable?

2020-08-22 Thread Lars Aronsson
ould be reasonable for the WMF to fund a developer (or team of developers) to create such a solution. There is already some solution for marking parts of a picture, right? This needs to work within pages of a PDF or Djvu file. -- Lars Aronsson (l...@aronsson.se) Linköping, Sweden Project Rune

Re: [Wikisource-l] [Brand Project] Next naming phase

2020-06-20 Thread Lars Aronsson
quickly without asking anyone. It would have been criticized, but now it is criticized anyway after a very long and slow process, so no gain. -- Lars Aronsson (l...@aronsson.se) Linköping, Sweden Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] Wikimania

2019-07-27 Thread Lars Aronsson
extra day. Is that realistic? If so, which day? (Skokloster castle is another destination, close to Uppsala, so a visit could be included in the same trip.) -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://ru

[Wikisource-l] Fraktur OCR with Tesseract

2019-04-15 Thread Lars Aronsson
a Finnish version can be created? I can provide quite a lot of training data in the form of scanned books and proofread text. Is there an active mailing list or web forum for Fraktur issues with Tesseract? -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http

[Wikisource-l] Assessing OCR quality

2019-03-12 Thread Lars Aronsson
(and a normal dictionary) the only useful tool? Would you count the number of spelling errors, or the ratio of errors to correct words? Has anyone done this? -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org
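The two measures asked about above can be sketched as follows. This is only a minimal illustration, assuming a plain word list stands in for the "normal dictionary"; the function name ocr_error_stats, the sample sentence and the tiny word list are hypothetical and not taken from any existing tool.

```python
import re


def ocr_error_stats(text, dictionary):
    """Return (errors, correct, ratio) for the words of an OCR text.

    errors  = words not found in the dictionary (assumed to be misreadings)
    correct = words found in the dictionary
    ratio   = errors / correct
    """
    # Split into runs of letters; digits and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0, 0, 0.0
    errors = sum(1 for word in words if word not in dictionary)
    correct = len(words) - errors
    ratio = errors / correct if correct else float("inf")
    return errors, correct, ratio


# Hypothetical sample: two OCR misreadings ("qnick", "ovcr") in nine words.
sample = "The qnick brown fox jumps ovcr the lazy dog"
wordlist = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
errors, correct, ratio = ocr_error_stats(sample, wordlist)
print(errors, correct, round(ratio, 3))
```

A raw count grows with the length of the book, while the ratio allows books of different sizes to be compared. Either measure treats proper names and archaic spellings missing from the word list as errors, so it is a rough indicator at best.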

Re: [Wikisource-l] Fwd: Wikimedia affiliate/user group social media accounts

2018-10-23 Thread Lars Aronsson
tributors to these projects. If anybody else wants to be a co-admin with the ability to post to these pages, add me as a friend on Facebook and then I can add you as co-admin. https://www.facebook.com/lars.aronsson.355 -- Lars Aronsson (l...@aronsson.se, user:LA2) Linköpin

[Wikisource-l] Statistics graphs

2017-07-12 Thread Lars Aronsson
://tools.wmflabs.org/phetools/stats.html -- Lars Aronsson (l...@aronsson.se) ___ Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l

Re: [Wikisource-l] Parallel text, DoubleWiki

2016-08-22 Thread Lars Aronsson
he input from the actual users. -- Lars Aronsson (l...@aronsson.se) Linköping

[Wikisource-l] Parallel text, DoubleWiki

2016-08-21 Thread Lars Aronsson
iki https://wikisource.org/wiki/Wikisource:DoubleWiki_Extension Is anybody using this feature in a serious way? Does it have any more details that can make the matching better? If both texts had numbered paragraphs and sentences (something like the Bible), it would in theory be possible to match t

Re: [Wikisource-l] What is our next major hurdle, or where we need most development assistance

2014-11-25 Thread Lars Aronsson
. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] What is our next major hurdle, or where we need most development assistance

2014-11-23 Thread Lars Aronsson
serious analysis. How do we proofread so many pages in any reasonable time? We don't have enough volunteers for that. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Multilingual Books

2014-08-17 Thread Lars Aronsson
pagination, it makes sense to make each volume/part one Djvu/Pdf file and Index page. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Commons-l] The British Library releases 1 million images

2013-12-20 Thread Lars Aronsson
://runeberg.org/elfsyssel/0331.html but even if you select full resolution there, you only get the image from the PDF, and not the good picture from Flickr. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org

[Wikisource-l] New, affordable book scanner

2013-10-29 Thread Lars Aronsson
to capture rare books that they don't allow you to bring home. It's not a digital camera array, though, but a linear scanner that sweeps across the page, as the videos show. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org

Re: [Wikisource-l] Wikidata and Wikisource (feedback needed)

2013-08-22 Thread Lars Aronsson
by Shakespeare (e.g. Hamlet) would also be a good test case, since it has many translations into each language. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Proofread extension extraction of OCR text in Djvu

2013-07-17 Thread Lars Aronsson
coordinates. This is a nightmare that we avoid by throwing away all the coordinates and just proofreading the plain text. It is not the perfect system, it's a compromise, in order to get some useful work done. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http

Re: [Wikisource-l] ABBYY xml files: any of you is working about?

2013-06-17 Thread Lars Aronsson
that this team brings to society? If we were already paying salaries to proofreaders, then we could save a lot of money by producing better OCR text (with formatting). But we have no such existing expenditure to reduce. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature

[Wikisource-l] Use of public book scanners

2013-06-13 Thread Lars Aronsson
that we should link to? -- Lars Aronsson (l...@aronsson.se) Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/ Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] Playing with Lua, javascript and pagelist tag

2013-06-02 Thread Lars Aronsson
, since all page links in the margin of the transcluded chapter are marked like this:... id=pag137span id=pagename147... -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] Reunification of Wikisources

2013-06-02 Thread Lars Aronsson
: Where does it leave Project Runeberg? Would it not be needed anymore? For the time being, it is needed for all those texts where copyright is a bit unclear, and that we dare to put online, but that Wikimedia Commons doesn't accept.) -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free

Re: [Wikisource-l] [WikiDA-l] [GLAM] Library e-books on demand

2012-11-10 Thread Lars Aronsson
uploaded the work to http://runeberg.org/glossnor/ with the OCR text provided. It's now ready for your proofreading. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] [WikiDA-l] [GLAM] Library e-books on demand

2012-11-09 Thread Lars Aronsson
random noise patterns. With Adobe Reader in W7, no problems... It works fine with Evince 3.6.0 in Ubuntu 12.10. My problems were with Ubuntu 12.04. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org

Re: [Wikisource-l] [WikiDA-l] [GLAM] Library e-books on demand

2012-11-08 Thread Lars Aronsson
On 11/08/2012 01:53 PM, Ole Palnatoke Andersen wrote: The book has been digitized now. I can see it at http://www.kb.dk/e-mat/dod/130019427200.pdf You may or may not be able to see it. I can download it, but in evince (Linux) the pages look like random noise patterns. -- Lars Aronsson

[Wikisource-l] Danish books until 1900 scanned for free on demand

2012-11-07 Thread Lars Aronsson
per book plus 3 SEK per page, or 400 SEK ($65, €45) for a typical book of 300 pages. The limitation to Danish IP addresses seems ridiculous to me. I have asked for the reason. I hope someone in Denmark can mass upload these PDF files to Wikimedia Commons. -- Lars Aronsson (l...@aronsson.se

Re: [Wikisource-l] Roadmap Wikisource

2012-08-19 Thread Lars Aronsson
/Digitizing_books_with_MediaWiki I think you need to do something similar. Most of us can't read Hebrew (or Swedish), and won't fully understand any example given in such a small language. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org

Re: [Wikisource-l] Roadmap Wikisource

2012-08-07 Thread Lars Aronsson
that if you think Wikidata can solve your problem, you will be trapped waiting for that to happen, while years pass by that you could have used better. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org

Re: [Wikisource-l] Roadmap Wikisource

2012-08-06 Thread Lars Aronsson
. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] Roadmap Wikisource

2012-08-05 Thread Lars Aronsson
? -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

[Wikisource-l] Which books to scan to support Wikipedia

2012-07-20 Thread Lars Aronsson
works are more interesting than text? This would be a GLAM + wiki cooperation that cuts across national borders. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

Re: [Wikisource-l] @wikisource finally on Twitter!

2012-07-20 Thread Lars Aronsson
to interconnect with and use the Facebook page http://www.facebook.com/Wikisource I created it and it now has 133 fans, but I've made lots of administrators, so anybody should be able to take over. It also has a Facebook timeline for the history of Wikisource. -- Lars Aronsson (l...@aronsson.se

Re: [Wikisource-l] [cultural-partners] IMPACT

2012-06-07 Thread Lars Aronsson
do a poor job. Wikisource can indeed offer the strength of manual, volunteer proofreaders in many different languages. -- Lars Aronsson (l...@aronsson.se) Project Runeberg - free Nordic literature - http://runeberg.org/

[Wikisource-l] Toolserver user phe account expired

2012-05-21 Thread Lars Aronsson
of phe's graphs to Commons, where we can go for a nostalgic look at the once existing statistics graph system that phe built, and that worked fine until the German toolserver gang decided to ruin it, http://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics -- Lars Aronsson (l

[Wikisource-l] Colour-coded character sets in edit box

2012-03-07 Thread Lars Aronsson
/index.php?title=%D0%A1%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0%3AGeo_stat_rus_imp_4.djvu%2F863&diff=702229&oldid=701373 The large print is the text of an encyclopedic article. The small print is the list of literature references. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http

[Wikisource-l] Blank status bars (bug 34821)

2012-03-05 Thread Lars Aronsson
has left the project. So is anybody maintaining the ProofreadPage extension now? -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

[Wikisource-l] Terese for proofreading after Tesseract OCR

2011-07-15 Thread Lars Aronsson
the Swedish-Latin dictionary. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project Runeberg - free Nordic literature - http://runeberg.org/

[Wikisource-l] Writing reviews in Google Book Search

2011-04-21 Thread Lars Aronsson
should do systematically? There are 24 volumes in the Swedish Wikisource alone, that are based on scans from Google. (So I guess there could be hundreds in the English Wikisource.) Did anybody try this already? Did you run into any problems? -- Lars Aronsson (l...@aronsson.se) Aronsson

Re: [Wikisource-l] Copyright status of scans

2010-10-27 Thread Lars Aronsson
. Different courts in different countries in different times might decide differently. The important thing is that we should discourage book scanners from claiming copyright. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Parallel text alignment

2010-08-16 Thread Lars Aronsson
and the output is a dictionary. It's like a more advanced diff tool. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Wikisource books and web 1.0 pages

2010-08-13 Thread Lars Aronsson
scrolling through that sequence can be a way to overcome the delay between fast mechanical scanning and slow manual proofreading. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] page numbers and the pages/ command

2010-08-06 Thread Lars Aronsson
with the MySQL version, because when an update comes it just works. This is how the ProofreadPage system also needs to work, so small languages can focus on proofreading books instead of software upgrades. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Dublin Core and TEI

2010-07-16 Thread Lars Aronsson
important? Could you give an example of a website that does this and actually benefits from it? -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project Runeberg - free Nordic literature - http://runeberg.org

Re: [Wikisource-l] [Wikitech-l] Wikisource bugs

2010-07-05 Thread Lars Aronsson
) with OpenStreetMap. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Wikitech-l] Wikisource bugs

2010-07-03 Thread Lars Aronsson
. But my suggestion is that we start to compile a catalog of such problems, rather than submitting bug reports. Where is a good place to start? -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

[Wikisource-l] Can't use templates on Index: page

2010-04-29 Thread Lars Aronsson
there, e.g. http://sv.wikisource.org/wiki/Sida:Post-_och_Inrikes_Tidningar_1836-01-11_3.jpg Some might say I should just combine the images into a single Djvu file, or maybe one Djvu volume for each month or year of the newspaper, but each of these JPG files is 11 megabytes. -- Lars Aronsson (l

Re: [Wikisource-l] OCR requests

2010-04-22 Thread Lars Aronsson
searching for a word and finding it in the right position of the image. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/

[Wikisource-l] OCR requests

2010-04-21 Thread Lars Aronsson
://en.wikisource.org/wiki/Category:OCR_Requests http://fr.wikisource.org/wiki/Cat%C3%A9gorie:Demandes_d%27OCR http://pt.wikisource.org/wiki/Categoria:!Pedidos_de_OCR -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

[Wikisource-l] Large pages

2010-02-24 Thread Lars Aronsson
Has anybody scanned a large-format newspaper in Wikisource? How does proofreading (page extension) work if you have 6 or 8 columns of a broadsheet? What possible methods are there to anchor a part of the OCR text to a part of the image? Are there any role-model websites out there? -- Lars

Re: [Wikisource-l] Proofreading

2009-10-11 Thread Lars Aronsson
community. The software should support the community, not force it to accept the developer's opinion. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Foundation-l] Universal Library

2009-09-04 Thread Lars Aronsson
in OpenLibrary to Wikipedia biographies is just one way where we can do a lot, without needing to start a new project. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Foundation-l] Universal Library

2009-09-02 Thread Lars Aronsson
. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Commons-l] Digitisation equipment

2009-08-28 Thread Lars Aronsson
cameras. Let two teams compete against each other. Write a report for next year's Wikimania. Have great fun! -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project Runeberg - free Nordic literature - http://runeberg.org/ Wikimedia Sverige - stöd fri kunskap

Re: [Wikisource-l] [Commons-l] Digitisation equipment

2009-08-28 Thread Lars Aronsson
specialized book can be very useful for a limited Wikiproject. This book was published in 1909 for the 100th anniversary of the Finnish War (1808-1809), and digitized in 2008 for the 200th anniversary. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se Project

Re: [Wikisource-l] Open Library, Wikisource, and cleaning and translating OCR of Classics

2009-08-20 Thread Lars Aronsson
of their catalogs has been a bottleneck for OpenLibrary. -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] Open Library, Wikisource, and cleaning and translating OCR of Classics

2009-08-17 Thread Lars Aronsson
used to Mediawiki. ;oD And therefore, you would not try to improve OpenLibrary, but rather start an entirely new project based on MediaWiki? I'm afraid that this (not invented here) is a common sentiment, and a major reason that we will get nowhere. -- Lars Aronsson (l...@aronsson.se

Re: [Wikisource-l] Open Library, Wikisource, and cleaning and translating OCR of Classics

2009-08-11 Thread Lars Aronsson
that material? On Wikisource. What's stopping them? -- Lars Aronsson (l...@aronsson.se) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] [Wikipedia-l] Wikisource

2008-10-19 Thread Lars Aronsson
for that institution might be to cooperate with the successful Google or Gallica. So why is Wikisource superior? This is what we need to explain. * develop arguments for museums etc... Exactly. -- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se

Re: [Wikisource-l] help needed searching for pagescans and front covers

2008-08-10 Thread Lars Aronsson
. -- Lars Aronsson ([EMAIL PROTECTED]) Aronsson Datateknik - http://aronsson.se