hich was my personal hobby project.
It has now stopped working, and it will not be continued.
--
Lars Aronsson (l...@aronsson.se)
Linköping
___
Wikisource-l mailing list -- wikisource-l@lists.wikimedia.org
To unsubscribe send an email to wikisource-l-le...@lists.wikimedia.org
to
https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics
but I don't seem to be able to do that any more.
--
Lars Aronsson (l...@aronsson.se)
Linköping
g/wiki/Extension:Proofread_Page
https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Tool_sweep/Lists/5#phetools
https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics
--
Lars Aronsson (l...@aronsson.se)
Linköping
with duplicate material?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
re-trained languages?
As proofreading progresses, year after year, we should be able
to retrain the OCR software and improve its performance.
But I don't hear about any such progress.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://ru
of scanned books?
When I ask researchers in image processing / computer vision, they
say that plain text (book) OCR "is a solved problem" that nobody
researches, and all research goes into self-driving cars reading
street signs. Is this true, or are there any exceptions?
--
Lars A
Each year in April, I take a snapshot of Phe's statistics graphs
for ProofreadPage, and upload them to Commons, in
https://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics
This year's graphs are named Wikisource_20210405_*.svg
(In previous years, the format was png.)
--
Lars
ed up, if it were
started today, instead of developing its own extension.)
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
ould be reasonable for
the WMF to fund a developer (or team of developers) to create
such a solution. There is already some solution for marking
parts of a picture, right? This needs to work within pages of
a PDF or Djvu file.
--
Lars Aronsson (l...@aronsson.se)
Linköping, Sweden
Project Rune
quickly without asking anyone. It would have been criticized,
but now it is criticized anyway after a very long and slow process, so no
gain.
--
Lars Aronsson (l...@aronsson.se)
Linköping, Sweden
Project Runeberg - free Nordic literature - http://runeberg.org/
extra day. Is that realistic? If so, which day?
(Skokloster castle is another destination, close to Uppsala, so
a visit could be included in the same trip.)
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://ru
a Finnish version can be
created? I can provide quite a lot of training data
in the form of scanned books and proofread text.
Is there an active mailing list or web forum for
Fraktur issues with Tesseract?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http
(and a normal dictionary) the only useful tool?
Would you count the number of spelling errors, or the ratio
of errors to correct words? Has anyone done this?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
tributors to these projects.
If anybody else wants to be a co-admin with the
ability to post to these pages, add me as a friend
on Facebook and then I can add you as co-admin.
https://www.facebook.com/lars.aronsson.355
--
Lars Aronsson (l...@aronsson.se, user:LA2)
Linköpin
://tools.wmflabs.org/phetools/stats.html
--
Lars Aronsson (l...@aronsson.se)
___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
he input from the actual users.
--
Lars Aronsson (l...@aronsson.se)
Linköping
iki
https://wikisource.org/wiki/Wikisource:DoubleWiki_Extension
Is anybody using this feature in a serious way? Does it have
any more details that can make the matching better? If both
texts had numbered paragraphs and sentences (something like the
Bible), it would in theory be possible to match t
.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
serious analysis.
How do we proofread so many pages in any reasonable
time? We don't have enough volunteers for that.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
pagination, it makes sense
to make each volume/part one Djvu/Pdf file and Index page.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
://runeberg.org/elfsyssel/0331.html
but even if you select full resolution there,
you only get the image from the PDF, and
not the good picture from Flickr.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
to capture rare books that
they don't allow you to bring home. It's not a
digital camera array, though, but a linear scanner
that sweeps across the page, as the videos show.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
by Shakespeare (e.g. Hamlet) would also
be a good test case, since it has many translations
into each language.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
coordinates.
This is a nightmare that we avoid by throwing away
all the coordinates and just proofreading the plain text.
It is not the perfect system, it's a compromise, in
order to get some useful work done.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http
that this team
brings to society?
If we were already paying salaries to proofreaders,
then we could save a lot of money by producing
better OCR text (with formatting). But we have no
such existing expenditure to reduce.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature
that we should link to?
--
Lars Aronsson (l...@aronsson.se)
Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/
Project Runeberg - free Nordic literature - http://runeberg.org/
, since
all page links in the margin of the transcluded chapter are marked
like this:... id=pag137span id=pagename147...
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
: Where does it leave
Project Runeberg? Would it not be needed anymore? For
the time being, it is needed for all those texts where
copyright is a bit unclear, and that we dare to put online,
but that Wikimedia Commons doesn't accept.)
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free
uploaded the work to
http://runeberg.org/glossnor/
with the OCR text provided.
It's now ready for your proofreading.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
random noise patterns.
With Adobe Reader in W7, no problems...
It works fine with Evince 3.6.0 in Ubuntu 12.10.
My problems were with Ubuntu 12.04.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
On 11/08/2012 01:53 PM, Ole Palnatoke Andersen wrote:
The book has been digitized now. I can see it at
http://www.kb.dk/e-mat/dod/130019427200.pdf
You may or may not be able to see it.
I can download it, but in evince (Linux) the
pages look like random noise patterns.
--
Lars Aronsson
per book
plus 3 SEK per page, or 400 SEK ($65, €45) for a
typical book of 300 pages.
The limitation to Danish IP addresses seems ridiculous
to me. I have asked for the reason. I hope someone in
Denmark can mass upload these PDF files to Wikimedia
Commons.
--
Lars Aronsson (l...@aronsson.se
/Digitizing_books_with_MediaWiki
I think you need to do something similar. Most of us
can't read Hebrew (or Swedish), and won't fully
understand any example given in such a small language.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
that if you think Wikidata can solve your
problem, you will be trapped waiting for that to happen, while
years pass by that you could have used better.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org
.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
?
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
works are more interesting than text?
This would be a GLAM + wiki cooperation that cuts across
national borders.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
to interconnect with and use the Facebook page
http://www.facebook.com/Wikisource
I created it and it now has 133 fans, but I've made lots of
administrators, so anybody should be able to take over.
It also has a Facebook timeline for the history of Wikisource.
--
Lars Aronsson (l...@aronsson.se
do a poor job.
Wikisource can indeed offer the strength of manual, volunteer
proofreaders in many different languages.
--
Lars Aronsson (l...@aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
of phe's graphs to Commons,
where we can go for a nostalgic look at the once existing
statistics graph system that phe built, and that worked fine
until the German toolserver gang decided to ruin it,
http://commons.wikimedia.org/wiki/Category:ProofreadPage_Statistics
--
Lars Aronsson (l
/index.php?title=%D0%A1%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0%3AGeo_stat_rus_imp_4.djvu%2F863&diff=702229&oldid=701373
The large print is the text of an encyclopedic article.
The small print is the list of literature references.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http
has left the
project. So is anybody maintaining the ProofreadPage extension now?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
the Swedish-Latin dictionary.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
should do systematically? There are
24 volumes in the Swedish Wikisource alone, that are based
on scans from Google. (So I guess there could be hundreds
in the English Wikisource.)
Did anybody try this already? Did you run into any problems?
--
Lars Aronsson (l...@aronsson.se)
Aronsson
. Different courts in different countries in
different times might decide differently. The important thing
is that we should discourage book scanners from claiming copyright.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
and the output is a dictionary.
It's like a more advanced diff tool.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
scrolling through that sequence
can be a way to overcome the delay between fast mechanical
scanning and slow manual proofreading.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
with the MySQL version,
because when an update comes it just works. This is how the
ProofreadPage system also needs to work, so small languages can
focus on proofreading books instead of software upgrades.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
important? Could you give an example of a website
that does this and actually benefits from it?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org
) with
OpenStreetMap.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
.
But my suggestion is that we start to compile a catalog
of such problems, rather than submitting bug reports.
Where is a good place to start?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
there, e.g.
http://sv.wikisource.org/wiki/Sida:Post-_och_Inrikes_Tidningar_1836-01-11_3.jpg
Some might say I should just combine the images
into a single Djvu file, or maybe one Djvu
volume for each month or year of the newspaper,
but each of these JPG files is 11 megabytes.
--
Lars Aronsson (l
searching for a word and finding it in the
right position of the image.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Wikimedia Sverige - stöd fri kunskap - http://wikimedia.se/
://en.wikisource.org/wiki/Category:OCR_Requests
http://fr.wikisource.org/wiki/Cat%C3%A9gorie:Demandes_d%27OCR
http://pt.wikisource.org/wiki/Categoria:!Pedidos_de_OCR
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Has anybody scanned a large-format newspaper in Wikisource?
How does proofreading (page extension) work if you have
6 or 8 columns of a broadsheet? What possible methods
are there to anchor a part of the OCR text to a part of
the image? Are there any role-model websites out there?
--
Lars
community. The
software should support the community, not force it to accept the
developer's opinion.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
in OpenLibrary to Wikipedia biographies is just one
way where we can do a lot, without needing to start a new project.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
cameras. Let two teams compete against each
other. Write a report for next year's Wikimania. Have great fun!
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project Runeberg - free Nordic literature - http://runeberg.org/
Wikimedia Sverige - stöd fri kunskap
specialized book can be very useful
for a limited Wikiproject. This book was published in 1909 for the
100th anniversary of the Finnish War (1808-1809), and digitized in
2008 for the 200th anniversary.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Project
of their catalogs has been a bottleneck
for OpenLibrary.
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
used to MediaWiki. ;oD
And therefore, you would not try to improve OpenLibrary, but
rather start an entirely new project based on MediaWiki? I'm
afraid that this (not invented here) is a common sentiment, and
a major reason that we will get nowhere.
--
Lars Aronsson (l...@aronsson.se
that material?
On Wikisource. What's stopping them?
--
Lars Aronsson (l...@aronsson.se)
Aronsson Datateknik - http://aronsson.se
for that institution might be to
cooperate with the successful Google or Gallica. So why is
Wikisource superior? This is what we need to explain.
* develop arguments for museums etc...
Exactly.
--
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se
.
--
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se
65 matches