Re: [Wikisource-l] Budget for Wikisource

2017-06-30 Thread Alex Brollo
Oops... I *presume* that _djvu.xml is bugged; actually, I only examined the full-text file (derived, I think, from the _djvu.xml file). I'll take a deeper look and examine the searchable PDF too. Alex

Re: [Wikisource-l] Budget for Wikisource

2017-06-30 Thread Alex Brollo
Take a look at this case: https://archive.org/details/GiacomoRacioppiLAgiografiaDiSanLaverioDel1162Images Here the OCR (as you can see from the _djvu.xml file) seems severely bugged, and obviously the DjVu file built by the IA Upload tool can't be better than its source. Please, Aubrey, keep notifying me of any case of

Re: [Wikisource-l] Budget for Wikisource

2017-06-30 Thread Andrea Zanni
Unfortunately, it happens only sometimes, and apparently it's not related to the Google cover page (at least: I removed a page from one book and it doesn't have the problem, while another book is misaligned even though I didn't remove the cover). Look at this:

Re: [Wikisource-l] Budget for Wikisource

2017-06-30 Thread Sam Wilson
This is indeed a bug! I can't replicate it, though. Does it happen for every book for you, or only sometimes? Do you know what is different about the ones that fail? Is it related to removing (or not removing) the Google cover page? I think I can find time this weekend to work on this.

Re: [Wikisource-l] Budget for Wikisource

2017-06-30 Thread Andrea Zanni
Hello everyone, before we talk about this again, let me say that I think we have a "major" bug in IA-upload: sometimes the OCR is misaligned with the pages, meaning the OCR text is correct but is shown on the following page... Aubrey

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Sam Wilson
This is very cool news. :) One possibly not-too-onerous feature would be to permit uploading file types other than DjVu (e.g. PDF). Or there's the whole topic of creating or finding Wikidata items for the uploaded books and updating them with the IA identifier. That'd probably require the

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Andrea Zanni
It may be. I'm not sure how Sam and Tpt solved that issue. Aubrey

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Philippe Elie
On Wed, 10 May 2017 at 18:00 +0200, Andrea Zanni wrote: > > Isn't there also a tendency, when converting from jp2 to PDF, to produce DjVu files that are too big? > > Could you please explain it better? I don't understand. Aren't the DjVu files produced often too big? -- Phe

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Philippe Elie
On Wed, 10 May 2017 at 17:14 +0200, Andrea Zanni wrote: > You can check in the queue that a lot of processes just freeze (e.g. https://tools.wmflabs.org/ia-upload/log/bullettinodella04italgoog). > Also, there is an issue with HTML tags: sometimes they are present in the IA description, and

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Andrea Zanni
You can check in the queue that a lot of processes just freeze (e.g. https://tools.wmflabs.org/ia-upload/log/bullettinodella04italgoog). Also, there is an issue with HTML tags: sometimes they are present in the IA description, which means they also get copied into the Commons Book template during

Re: [Wikisource-l] Budget for Wikisource

2017-05-10 Thread Philippe Elie
On Wed, 10 May 2017 at 15:38 +0200, Andrea Zanni wrote: > Dear all, > Wikimedia Italia has set aside €3000 in its budget for Wikisource-related work. > When we discussed this months ago, we thought about paying a developer to fix the DJVU issue of the IA-Upload tool, > which has since been resolved by our

[Wikisource-l] Budget for Wikisource

2017-05-10 Thread Andrea Zanni
Dear all, Wikimedia Italia has set aside €3000 in its budget for Wikisource-related work. When we discussed this months ago, we thought about paying a developer to fix the DJVU issue of the IA-Upload tool, which has since been resolved by our beloved Sam Wilson. The tool is still not perfect (I often get