Re: [Wikitech-l] Status of the new PDF Renderer
On Thu, May 29, 2014 at 6:06 PM, Matthew Walker <mwal...@wikimedia.org> wrote:
> I should have also noted -- there is something strange going on with the frontend to Special:Collection. You have to manually refresh to see status updates...

Reported 10 days ago in test envs:
https://bugzilla.wikimedia.org/show_bug.cgi?id=65562

> ~Matt Walker
> Wikimedia Foundation Fundraising Technology Team

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Status of the new PDF Renderer
Hi,

On Mon, 2014-05-19 at 11:57 -0700, C. Scott Ananian wrote:
> That's a good question! I'm in SFO this week, so it's probably worth setting aside a day to resync and figure out what the next steps for the new PDF renderer are.

Any news (or a public test instance available)?

As I wrote, I'd be interested in having a bugday on testing the new PDF renderer by going through / retesting
https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&component=Collection

Thanks,
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Re: [Wikitech-l] Status of the new PDF Renderer
I'm happy to report that, after a LONG time fighting with deployment, the test instance is available in beta labs (en.wikipedia.beta.wmflabs.org and all the others) via the "WMF PDF" option in Special:Collection and on the side panel.

It is still very rough in terms of reliable rendering (it doesn't like to clean up after itself) -- but now that I have deployment sorted and the service running stably, that's my next task. Play away :D

~Matt Walker
Wikimedia Foundation Fundraising Technology Team
Re: [Wikitech-l] Status of the new PDF Renderer
I should have also noted -- there is something strange going on with the frontend to Special:Collection. You have to manually refresh to see status updates...

~Matt Walker
Wikimedia Foundation Fundraising Technology Team
Re: [Wikitech-l] Status of the new PDF Renderer
Hi,

On 01/18/2014 03:42 AM, Matthew Walker wrote:
> We've just finished our second sprint on the new PDF renderer. A significant chunk of renderer development time this cycle was on non-Latin script support, as well as puppetization and packaging for deployment. We have a work-in-progress pipeline up and running in labs which I encourage everyone to go try and break.

Seeing breakage of PDF downloads on pl.wikisource reported in
https://bugzilla.wikimedia.org/show_bug.cgi?id=65298
I got curious what the status of the new PDF renderer is.

https://www.mediawiki.org/wiki/PDF_rendering#Status only links to the quoted email from January 2014. Has anything happened in the last four months that is worth adding as a status update?

For the record, Nemo asked on the talk page for a test instance, and I support the idea of having a bugday on PDF rendering once public testing infrastructure for the new PDF renderer is available. Open tickets to potentially re-test:
https://bugzilla.wikimedia.org/buglist.cgi?resolution=---&component=Collection

andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Re: [Wikitech-l] Status of the new PDF Renderer
That's a good question! I'm in SFO this week, so it's probably worth setting aside a day to resync and figure out what the next steps for the new PDF renderer are.

--scott
Re: [Wikitech-l] Status of the new PDF Renderer
On 01/17/2014 09:42 PM, Matthew Walker wrote:
> All,
>
> We've just finished our second sprint on the new PDF renderer. A significant chunk of renderer development time this cycle was on non-Latin script support, as well as puppetization and packaging for deployment.
>
> We have a work-in-progress pipeline up and running in labs which I encourage everyone to go try and break. You can use the following featured articles just to see what our current output is:
> * http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
> * http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire
>
> Some other articles imported on that test wiki:
> * http://ur1.ca/gg0bw
>
> Please note that some of these will fail due to known issues noted below.
>
> You can render any page in the new renderer by clicking the sidebar link "Download as WMF PDF"; if you "Download as PDF" you'll be using the old renderer (useful for comparison.) Additionally, you can create full books via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)" renderer, but that's not on the critical path.
>
> As of right now we do not have a bugzilla project entry, so reply to this email or email me directly -- we'll need one of: the name of the page, the name of the collection, or the collection_id parameter from the URL to debug.
>
> There are some code bits that we know are still missing that we will have to address in the coming weeks or in another sprint:
> * Attribution for images and text. The APIs are done, but we still need to massage that information into the document.
> * Message translation -- right now all internal messages are in English, which is not so helpful to non-English speakers.
> * Things using the <cite> tag and the Cite extension are not currently supported (meaning you won't get nice references.)
> * Tables may not render at all, or may break the renderer.
> * Caching needs to be greatly improved.
>
> Looking longer term into deployment on wiki, my plans right now are to get this into beta labs for general testing and to connect test.wikipedia.org up to our QA hardware for load testing. The major blocker there is acceptance of the Node.JS 0.10 and TexLive 2012 packages into reprap, our internal aptitude package source. This is not quite as easy as it sounds: we already use TexLive 2009 in production for the Math extension, and we must apply thorough tests to ensure we do not introduce any regressions when we update to the 2012 package. I'm not sure what the actual dates for those migrations / testing will be, because it greatly depends on when Ops has time.
>
> In the meantime, our existing PDF cluster based on mwlib will continue to serve our offline needs. Once our solution is deployed and tested, mwlib (pdf[1-3]) will be retired here at the WMF and print-on-demand services will be provided directly by PediaPress servers.
>
> For the technically curious: we're approximately following the parsoid deployment model -- using trebuchet to push out a source repository (services/ocg-collection) that has the configuration and node dependencies built on tin, along with git submodules containing the actual service code.
>
> It may not look like it on the surface, but we've come a long way, and it wouldn't have been possible without the (probably exasperated) help from Jeff Green, Faidon, and Ori. Also big thanks to Brad and Max for their work, and Gabriel for some head thunking. C. Scott and I are not quite off the hook yet, as indicated by the list above, but hopefully soon enough we'll be enjoying the cake and cookies from another new product launch. (And yes, even if you're remote, if I promised you cookies as bribes I'll ship them to you :p)
>
> ~Matt Walker

Hey there! We just got a #mediawiki question about Collections, and so I was wondering what we can tell third-party MediaWiki administrators about the new renderer work? Thanks!

--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
Re: [Wikitech-l] Status of the new PDF Renderer
Hi,

On 2014-01-23 19:38, Matthew Walker wrote:
> If you want to set this up locally, I can help with that if you jump on IRC: #mediawiki-pdfhack on freenode. I'm mwalker.

Thank you a lot for helping me install the stack. Although the project is at an early stage, it is working quite well. We had been looking for such a solution for months; this is the first one that does the job we need. Yesterday I showed the output to my employer, and he was very delighted with the result. So, many commendations from our company.

Cheers,
Marco
Re: [Wikitech-l] Status of the new PDF Renderer
Dear All,

Is it possible at the current moment to test the new PDF Renderer online for RTL languages? And is it possible to adjust the page layout? I see that the default is the two-column layout.

Thanks, Kind Regards,
Aya Saif El-yazal Mahfouz
Re: [Wikitech-l] Status of the new PDF Renderer
Hoi Liangent,

Does this [1] answer your question? It is a page they use for testing.

Thanks,
Gerard

[1] http://zh.wikipedia.org/wiki/納粹德國海軍
Re: [Wikitech-l] Status of the new PDF Renderer
On Sat, Jan 25, 2014 at 6:13 PM, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
> Hoi Liangent,
> Does this [1] answer your question? It is a page they use for testing.
> [1] http://zh.wikipedia.org/wiki/納粹德國海軍

It's mentioned as a test case, but where's the output (expected and actual) of that article?

-Liangent
Re: [Wikitech-l] Status of the new PDF Renderer
On Jan 25, 2014 9:55 AM, Liangent <liang...@gmail.com> wrote:
> It's mentioned as a test case but where's the output (expected and actual) of that article?

You may find some answers in the initial mail that started this thread.

-Jeremy
Re: [Wikitech-l] Status of the new PDF Renderer
Yes, zhwiki is still an issue because of LanguageConverter. I will be fixing that issue in both Parsoid and the PDF renderer (as soon as I fix some long-standing bugs in image handling for Parsoid/VE).

It's a bit tough to test the renderer online at the moment, because you have to import your own non-English content into the test wiki. I recommend trying things out offline if possible. I can also email/post sample articles if you like.

RTL languages should be well supported; I spent about a week getting the details of the bidirectional algorithm correct. (And of course we inherit nice ligatures, etc., for Arabic from the XeTeX engine.)

--scott
Re: [Wikitech-l] Status of the new PDF Renderer
Hello C. Scott,

Could you kindly try importing the following articles from the Arabic Wikipedia and then sending me the resultant PDF files?

https://ar.wikipedia.org/wiki/كأس_العالم_لكرة_القدم
https://ar.wikipedia.org/wiki/إسلام
https://ar.wikipedia.org/wiki/مكلارين_پ1
https://ar.wikipedia.org/wiki/ليفي_أشكول
https://ar.wikipedia.org/wiki/جامعة_الدول_العربية
https://ar.wikipedia.org/wiki/جمهورية_أيرلندا

I might ask you in the future to add some articles in Farsi and Hebrew to your test cases too. If this would be a burden, then feel free to simply send me what you have already got. On my side, I will try to set up the extension in the near future.

Thank you for your efforts,
Kind Regards,
Aya Saif El-yazal Mahfouz
Re: [Wikitech-l] Status of the new PDF Renderer
I didn't look at the new renderer carefully, but I guess it's a Parsoid-based one. I hope that the language conversion syntax issue in PDF output can be resolved together with Parsoid in the future; it currently blocks the deployment of PDF output on zhwiki. See https://bugzilla.wikimedia.org/show_bug.cgi?id=34919 .

-Liangent
Re: [Wikitech-l] Status of the new PDF Renderer
On Sat, Jan 18, 2014 at 3:42 AM, Matthew Walker <mwal...@wikimedia.org> wrote:
> We've just finished our second sprint on the new PDF renderer.

A Google Code-in student wrote some tests[1][2] for the existing export-to-PDF functionality. I did not have the time to review the last few patch sets and merge them into the master branch. Let me know if anybody is interested in writing more tests.

Željko
--
1: https://gerrit.wikimedia.org/r/#/c/98160
2: https://gerrit.wikimedia.org/r/#/c/105179
Re: [Wikitech-l] Status of the new PDF Renderer
Marco,

> Is it also possible to set this up behind a firewall?

Yes, with the caveat that your wiki must be running Parsoid. It is also theoretically possible to still use Print on Demand services behind a firewall, since we can POST a zip bundle to them -- most likely, however, you'd just disable that functionality, and I'm not sure our new bundle format is entirely compatible with the old bundle format...

If you want to set this up locally, I can help with that if you jump on IRC: #mediawiki-pdfhack on freenode. I'm mwalker.

~Matt Walker
Wikimedia Foundation Fundraising Technology Team
Re: [Wikitech-l] Status of the new PDF Renderer
Amir, Gerard:

The easiest way to test locally at the moment is to use the standalone 'mw-ocg-bundler' and 'mw-ocg-latexer' node packages. There are good installation instructions in the READMEs; see:

https://npmjs.org/package/mw-ocg-bundler
https://npmjs.org/package/mw-ocg-latexer

and let me know if I need to document anything better. This will let you pull individual articles from an arbitrary wiki and then typeset them with xelatex.

There is currently good support for quite a number of languages. My standard test case contains:

http://ar.wikipedia.org/wiki/ليونيل_ميسي
http://ar.wikipedia.org/wiki/بشير_الثاني_الشهابي
http://ar.wikipedia.org/wiki/حمزة_بن_عبد_المطلب
http://ar.wikipedia.org/wiki/إسطنبول
http://ar.wikipedia.org/wiki/الحرب_الإنجليزية_الزنجبارية
http://de.wikipedia.org/wiki/Papier
http://en.wikipedia.org/wiki/Durian
http://es.wikipedia.org/wiki/Latas_de_sopa_Campbell
http://fa.wikipedia.org/wiki/کعبه_زرتشت
http://fr.wikipedia.org/wiki/Trachylepis_atlantica
http://he.wikipedia.org/wiki/ספרטה
http://hi.wikipedia.org/wiki/रामायण
http://it.wikipedia.org/wiki/La_vita_è_meravigliosa
http://ja.wikipedia.org/wiki/熊野三山本願所
http://ja.wikipedia.org/wiki/金星の日面通過
http://ko.wikipedia.org/wiki/조화진동자
http://ml.wikipedia.org/wiki/മലയാളം
http://pl.wikipedia.org/wiki/Efekt_potwierdzenia
http://pt.wikipedia.org/wiki/Scaphyglottis
http://ru.wikipedia.org/wiki/Битва_при_Платеях
http://simple.wikipedia.org/wiki/Taoism
http://vi.wikipedia.org/wiki/Vệ_tinh_tự_nhiên_của_Sao_Thiên_Vương
http://zh.wikipedia.org/wiki/納粹德國海軍

and a few other English articles.

That said, I don't read most of these languages, so I've mostly been trying to ensure that our output matches the HTML displayed by the wiki. It is quite possible I've chosen bad-looking fonts, or that there are other details that could be improved. (For example, the rendering of Vietnamese stacked accents was bad for a while; I've fixed that now.) Comments eagerly requested!

--scott

ps. There are a number of minor issues with citations in RTL languages, even in our standard HTML rendering on the wikis; it appears that our citation templates should be more aggressive about adding <bdi> tags or lang attributes to ensure that citations of LTR sources in an RTL article are displayed as nicely as possible. If these fixes are made to the source, the LaTeX output should inherit them.
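[Editor's note] For anyone following along, the two-package workflow described above might look roughly like this in practice. This is a sketch only: the package names come from the npmjs.org links in the message, but the command-line flags shown are assumptions based on the packages' READMEs and may differ between versions -- consult the READMEs for the real options.

```shell
# Install the bundler (fetches articles plus images/metadata into a zip
# bundle) and the LaTeX-based renderer. Both need Node.js; the renderer
# additionally needs a TeX Live installation providing xelatex.
npm install -g mw-ocg-bundler
npm install -g mw-ocg-latexer

# Step 1: pull one article from a wiki into a bundle.
# (Flag names are assumptions, not verified against a specific release.)
mw-ocg-bundler -o durian.zip --prefix enwiki Durian

# Step 2: typeset the bundle to PDF via xelatex.
mw-ocg-latexer -o durian.pdf durian.zip
```

Quoting multi-word or non-ASCII titles on the command line avoids shell mangling; the per-language test URLs listed above can be fed through the same two steps.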
Re: [Wikitech-l] Status of the new PDF Renderer
1. Can this be set up for testing locally? Where is the new software? I'm not sure that I see it in the master version of Collection in Gerrit.

2. Are the wikis with a non-English content language where this can be tested?

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore
Re: [Wikitech-l] Status of the new PDF Renderer
apologies: s/Are the/Are there/

-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces, I want to live in peace.” – T. Moore

2014/1/19 Amir E. Aharoni amir.ahar...@mail.huji.ac.il:
1. Can this be set up for testing locally? Where is the new software? I'm not sure that I see it in the master version of Collection in Gerrit.
2. Are the wikis with a non-English content language where this can be tested?

2014/1/18 Matthew Walker mwal...@wikimedia.org:
[Matthew Walker's announcement quoted in full; trimmed here. The original message appears in full below.]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Status of the new PDF Renderer
Hi Matthew,

Great work, thank you for sharing. In my company we need an extension like this. A few months ago I was not successful in finding a solution that accepts UTF-8 encoded Unicode characters greater than 0x7F inside URLs. On the test wiki I couldn't find an article that was not redirected to another article, so I created the article [1] and tried to render it as PDF. Though this is just a sprint's result and there are still some bugs, it was mostly possible to render the article. Compared to my experiences, that is a really good result. Is it also possible to set this up behind a firewall?

[1] http://ocg-collection-alpha.wmflabs.org/index.php/Test_German_Umlauts_%C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C%C3%9F%E2%86%92%E2%80%93%E2%80%9E%E2%80%9C

Cheers,
Marco

On 01/18/2014 03:42 AM, Matthew Walker wrote:
[Matthew Walker's announcement quoted in full; trimmed here. The original message appears in full below.]
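Marco's test page title exercises exactly the UTF-8 URL handling he describes: each non-ASCII character in the page title appears in the URL as the percent-encoded form of its UTF-8 bytes. A minimal sketch (standard-library Python only, not part of the renderer or the extension) showing how the title maps to the URL path in [1]:

```python
# Sketch: percent-encoding of non-ASCII title characters as UTF-8 bytes,
# matching the test URL above. Standard library only; not renderer code.
from urllib.parse import quote

title = "Test_German_Umlauts_äöüÄÖÜß→–„“"
encoded = quote(title)  # each non-ASCII char becomes %XX escapes of its UTF-8 bytes
print(encoded)
# → Test_German_Umlauts_%C3%A4%C3%B6%C3%BC%C3%84%C3%96%C3%9C%C3%9F%E2%86%92%E2%80%93%E2%80%9E%E2%80%9C
```

`quote` leaves ASCII letters, digits, and `_` untouched, so only the umlauts, ß, the arrow, the dash, and the quotation marks are escaped.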
Re: [Wikitech-l] Status of the new PDF Renderer
Hoi,
I have a few questions:
- Do you support other scripts used by languages like Malayalam (ml), Persian (fa), Chinese (zh), or Russian (ru)?
- When you do, do you have examples for these languages?
- Are the messages not localised, or are they also not internationalised?
- Is support for other scripts and proper internationalisation and localisation a blocker for deployment?
Thanks,
GerardM

On 18 January 2014 03:42, Matthew Walker mwal...@wikimedia.org wrote:
[Matthew Walker's announcement quoted in full; trimmed here. The original message appears in full below.]
Re: [Wikitech-l] Status of the new PDF Renderer
Gerard,

On Sat, Jan 18, 2014 at 1:38 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:
- do you support other scripts used by languages like Malayalam (ml), Persian (fa), Chinese (zh) Russian (ru) ??

In the final product, yes. I'm not entirely sure where we are with ml and zh, but I've seen test renders in fa and ru. It is a goal of our project to offer significantly better render support for all languages.

- when you do, do you have examples for these languages ?

If we don't already, please feel free to add them to my test instance on labs and report back. I know the zh and ru test pages I already have on wiki do render, but the other language tests seem to fail at this time -- possibly due to the pages that I imported. It would actually be fairly useful to have test pages with just the language content and no extra templates / wiki features.

- are the messages not localised or are they also not internationalised ?

At this time, the internal status messages are neither localized nor internationalized. I plan to add support for that, but it was not an initial focus because of the limited utility of these messages in the UI itself.

- are support for other scripts and proper internationalisation and localisation blockers for deployment ?

We have no specific goals for script support except parity with or improvement over the current mwlib renderer. In this phase, and once we deploy to beta labs, we're going to be relying on the community to tell us where we need to improve. It's also likely that both render pipelines will continue to be offered in parallel for some time. I do not consider localization of status messages in the backend renderer a blocker, because a user does not need to understand those messages in order to continue using the Collection extension or the renderer itself; it will merely fail to report the in-progress status of the render job. The failure and success notifications *are* localized, so the final state is something that any user can proceed from.
[Wikitech-l] Status of the new PDF Renderer
All,

We've just finished our second sprint on the new PDF renderer. A significant chunk of renderer development time this cycle went to non-Latin script support, as well as puppetization and packaging for deployment.

We have a work-in-progress pipeline up and running in labs, which I encourage everyone to go try and break. You can use the following featured articles to see what our current output is:
* http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
* http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire

Some other articles imported on that test wiki:
* http://ur1.ca/gg0bw

Please note that some of these will fail due to the known issues noted below. You can render any page in the new renderer by clicking the sidebar link "Download as WMF PDF"; if you "Download as PDF" you'll be using the old renderer (useful for comparison). Additionally, you can create full books via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)" renderer, but that's not on the critical path. As of right now we do not have a Bugzilla project entry, so reply to this email or email me directly -- we'll need one of: the name of the page, the name of the collection, or the collection_id parameter from the URL to debug.

There are some code bits that we know are still missing and that we will have to address in the coming weeks or in another sprint:
* Attribution for images and text. The APIs are done, but we still need to massage that information into the document.
* Message translation -- right now all internal messages are in English, which is not so helpful to non-English speakers.
* Things using the cite tag and the Cite extension are not currently supported (meaning you won't get nice references).
* Tables may not render at all, or may break the renderer.
* Caching needs to be greatly improved.

Looking longer term at deployment on wiki, my plans right now are to get this into beta labs for general testing and to connect test.wikipedia.org up to our QA hardware for load testing. The major blocker there is acceptance of the Node.js 0.10 and TeX Live 2012 packages into reprap, our internal apt package source. This is not quite as easy as it sounds: we already use TeX Live 2009 in production for the Math extension, and we must apply thorough tests to ensure we do not introduce any regressions when we update to the 2012 package. I'm not sure what the actual dates for those migrations / testing will be, because that greatly depends on when Ops has time.

In the meantime, our existing PDF cluster based on mwlib will continue to serve our offline needs. Once our solution is deployed and tested, mwlib (pdf[1-3]) will be retired here at the WMF, and print-on-demand services will be provided directly by PediaPress servers.

For the technically curious: we're approximately following the Parsoid deployment model -- using Trebuchet to push out a source repository (services/ocg-collection) that has the configuration and node dependencies built on tin, along with git submodules containing the actual service code.

It may not look like it on the surface, but we've come a long way, and it wouldn't have been possible without the (probably exasperated) help from Jeff Green, Faidon, and Ori. Also big thanks to Brad and Max for their work, and Gabriel for some head thunking. C. Scott and I are not quite off the hook yet, as indicated by the list above, but hopefully soon enough we'll be enjoying the cake and cookies from another new product launch. (And yes, even if you're remote, if I promised you cookies as bribes I'll ship them to you :p)

~Matt Walker
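When reporting a broken render, the collection_id mentioned above can be read straight out of the URL's query string. A hypothetical helper sketching this with the Python standard library -- the example URL is invented, and only the parameter name comes from this email:

```python
# Hypothetical helper: extract the collection_id query parameter from a
# collection URL so it can be included in a bug report. The example URL
# is invented; only the parameter name comes from the announcement.
from urllib.parse import urlparse, parse_qs

def collection_id(url: str):
    params = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; take the first value, or None if absent
    return params.get("collection_id", [None])[0]

url = "http://ocg-collection-alpha.wmflabs.org/index.php?title=Special:Book&collection_id=abc123"
print(collection_id(url))
# → abc123
```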