Hi folks,

For a long time we've relied on the mwlib libraries by PediaPress to
generate PDFs on Wikimedia sites. These have served us well (we
generate >200K PDFs/day), but they architecturally pre-date a lot of
important developments in MediaWiki, and actually re-implement the
MediaWiki parser (!) in Python. Moving the entire PDF service to a
new data center has given us reason to rethink the architecture and
come up with a minimum viable alternative that we can support long
term.

Most likely, we'll end up taking Parsoid's HTML5 output, transforming
it to add required bits like licensing info and to prettify it, and
then rendering it to PDF via phantomjs, but we're still looking at
various rendering options.
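
To make the shape of that pipeline a bit more concrete, here is a
rough Python sketch of the middle step (adding a licensing footer to
already-fetched HTML) plus the command line for the render step. It
is only an illustration, not what we've built: the function names,
the footer markup and the "pdf-license" class are made up, and the
render step assumes the rasterize.js example script that ships with
PhantomJS is in the working directory.

```python
import html
import re
from typing import List


def add_license_footer(page_html: str, title: str, license_name: str) -> str:
    """Insert an attribution footer just before the last </body> tag.

    Hypothetical helper: the footer markup and class name are
    illustrative assumptions, not an agreed format.
    """
    footer = (
        '<footer class="pdf-license">'
        f"Source: {html.escape(title)}. "
        f"Text available under {html.escape(license_name)}."
        "</footer>"
    )
    # Find the last closing body tag, case-insensitively.
    last = None
    for last in re.finditer(r"</body\s*>", page_html, flags=re.IGNORECASE):
        pass
    if last is None:
        # No body tag at all: just append the footer.
        return page_html + footer
    return page_html[: last.start()] + footer + page_html[last.start():]


def phantomjs_command(html_path: str, pdf_path: str, paper: str = "A4") -> List[str]:
    """Build (but do not run) a PhantomJS invocation.

    Assumes PhantomJS's bundled example script rasterize.js, which
    takes a URL or file, an output filename and a paper format.
    """
    return ["phantomjs", "rasterize.js", html_path, pdf_path, paper]
```

The command is returned as a list so it can be handed to
subprocess.run() without shell quoting issues once the transformed
HTML has been written to disk.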

Thanks to Matt Walker, C. Scott Ananian, Max Semenik, Brad Jorsch and
Jeff Green for joining the effort, and thanks to the PediaPress folks
for giving background as needed. Ideally we'd like to continue to
support printed book generation via PediaPress' web service, while
completely replacing the rendering tech stack on the WMF side of
things (still using the Collection extension to manage books). We may
need to deprecate some output formats - more on that as we go.

We've got the collection-alt-renderer project set up on Labs (thanks
Andrew) and hope to get a plan to our ops team soon for how the new
setup could work.

If you want to peek, the work channel is #mediawiki-pdfhack on FreeNode.

Live notes here:
http://etherpad.wikimedia.org/p/pdfhack

Stuff will be consolidated here:
https://www.mediawiki.org/wiki/PDF_rendering

Some early experiments with different rendering strategies here:
https://github.com/cscott/pdf-research

Some improvements to Collection extension underway:
https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions/Collection,n,z

More soon,
Erik

-- 
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
