Re: [Wikitech-l] alternative PDF exporter

2011-08-07 Thread N. Max Pierson
Thanks for the replies.

Håkon,

First, I would like to thank you for the effort you and your colleagues have
put into PrinceXML (and Opera as well). I read through the Norwegian wiki
page and I must say that it would be a great approach to fixing the issue.
However, moving all of the style attributes into classes would be quite an
undertaking, and a complete wiki fix would take far more time than I have.
Since the most important wiki I maintain very rarely has any back-end
changes made to it, it would not be that difficult to simply rip out the
wiki markup before it is sent to the browser, clean it up with some regular
expressions, and then send it to the PDF renderer.

For my current projects, I could use Prince on 2 of the non-commercial
wikis I maintain; however, my other wiki site is commercial and there simply
isn't enough in my budget for the license. It is a wonderful binary and did
the job exactly as expected during my testing, but I just do not have any
say over the money for this project. Now that I see that this seems to be
a pretty common problem, I may be able to start a new side project that
could help remedy it, but as stated before, I have several
projects in line before I could even begin thinking about helping out with
the changes to MediaWiki itself. I see that Jon Harald Søby is the lead on
this project and that it is still in the draft process, but I am very
interested in the idea and will keep up with the process even though I
cannot contribute at this moment in time.

Brion,

This is exactly what I'm looking to do. At the moment, I just need to render
one wiki article at a time, and wkhtmltopdf worked perfectly when I tested it
with some simple scripting. I need to learn a little more about the global
object variables, but I believe I've read through the development manuals
enough and have ripped apart enough similar extensions to give me a great
place to start. If this turns out to be something useful across all three of
my wiki sites, I will probably register it on the MediaWiki site so that
others may benefit from it, since there seem to be more people than just
myself in need of rendering CSS to PDF.
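
For reference, the kind of simple scripting described above might look like
the following minimal Python sketch. It assumes the wkhtmltopdf binary is on
the PATH and is called with its usual input and output arguments. The function
names and the example URL are hypothetical, and --print-media-type is included
on the assumption that the wiki's print stylesheet should drive the PDF layout.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(page_url: str, output_pdf: str,
                  use_print_css: bool = True) -> List[str]:
    """Build the wkhtmltopdf argument list for a single wiki page."""
    cmd = ["wkhtmltopdf"]
    if use_print_css:
        # Ask the renderer to apply print stylesheets instead of screen ones.
        cmd.append("--print-media-type")
    cmd += [page_url, output_pdf]
    return cmd


def render_page(page_url: str, output_pdf: str) -> Optional[int]:
    """Run wkhtmltopdf; return its exit code, or None if it is not installed."""
    if shutil.which("wkhtmltopdf") is None:
        return None
    return subprocess.run(build_command(page_url, output_pdf)).returncode


if __name__ == "__main__":
    # Hypothetical article URL; substitute a page from your own wiki.
    print(render_page("http://wiki.example.org/wiki/Main_Page", "Main_Page.pdf"))
```

Keeping the command construction separate from the call makes it easy to check
the arguments without the binary installed, which may help when wrapping this
in an extension later.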

Once again, thanks for the replies, all.

Regards,
max


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] alternative PDF exporter

2011-08-06 Thread Brion Vibber
On Fri, Aug 5, 2011 at 9:56 PM, N. Max Pierson nmaxpier...@gmail.com wrote:

> New to the list, so please tell me to RTFM if I've missed anything or this
> is the incorrect place for my question.
>
> We've been using and hacking 2 different extensions (PdfBook and PdfExport)
> to render our wiki pages to PDF. Both extensions have been hacked enough to
> work quite well but now we have a requirement to include CSS so that the
> rendered PDF looks EXACTLY like the wiki.


A couple years ago I did some brief experiments using this tool:

http://code.google.com/p/wkhtmltopdf/

It simply uses the WebKit HTML renderer implementation and PDF output
implementation available in the common Qt framework library to render any
given web page to PDF, just as if you had printed / saved to PDF from a
browser.

If you only need to render out individual pages (as opposed to bundling
collections of pages for book-style publishing with additional credit and
license information), this sort of thing is probably a far better option
than anything that tries to work with the low-level wiki markup
(necessitating reimplementation of all of MediaWiki's parser, any plugins
used, and of course... an HTML renderer.)

-- brion


[Wikitech-l] alternative PDF exporter

2011-08-05 Thread N. Max Pierson
Hi *,

New to the list, so please tell me to RTFM if I've missed anything or this
is the incorrect place for my question.

We've been using and hacking 2 different extensions (PdfBook and PdfExport)
to render our wiki pages to PDF. Both extensions have been hacked enough to
work quite well but now we have a requirement to include CSS so that the
rendered PDF looks EXACTLY like the wiki. Since htmldoc doesn't support CSS
as of yet, I've been looking at some different approaches to rendering
wikis to PDF.

The first option I came across was PrinceXML. This does quite a nice job but
at a license cost of ~$6k USD. Unfortunately that's a deal breaker due to
the nature of the projects (some of which are funded and others are
completely open and not funded except for my time). I then ran across
wkhtmltopdf, which utilizes QtWebKit. After a few scripts to test externally,
this seems to be a viable option that I would like to put some cycles
towards.

I've been using MediaWiki for quite some time and know PHP, so writing an
extension should not be too terribly hard. I've been reading through the
manual on extension development (hooking, parsers, etc.) and kinda get the
idea of how to do it. I'm sure that after a little trial and error and
looking at others' source, I should be able to get what I need done and give
the extension back if anyone would be interested.

Has anyone gone down this road before with trying to render wikis with
CSS? (We currently have custom Common.css and Print.css files on many wiki
sites). Obviously I don't want to have to re-invent the wheel, but I have
yet to see any extensions that support CSS.

Any feedback would be greatly appreciated!!

TIA,
max


Re: [Wikitech-l] alternative PDF exporter

2011-08-05 Thread Håkon Wium Lie
Also sprach N. Max Pierson:

> Has anyone gone down this road before with trying to render wikis with
> CSS? (We currently have custom Common.css and Print.css files on many wiki
> sites). Obviously I don't want to have to re-invent the wheel, but I have
> yet to see any extensions that support CSS.

There's a set of case studies here:

  http://www.princexml.com/samples/#wiki

You can use Prince for free for non-commercial purposes. 

Wikipedia's HTML markup is suboptimal for printing, often due to the
use of the 'style' attribute, which hardcodes presentation for
screens.

  http://www.princexml.com/bb/viewtopic.php?f=2&t=3823

In Norway, we have started a project to exterminate the 'style'
attribute. Here's a description (in Norwegian):

  
  http://no.wikipedia.org/wiki/Wikipedia:Underprosjekter/Utryddelse_av_«style»-attributtet

Good progress has been made in the templates. Most of the remaining
issues are in the MediaWiki software itself. For example, in markup
like this:

   <div class="thumbinner" style="width:222px;"><a
   href="/wiki/Fil:FeleHel_(2).jpg" class="image"><img alt=""
   src="http://upload.wikimedia.org/wikipedia/commons/thumb/b/bf/FeleHel_%282%29.jpg/220px-FeleHel_%282%29.jpg"
   width="220" height="431" class="thumbimage" /></a>

Perhaps one could create classes for the most common sizes? (220px seems
quite common.)

Then there's these:

   <div id="mw-js-message" style="display:none;"></div>

   <div id="p-logo"><a style="background-image:
   url(http://upload.wikimedia.org/wikipedia/no/b/bc/Wiki.png);"
   href="/wiki/Portal:Forside" title="Hovedside"></a></div>

   <div style="clear:both"></div>

Efforts to remove these -- by turning them into classes -- will be much
appreciated.
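
As a rough illustration of turning 'style' attributes into classes, here is a
minimal Python sketch that post-processes already-rendered HTML with regular
expressions. It is not how MediaWiki or Prince handles this. The class naming
scheme ("s-" plus a slug of the declarations) and the function names are
hypothetical, and a regex pass like this assumes double-quoted attributes with
no ">" inside attribute values.

```python
import re
from typing import Dict, Tuple

TAG = re.compile(r"<[a-zA-Z][^<>]*>")             # opening or self-closing tags
STYLE_ATTR = re.compile(r'\s+style="([^"]*)"')    # a double-quoted style attribute
CLASS_ATTR = re.compile(r'(\sclass=")([^"]*)(")') # an existing class attribute
TAG_NAME = re.compile(r"^<[a-zA-Z][a-zA-Z0-9]*")


def class_name_for(declarations: str) -> str:
    """Derive a deterministic (hypothetical) class name from CSS declarations."""
    slug = re.sub(r"[^a-z0-9]+", "-", declarations.lower()).strip("-")
    return "s-" + slug


def styles_to_classes(html: str) -> Tuple[str, Dict[str, str]]:
    """Replace inline style attributes with classes; return new HTML and rules."""
    rules: Dict[str, str] = {}

    def rewrite(match):
        tag = match.group(0)
        style = STYLE_ATTR.search(tag)
        if style is None:
            return tag
        declarations = style.group(1).strip()
        name = class_name_for(declarations)
        rules[name] = declarations
        tag = STYLE_ATTR.sub("", tag, count=1)
        if CLASS_ATTR.search(tag):
            # Append the generated class to the existing class list.
            return CLASS_ATTR.sub(
                lambda m: m.group(1) + (m.group(2) + " " + name).strip() + m.group(3),
                tag, count=1)
        # No class attribute yet: add one right after the tag name.
        return TAG_NAME.sub(lambda m: m.group(0) + ' class="' + name + '"',
                            tag, count=1)

    return TAG.sub(rewrite, html), rules


def render_css(rules: Dict[str, str]) -> str:
    """Emit one CSS rule per generated class, sorted by class name."""
    return "\n".join(".%s { %s }" % (name, decl)
                     for name, decl in sorted(rules.items()))
```

The generated rules would go into a screen stylesheet, leaving a print
stylesheet free to override them, which inline 'style' attributes make
difficult.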

Cheers,

-håkon

http://people.opera.com/howcome
http://www.princexml.com/howcome

