On Sun, Jul 19, 2009 at 5:23 AM, Chengbin Zheng<chengbinzh...@gmail.com> wrote:
> Since the static HTML Wikipedia is not updating (please update), and XML
> updates like everyday, the logical choice is to go with XML. Is there any
> way to convert XML to HTML, like the static HTML version? I need it in HTML,
> and I don't want a one year old version of Wikipedia, with all the useless
> information on user talk, discussions, etc.
> Thank you.

There are plenty of options for converting the XML (or just the MediaWiki
markup) to HTML, for example:

- http://sourceforge.net/apps/mediawiki/wikiprep/index.php?title=Main_Page
(the parser is decent, but there is currently no real full-featured
HTML export)

- http://wiki.laptop.org/go/Wiki_Slice (but not using XML as source,
just stripping down output using ?action=raw)

- https://projects.fslab.de/projects/wpofflineclient/wiki/Specifications
(but also using the raw action)

(a nice article on how to build a static version of Wikipedia:
http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html)

There is also a nice list of all the available parsers (usually from
MediaWiki markup to something else):

http://www.mediawiki.org/wiki/Alternative_parsers

Regarding the XML format, you usually want to scan through the XML,
look for the start of <page> and the matching </page> to get each page,
and then look for the <text> element, which contains the raw page in
MediaWiki markup. You can then use any of the existing MediaWiki markup
parsers, as long as you have extracted the latest revision of the page
in MediaWiki format.
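
To make the <page>/<text> extraction above a bit more concrete, here is a
rough Python sketch (not taken from any of the tools listed, just one way
to do it with the standard library). It streams the dump with
xml.etree.ElementTree.iterparse and, for each page, keeps the text of the
revision with the newest <timestamp>. The names iter_pages, local_name and
SAMPLE are made up for illustration, and the namespace URL in the sample
is only an example; the code ignores namespaces so it should not matter
which export version your dump declares.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Drop the "{namespace}" prefix that ElementTree puts on tag names.
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, using its newest revision."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        newest_stamp, newest_text = "", ""
        for child in elem:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                stamp, text = "", ""
                for field in child:
                    field_name = local_name(field.tag)
                    if field_name == "timestamp":
                        stamp = field.text or ""
                    elif field_name == "text":
                        text = field.text or ""
                # ISO 8601 timestamps sort correctly as plain strings.
                if stamp >= newest_stamp:
                    newest_stamp, newest_text = stamp, text
        yield title, newest_text
        elem.clear()  # drop the children of the page we just handled


# Tiny hand-written sample (hypothetical content) with two revisions.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2009-07-01T00:00:00Z</timestamp>
      <text>old text</text>
    </revision>
    <revision>
      <timestamp>2009-07-18T00:00:00Z</timestamp>
      <text>'''new''' text</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, wikitext in iter_pages(io.BytesIO(SAMPLE)):
        print(title, "->", wikitext)
```

For a real dump you would pass an open file (for example one returned by
bz2.open on the downloaded .xml.bz2) instead of the BytesIO sample, and
hand each wikitext string to whichever markup parser you pick.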

Hope this helps,

adulau

-- 
--                   Alexandre Dulaunoy (adulau) -- http://www.foo.be/
--                             http://www.foo.be/cgi-bin/wiki.pl/Diary
--         "Knowledge can create problems, it is not through ignorance
--                                that we can solve them" Isaac Asimov

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
