[Wikitech-l] What happened to static Wikipedia page?

2009-10-13 Thread Chengbin Zheng
The page is blank. What happened to http://static.wikipedia.org/?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Any news to update static HTML Wikipedia?

2009-09-02 Thread Chengbin Zheng
On Wed, Sep 2, 2009 at 8:45 AM, Manuel Schneider <
manuel.schnei...@wikimedia.ch> wrote:

> Hi Chengbin,
>
> ZIM is an upcoming standard for using HTML content offline. It is derived
> from the Zeno file format used on the German Wikipedia DVDs since 2006
> (ZIM = Zeno IMproved).
>
> There are currently several reader applications for it, for instance the
> zimreader made by the openZIM project, or Kiwix.
> There are some ports around, like Kiwix on Windows and zimreader on
> Openmoko / ARM.
>
> The zimreader by openZIM works like a small webserver: it serves the
> contents of the ZIM file locally.
>
> Once the HTML dump on static.wikimedia.org is fixed and ZIM file creation
> has been integrated, you will be able to download fresh ZIM files of all
> Wikimedia projects directly from download.wikimedia.org.
>
> Currently the Kiwix team has created some ZIM files and we are trying to
> build a ZIM file directory:
> http://openzim.org/ZIM_File_Archive
>
> ZIM actually stores the article text portion of the HTML output of the wiki
> in a compressed cluster. It can also hold all sorts of other MIME types,
> such as images, CSS files, etc.
> http://openzim.org/ZIM_File_Format
>
> It is an open standard and has so far been developed and implemented by the
> openZIM team (sponsored by Wikimedia CH) in C++. There is a library
> (zimlib) which can be integrated into other reader or dumping applications
> to make them ZIM-aware.
>
> Using the open documentation, ZIM can be implemented in any other language
> as well.
> The idea of ZIM is to make the data files freely interchangeable with any
> reader application. It is also flexible enough to store works other than
> data from Wikipedia/MediaWiki. It then tries to keep the reader application
> as simple and stupid as possible: there is only decompression and HTML
> rendering to be done, and an HTML renderer should be available on nearly
> all devices.
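
As a rough illustration of that "small webserver" idea (this is not
openZIM's actual code and does no ZIM decompression), a reader in this style
could expose already-extracted article HTML on localhost with a few lines of
Python; the ./articles directory name is made up for the example:

    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # Serve pre-extracted article HTML from ./articles at
    # http://127.0.0.1:8000/ (needs Python 3.7+ for the directory argument).
    handler = partial(SimpleHTTPRequestHandler, directory="./articles")
    HTTPServer(("127.0.0.1", 8000), handler).serve_forever()
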
>
> Greets,
>
>
> Manuel
>
>
> On Wednesday, 2 September 2009, Chengbin Zheng wrote:
> > On Wed, Sep 2, 2009 at 8:13 AM, Manuel Schneider <
> >
> > manuel.schnei...@wikimedia.ch> wrote:
> > > Hi Chengbin, hi list,
> > >
> > > static.wikimedia.org is currently not being updated, and while the
> > > dumps processing has been assigned to and completely rewritten by
> > > Tomasz Finc (developer at WMF), no assignment has been made concerning
> > > HTML dumps.
> > >
> > > We had a Wikipedia Offline meeting at Wikimania last week and discussed
> > > several issues. One issue is the fact that WMF wants to see the ZIM
> > > file format used for offline dumps and has suggested including it in
> > > the regular dumping process.
> > > So one question was: when will that happen, and what is the status of
> > > WMF ZIM dumping?
> > > As ZIM uses HTML extracts, Tomasz clarified that once
> > > static.wikimedia.org has been rebuilt to be stable and sustainable,
> > > integrating ZIM would be trivial. But he also informed us that this
> > > task has not yet been assigned.
> > >
> > > As Brion Vibber and Erik Möller were at the meeting as well, we hope
> > > that this assignment will be made soon and that this task gets higher
> > > priority.
> > >
> > > This said, I would also advise you to use not the pure HTML dumps but
> > > the ZIM files for your Archos, because that's what they are meant for.
> > > A ZIM file containing all German Wikipedia articles (>900,000) is
> > > 1.4 GB; an additional full-text search index takes another 1 GB.
> > >
> > > Greets,
> > >
> > >
> > > Manuel
> > >
> > > On Wednesday, 2 September 2009, Chengbin Zheng wrote:
> > > > I bring this old issue up because I want to know whether progress
> > > > has been made (or is planned) toward updating the static HTML
> > > > version of Wikipedia. B&H Photo just leaked the next generation of
> > > > Archos portable media players. Unbelievably, the rumors of a 500GB
> > > > version are true! This is already tempting (especially the price, at
> > > > $420). Just waiting for the specs at the Archos event on September
> > > > 15. I really hope it will support NTFS so I can use the compression
> > > > feature.

Re: [Wikitech-l] Any news to update static HTML Wikipedia?

2009-09-02 Thread Chengbin Zheng
On Wed, Sep 2, 2009 at 8:13 AM, Manuel Schneider <
manuel.schnei...@wikimedia.ch> wrote:

> Hi Chengbin, hi list,
>
> static.wikimedia.org is currently not being updated, and while the dumps
> processing has been assigned to and completely rewritten by Tomasz Finc
> (developer at WMF), no assignment has been made concerning HTML dumps.
>
> We had a Wikipedia Offline meeting at Wikimania last week and discussed
> several issues. One issue is the fact that WMF wants to see the ZIM file
> format used for offline dumps and has suggested including it in the
> regular dumping process.
> So one question was: when will that happen, and what is the status of WMF
> ZIM dumping?
> As ZIM uses HTML extracts, Tomasz clarified that once static.wikimedia.org
> has been rebuilt to be stable and sustainable, integrating ZIM would be
> trivial. But he also informed us that this task has not yet been assigned.
>
> As Brion Vibber and Erik Möller were at the meeting as well, we hope that
> this assignment will be made soon and that this task gets higher priority.
>
> This said, I would also advise you to use not the pure HTML dumps but the
> ZIM files for your Archos, because that's what they are meant for.
> A ZIM file containing all German Wikipedia articles (>900,000) is 1.4 GB;
> an additional full-text search index takes another 1 GB.
>
> Greets,
>
>
> Manuel
>
>
>
> On Wednesday, 2 September 2009, Chengbin Zheng wrote:
> > I bring this old issue up because I want to know whether progress has
> > been made (or is planned) toward updating the static HTML version of
> > Wikipedia. B&H Photo just leaked the next generation of Archos portable
> > media players. Unbelievably, the rumors of a 500GB version are true!
> > This is already tempting (especially the price, at $420). Just waiting
> > for the specs at the Archos event on September 15. I really hope it will
> > support NTFS so I can use the compression feature.
> >
> > It would be really cool and convenient to have an offline copy of
> > Wikipedia anywhere I go without needing Wi-Fi. What am I gonna do with
> > 500GB?
> >
> > BTW, does anyone know what the size of the current static HTML English
> > Wikipedia is, uncompressed? Thanks.
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> Regards
> Manuel Schneider
>
> Wikimedia CH - Verein zur Förderung Freien Wissens
> Wikimedia CH - Association for the advancement of free knowledge
> www.wikimedia.ch
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

I'm not familiar with the file extension .zim. What is that? Some sort of
compressed HTML format like .chm? Where can I get a .zim file? I need to
check whether this format is compatible with my Archos's Opera browser.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Any news to update static HTML Wikipedia?

2009-09-02 Thread Chengbin Zheng
On Tue, Sep 1, 2009 at 9:00 PM, Platonides  wrote:

>  Chengbin Zheng wrote:
> > I bring this old issue up because I want to know whether progress has
> > been made (or is planned) toward updating the static HTML version of
> > Wikipedia. B&H Photo just leaked the next generation of Archos portable
> > media players. Unbelievably, the rumors of a 500GB version are true!
> > This is already tempting (especially the price, at $420). Just waiting
> > for the specs at the Archos event on September 15. I really hope it will
> > support NTFS so I can use the compression feature.
> >
> > It would be really cool and convenient to have an offline copy of
> > Wikipedia anywhere I go without needing Wi-Fi. What am I gonna do with
> > 500GB?
> >
> > BTW, does anyone know what the size of the current static HTML English
> > Wikipedia is, uncompressed? Thanks.
>
> I don't think a static dump is the best way to keep Wikipedia on your hard
> drive.
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


It is the only way, actually, although I'm curious what other ways one could
use to keep Wikipedia offline.

Archos PMPs are not computers, but they do have the ability to go on the
Internet and to read HTML files offline from the hard drive.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Any news to update static HTML Wikipedia?

2009-09-01 Thread Chengbin Zheng
I bring this old issue up because I want to know whether progress has been
made (or is planned) toward updating the static HTML version of Wikipedia.
B&H Photo just leaked the next generation of Archos portable media players.
Unbelievably, the rumors of a 500GB version are true! This is already
tempting (especially the price, at $420). Just waiting for the specs at the
Archos event on September 15. I really hope it will support NTFS so I can
use the compression feature.

It would be really cool and convenient to have an offline copy of Wikipedia
anywhere I go without needing Wi-Fi. What am I gonna do with 500GB?

BTW, does anyone know what the size of the current static HTML English
Wikipedia is, uncompressed? Thanks.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-26 Thread Chengbin Zheng
On Sun, Jul 26, 2009 at 8:51 PM, K. Peachey  wrote:

> On Mon, Jul 27, 2009 at 10:17 AM, Chengbin Zheng
> wrote:
> > Anyone know how long it takes to create a static HTML dump? A month?
> > ___
> > Wikitech-l mailing list
> As in locally on your own systems or for the WMF servers to create it?
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

WMF servers.

Sorry for not clarifying.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-26 Thread Chengbin Zheng
On Sun, Jul 26, 2009 at 8:13 PM, Andrew Dunbar  wrote:

> 2009/7/26 Andrew Garrett :
> >
> > On 21/07/2009, at 6:48 PM, Daniel Schwen wrote:
> >
>  wouldn't it be faster than to actually create a static HTML dump the
>  traditional way?
> >>> The content is wiki-text. It has to be parsed to be turned into
> >>> HTML. There
> >>> isn't a more traditional way, because there is no other way.
> >>
> >> Wouldn't it be possible to dump the parser cache instead of dumping
> >> XML and reparsing? All the parsing work is already done on the
> >> Wikimedia servers, why do it again on a slow desktop system?
> >
> > For a few reasons:
> >
> > 1/ There's no reason to expect that the contents of every page,
> > revision, et cetera, would be in the parser cache.
> > 2/ Deleted or otherwise private revision content may remain in the
> > parser cache.
> > 3/ There would be a lot of redundant content in the parser cache,
> > owing to people browsing with the same options.
> > 4/ None of the useful article metadata is stored in the parser cache.
> > 5/ The parser cache is stored in memcached, a hash-based system which
> > is impossible to simply "dump", let alone dump while selectively
> > excluding all of the other things stored in memcached (including quite
> > a bit of private data).
> >
> > It might, however, be sensible to generate parsed HTML text for every
> > page, save them in a directory, and then zip it up.
> >
> > Oh, wait...
>
> I always thought it would be much more useful to generate the HTML of
> action=render for every page rather than the action=view with the HTML
> for one specific skin a million or so times, which is then a pain to
> parse out if you want to do anything other than open the HTML in a
> browser.
>
> (-:
>
> Andrew Dunbar (hippietrail)
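
A minimal sketch of fetching that action=render output with Python's
standard library (the title and User-Agent string below are just
placeholders for the example):

    from urllib.request import Request, urlopen
    from urllib.parse import quote

    # action=render returns the rendered article body without any skin chrome.
    title = "Wikipedia"  # hypothetical example title
    url = ("https://en.wikipedia.org/w/index.php?title=" + quote(title)
           + "&action=render")
    req = Request(url, headers={"User-Agent": "offline-dump-example/0.1"})
    html = urlopen(req).read().decode("utf-8")
    print(html[:200])
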
>
>
> > --
> > Andrew Garrett
> > agarr...@wikimedia.org
> > http://werdn.us/
> >
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> --
> http://wiktionarydev.leuksman.com http://linguaphile.sf.net
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Anyone know how long it takes to create a static HTML dump? A month?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-22 Thread Chengbin Zheng
On Wed, Jul 22, 2009 at 6:53 PM, Aryeh Gregor

> wrote:

> On Wed, Jul 22, 2009 at 6:37 PM, Tei wrote:
> > At a point, Brion compressed it to 242 MB.
> >
> > http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg00358.html
>
> It looks like it was Platonides, not Brion, and as far as I can tell,
> Gregory Maxwell said his compression procedure was broken (i.e.,
> inadvertently lossy).
>
> On Wed, Jul 22, 2009 at 7:03 PM, Chengbin Zheng
> wrote:
> > I have no doubt that you can compress it to 3.3GB. I'm just curious how
> > that's possible for an eBook format.
>
> You just use a very good compression algorithm.  Why can't e-books use
> 7-Zip?
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Because decompression would be so slow it would be unusable (correct me if
I'm wrong).

Even if it used an excellent compression algorithm, you couldn't use solid
compression, otherwise decompression would be a major pain. My own testing
shows that solid compression is roughly 5 times more efficient at
compressing Wikipedia than normal compression.
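
For reference, a small sketch of how that per-file vs. solid comparison can
be measured with Python's standard library (the articles/*.html glob is a
placeholder for some extracted pages, and LZMA stands in for whatever
compressor an eBook format actually uses):

    import glob
    import io
    import lzma
    import tarfile

    files = glob.glob("articles/*.html")  # placeholder: some extracted pages

    # "Normal" compression: each page is compressed on its own.
    per_file = sum(len(lzma.compress(open(f, "rb").read())) for f in files)

    # Solid compression: all pages go into one tar stream first, so the
    # compressor can exploit redundancy shared across pages.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for f in files:
            tar.add(f)
    solid = len(lzma.compress(buf.getvalue()))

    print("per-file:", per_file, "solid:", solid)
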
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-22 Thread Chengbin Zheng
On Wed, Jul 22, 2009 at 2:37 PM, Tei  wrote:

> On Wed, Jul 22, 2009 at 5:48 PM, Chengbin Zheng
> wrote:
> ...
> >
> > Yes, the "TombRaider" version is exactly the version I want for static
> > HTML.
> >
> > Just curious, is
> > pages-articles.xml.bz2<
> http://download.wikimedia.org/enwiki/20090713/enwiki-20090713-pages-articles.xml.bz2
> >
> > like
> > a "TombRaider" version? If not, what's the difference?
> >
> > And another curiosity, at
> > http://en.wikipedia.org/wiki/Wikipedia:TomeRaider_database, it says the
> > English Wikipedia database is only 3.3GB. Did they use compression? That
> > seems awfully small. Even if they did, that's an incredible compression
> > ratio, similar to 7-zip, I don't know how you can do that on a eBook
> format.
> > NTFS compression only brings size down 50%.
>
> At a point, Brion compressed it to 242 MB.
>
> http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg00358.html
>
> You may also read this:
>  http://en.wikipedia.org/wiki/Solid_compression
>
>
> --
> --
> ℱin del ℳensaje.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


I have no doubt that you can compress it to 3.3GB. I'm just curious how
that's possible for an eBook format. Does the 3.3GB include the skin, the
proper formatting of Wikipedia, etc.?

I'm assuming that the pages-articles.xml.bz2 XML dump includes something
other than the raw articles? What else is in it?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-22 Thread Chengbin Zheng
On Wed, Jul 22, 2009 at 8:15 AM, Dmitriy Sintsov  wrote:

> * Tei  [Tue, 21 Jul 2009 19:42:45 +0200]:
> > On Tue, Jul 21, 2009 at 7:17 PM, Chengbin
> Zheng
> > wrote:
> > ...
> > >
> > > No, I know what parsing means. Even if it takes 2 days to parse
> them,
> > > wouldn't it be faster than to actually create a static HTML dump the
> > > traditional way?
> > >
> > > If it is not, then what is the difficulty of making static HTML
> dumps?
> > It
> > > can't be bandwidth, storage, or speed.
> > >
> >
> > Wikimedia works with limited resources: manpower, hardware, etc.
> >
> > Things get done when there are available resources, human and otherwise.
> > It's not only you; there are lots of people who want to download
> > Wikipedia (sometimes periodically).
> >
> > There is a log somewhere with the daily work of some Wikipedia admins ( - :
> > http://wikitech.wikimedia.org/view/Server_admin_log
> >
> > Some of these are even very fun, like:
> > 02:11 b: CPAN sux
> > 01:47 d**: I FOUND HOW TO REVIVE APACHES
> > (names obscured to protect the innocents).
> >
> Speaking of a compact offline English Wikipedia, I liked the TomeRaider
> version:
> http://en.wikipedia.org/wiki/TomeRaider
> I wish there were newer TR builds, because English Wikipedia grows
> really fast.
> Dmitriy
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Yes, the "TombRaider" version is exactly the version I want for static
HTML.

Just curious, is
pages-articles.xml.bz2<http://download.wikimedia.org/enwiki/20090713/enwiki-20090713-pages-articles.xml.bz2>
like
a "TombRaider" version? If not, what's the difference?

And another curiosity, at
http://en.wikipedia.org/wiki/Wikipedia:TomeRaider_database, it says the
English Wikipedia database is only 3.3GB. Did they use compression? That
seems awfully small. Even if they did, that's an incredible compression
ratio, similar to 7-zip, I don't know how you can do that on a eBook format.
NTFS compression only brings size down 50%.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 2:20 PM, Chengbin Zheng wrote:

>
>
> On Tue, Jul 21, 2009 at 1:49 PM, Chad  wrote:
>
>> On Tue, Jul 21, 2009 at 1:42 PM, Tei wrote:
>> > On Tue, Jul 21, 2009 at 7:17 PM, Chengbin Zheng
>> wrote:
>> > ...
>> >>
>> >> No, I know what parsing means. Even if it takes 2 days to parse them,
>> >> wouldn't it be faster than to actually create a static HTML dump the
>> >> traditional way?
>> >>
>> >> If it is not, then what is the difficulty of making static HTML dumps?
>> It
>> >> can't be bandwidth, storage, or speed.
>> >>
>> >
>> > Wikimedia works with limited resources: manpower, hardware, etc.
>> >
>> > Things get done when there are available resources, human and otherwise.
>> > It's not only you; there are lots of people who want to download
>> > Wikipedia (sometimes periodically).
>> >
>> > There is a log somewhere with the daily work of some Wikipedia admins ( - :
>> > http://wikitech.wikimedia.org/view/Server_admin_log
>> >
>> > Some of these are even very fun, like:
>> > 02:11 b: CPAN sux
>> > 01:47 d**: I FOUND HOW TO REVIVE APACHES
>> > (names obscured to protect the innocents).
>> >
>> > --
>> > --
>> > ℱin del ℳensaje.
>> >
>> > ___
>> > Wikitech-l mailing list
>> > Wikitech-l@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>> Hehe, seeing as there are only 10 different names on there, it's
>> pretty easy to figure out who B and D are ;-)
>>
>> -Chad
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
> I can't imagine needing to download Wikipedia often for personal use. The
> amount of work (or should I say pain) involved in getting Wikipedia
> working, umm, I don't want to do that often.
>
> The only reason I'm doing it is that I want a copy of Wikipedia on the go.
> Finding Wi-Fi hotspots is hard (especially in a subway, LOL). It can save
> me time, as I can do research anytime I want, anywhere I want, for example
> in the subway. I'm not downloading the current static HTML dump because:
>
> 1: It is very outdated.
> 2: It contains a LOT of useless information, hogging up half the space.
> Space is a big priority, as the English Wikipedia is, what, 300GB
> uncompressed including the "junk". The next Archos PMP, releasing in
> September, is said to have a 500GB hard drive, but I doubt it, even though
> I hope so, because I would need 500GB if I'm putting Wikipedia on it (my
> videos are already taking 220-ish GB on my Archos 5). Seriously hoping the
> next Archos supports NTFS (the compression feature cuts size by about
> half). How hard is it to get Linux to support NTFS?
>
> Why would you download Wikipedia? The Internet is so readily available,
> and the online version has images.
>


I downloaded the static HTML dump for another language to do a much, much
smaller-scale test to see if it actually works. It works brilliantly. Even
the search function works!! I didn't expect that to work. How does the
search function work? I thought it would be like search in Windows, but
since everything is in RAM, website searches are instantaneous. I'm running
this from a hard drive, and it is instantaneous as well.

BTW, does the pages-articles.xml.bz2 version of the XML dump include links
to images, even though the images themselves don't exist? I find those
pages take up a lot of space as well.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 1:49 PM, Chad  wrote:

> On Tue, Jul 21, 2009 at 1:42 PM, Tei wrote:
> > On Tue, Jul 21, 2009 at 7:17 PM, Chengbin Zheng
> wrote:
> > ...
> >>
> >> No, I know what parsing means. Even if it takes 2 days to parse them,
> >> wouldn't it be faster than to actually create a static HTML dump the
> >> traditional way?
> >>
> >> If it is not, then what is the difficulty of making static HTML dumps?
> It
> >> can't be bandwidth, storage, or speed.
> >>
> >
> > Wikimedia works with limited resources: manpower, hardware, etc.
> >
> > Things get done when there are available resources, human and otherwise.
> > It's not only you; there are lots of people who want to download
> > Wikipedia (sometimes periodically).
> >
> > There is a log somewhere with the daily work of some Wikipedia admins ( - :
> > http://wikitech.wikimedia.org/view/Server_admin_log
> >
> > Some of these are even very fun, like:
> > 02:11 b: CPAN sux
> > 01:47 d**: I FOUND HOW TO REVIVE APACHES
> > (names obscured to protect the innocents).
> >
> > --
> > --
> > ℱin del ℳensaje.
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> Hehe, seeing as there are only 10 different names on there, it's
> pretty easy to figure out who B and D are ;-)
>
> -Chad
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

I can't imagine needing to download Wikipedia often for personal use. The
amount of work (or should I say pain) involved in getting Wikipedia working,
umm, I don't want to do that often.

The only reason I'm doing it is that I want a copy of Wikipedia on the go.
Finding Wi-Fi hotspots is hard (especially in a subway, LOL). It can save me
time, as I can do research anytime I want, anywhere I want, for example in
the subway. I'm not downloading the current static HTML dump because:

1: It is very outdated.
2: It contains a LOT of useless information, hogging up half the space.
Space is a big priority, as the English Wikipedia is, what, 300GB
uncompressed including the "junk". The next Archos PMP, releasing in
September, is said to have a 500GB hard drive, but I doubt it, even though I
hope so, because I would need 500GB if I'm putting Wikipedia on it (my
videos are already taking 220-ish GB on my Archos 5). Seriously hoping the
next Archos supports NTFS (the compression feature cuts size by about half).
How hard is it to get Linux to support NTFS?

Why would you download Wikipedia? The Internet is so readily available, and
the online version has images.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 1:11 PM, Aryeh Gregor

> wrote:

> On Tue, Jul 21, 2009 at 1:08 PM, Chengbin Zheng
> wrote:
> > Wouldn't parsing it be faster than actually creating that many HTMLs?
>
> Parsing it *is* creating the HTML files.  That's what "parsing" means
> in MediaWiki, converting wikitext to HTML.  It's kind of a misnomer,
> admittedly.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

No, I know what parsing means. Even if it takes 2 days to parse them,
wouldn't that be faster than actually creating a static HTML dump the
traditional way?

If it is not, then what is the difficulty in making static HTML dumps? It
can't be bandwidth, storage, or speed.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 12:47 PM, Aryeh Gregor <
simetrical+wikil...@gmail.com > wrote:

> On Tue, Jul 21, 2009 at 11:22 AM, Chengbin Zheng
> wrote:
> > On a side note, if parsing the XML gets you the static HTML version of
> > Wikipedia, why can't Wikimedia just parse it for us and save a lot of our
> > time (parsing and learning), and use that as the static HTML dump
> version?
>
> I'd assume it was a performance issue to parse all the pages for all
> the dumps so often.  It might have just used too much CPU to be worth
> it at the time.  Parsing some individual pages can take 20 seconds or
> more, and there are millions of them (although most much faster to
> parse than that).  I'm sure it could be reinstituted with some effort,
> though.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Wouldn't parsing it be faster than actually creating that many HTMLs?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 11:18 AM, Chengbin Zheng wrote:

>
>
> On Tue, Jul 21, 2009 at 9:37 AM, Lane, Ryan  > wrote:
>
>> > Actually, I do have to learn everything. I know absolutely
>> > nothing about
>> > HTML and all the stuff (Maybe I will when I take the computer
>> > science course
>> > in grade 10). Think of it this way, you have a radioactive
>> > material decay
>> > problem, where you want to find out how much mass is left
>> > after 1000 years.
>> > Obviously there is no simple algebraic way of doing it. You
>> > must set up a
>> > differential equation and solve it. There is no way to do it
>> > if your math
>> > skills are only basic algebra. This is me, and I have to learn all of
>> > advanced algebra, functions, trigonometry, calculus, and differential
>> > equation to do it.
>>
>> If you were able to do x264 from the commandline, this will be a walk in
>> the
>> park. I've been using the commandline for years and I *much* prefer to use
>> a
>> GUI to do x264 transcoding.
>>
>> Using the html exporter from the commandline is fairly simple, and it is
>> documented on the extension page:
>>
>> http://www.mediawiki.org/wiki/Extension:DumpHTML
>>
>> V/r,
>>
>> Ryan Lane
>>
>> ___
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>
> I have no idea how to install MediaWiki. This is too difficult and
> troublesome. Considering how much pain it is to use x264 from the command
> line, I probably don't want to try this. Truthfully, there is not much to
> x264 on the command line. But the programs I'm seeing here are, well,
> complicated, to say the least. I'm just gonna wait for Wikimedia to update
> the static HTML, or bother my computer science teacher, LOL.
>


On a side note, if parsing the XML gets you the static HTML version of
Wikipedia, why can't Wikimedia just parse it for us and save a lot of our
time (parsing and learning), and use that as the static HTML dump version?
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Tue, Jul 21, 2009 at 9:37 AM, Lane, Ryan
wrote:

> > Actually, I do have to learn everything. I know absolutely
> > nothing about
> > HTML and all the stuff (Maybe I will when I take the computer
> > science course
> > in grade 10). Think of it this way, you have a radioactive
> > material decay
> > problem, where you want to find out how much mass is left
> > after 1000 years.
> > Obviously there is no simple algebraic way of doing it. You
> > must set up a
> > differential equation and solve it. There is no way to do it
> > if your math
> > skills are only basic algebra. This is me, and I have to learn all of
> > advanced algebra, functions, trigonometry, calculus, and differential
> > equation to do it.
>
> If you were able to do x264 from the commandline, this will be a walk in
> the
> park. I've been using the commandline for years and I *much* prefer to use
> a
> GUI to do x264 transcoding.
>
> Using the html exporter from the commandline is fairly simple, and it is
> documented on the extension page:
>
> http://www.mediawiki.org/wiki/Extension:DumpHTML
>
> V/r,
>
> Ryan Lane
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

I have no idea how to install MediaWiki. This is too difficult and
troublesome. Considering how much pain it is to use x264 from the command
line, I probably don't want to try this. Truthfully, there is not much to
x264 on the command line. But the programs I'm seeing here are, well,
complicated, to say the least. I'm just gonna wait for Wikimedia to update
the static HTML, or bother my computer science teacher, LOL.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-21 Thread Chengbin Zheng
On Mon, Jul 20, 2009 at 11:33 PM, Kwan Ting Chan  wrote:

> Chengbin Zheng wrote:
>
>>
>> Thank you for dropping by and sharing this information with us, Tomasz!
>>
>> It is good just knowing that it is in the queue. Have you considered
>> making a version of the static HTML Wikipedia with no user talk and
>> discussion pages eating up half the space (like the 5GB XML dump for
>> English Wikipedia)? As in the previous e-mail, it is impossible to delete
>> millions of pages through Windows Vista's search function (I left it
>> overnight, and it ended up eating 1.3GB of RAM and maxing out one of my
>> cores. Even deleting a single file took minutes).
>>
>
> The Windows (and others?) GUI wasn't really designed with what you are
> trying to do in mind, in terms of the number of items. You are asking it to
> search for all the files that match your pattern, keep the millions (?) of
> results in memory, and then show you a window containing the millions of
> items and let you do all the magic GUI operations (selecting / dragging
> ...), all the while keeping track of which you've selected / moved about,
> etc.
>
> I know you want to avoid using the command line, but in this case it's
> really the much simpler / only feasible choice to search the Internet / ask
> around for the right command and issue that on the command line. It's only
> going to be one line of typing once you've got it, and you can write it
> down on a piece of paper or something for future reference. It's not like
> you have to learn the ins and outs of all the commands and their options
> and whatnot. (Of course, you would want to test it on a small sample to
> make sure the command is correct before you let it loose on the whole
> dump.)
>
> KTC
>
> --
> Experience is a good school but the fees are high.
>- Heinrich Heine
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Actually, I do have to learn everything. I know absolutely nothing about
HTML and all that stuff (maybe I will when I take the computer science
course in grade 10). Think of it this way: you have a radioactive material
decay problem, where you want to find out how much mass is left after 1000
years. Obviously there is no simple algebraic way of doing it. You must set
up a differential equation and solve it. There is no way to do it if your
math skills are only basic algebra. This is me, and I would have to learn
all of advanced algebra, functions, trigonometry, calculus, and differential
equations to do it.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-20 Thread Chengbin Zheng
On Mon, Jul 20, 2009 at 10:21 PM, Tomasz Finc  wrote:

> Chengbin Zheng wrote:
> > On Mon, Jul 20, 2009 at 6:41 PM, Aryeh Gregor
> > <
> simetrical%2bwikil...@gmail.com >
> >> wrote:
> >
> >> . . . I should mention, also, that I believe the one in charge of
> >> dumps is Tomasz Finc.  You may want to ask him about whether there are
> >> plans to resume the static HTML dumps.
> >>
> >> ___
> >> Wikitech-l mailing list
> >> Wikitech-l@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >>
> >
> >
> > I tried through Wikipedia mail, and I can't reach him.
>
> Looks like either my mail client ate them or those mails never arrived.
>
> I've exchanged mails with Tim Starling (original author/maintainer) of
> static.wikipedia.org to gauge the level of support and work required to
> have these running again. It certainly seems doable, but I'm not going to
> commit to having them in place until the full en history snapshot works.
> Thinking post-Wikimania 2009 (end of August) here for speccing the
> return of these to a more maintainable state.
>
> --tomasz
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


Thank you for dropping by and sharing this information with us, Tomasz!

It is good just knowing that it is in the queue. Have you considered making
a version of the static HTML Wikipedia with no user talk and discussion
pages eating up half the space (like the 5GB XML dump for English
Wikipedia)? As in the previous e-mail, it is impossible to delete millions
of pages through Windows Vista's search function (I left it overnight, and
it ended up eating 1.3GB of RAM and maxing out one of my cores. Even
deleting a single file took minutes).
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-20 Thread Chengbin Zheng
On Mon, Jul 20, 2009 at 8:52 PM, Aryeh Gregor

> wrote:

> On Mon, Jul 20, 2009 at 11:08 PM, Chengbin Zheng
> wrote:
> > I tried through Wikipedia mail, and I can't reach him.
> >
> > How do you use mediawiki? There are no exe files.
>
> Based on your posts here, I suspect this will be a difficult process
> for you.  Even if you had experience installing and administering web
> apps, I don't know how reliably the dumps can be imported by third
> parties these days.  If you're talking about the English Wikipedia, it
> would probably take a lot of processing time (maybe days, on a typical
> desktop?) for the dump to actually import, even if it's only the
> latest version of each page.  And even after that, I don't know how
> easy or reliable it is to export static HTML.
>
> You will definitely, at a minimum, have to use a command line, and
> probably will run into at least one difficulty that will require
> debugging.  MediaWiki is not really designed to be installed and
> administered by users who are only comfortable with GUIs.  You could
> probably install it without too much difficulty, but the documentation
> for importing the dumps and exporting the static HTML might not be too
> comprehensible.
>
> If you still want to proceed, this page has lengthy instructions on
> installation:
>
> http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Windows
>
> I haven't imported a dump anywhere in a long time, and I've never
> exported static HTML, so I can't really help you with those offhand.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


Thank you for your answer.

Yes, I think it is probably a bad idea. Maybe when I take the computer
science course this year I'll get a better understanding.

But definitely, I don't like using command lines. Even in video encoding,
which I have mastered, I prefer using a GUI (simply because it is FAR, FAR
more convenient). Even though I could use the command line, it takes
forever. It took me over a year to master x264 and AviSynth. I don't want to
do that again for this.

I guess I can just hope that the static HTML dumps do update. Meanwhile I
need to look for a way to efficiently delete millions of talk and discussion
files. Or, better, Wikimedia could make a "lite" version, like the dumps, so
I don't have to do it. I'm really tight on space, as I'm putting this on a
portable media player (the next Archos PMP, as the Archos 5 I have only has
250GB).
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Simple way to convert XML to HTML

2009-07-20 Thread Chengbin Zheng
On Mon, Jul 20, 2009 at 6:41 PM, Aryeh Gregor

> wrote:

> . . . I should mention, also, that I believe the one in charge of
> dumps is Tomasz Finc.  You may want to ask him about whether there are
> plans to resume the static HTML dumps.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>


I tried through Wikipedia mail, and I can't reach him.

How do you use MediaWiki? There are no .exe files.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Simple way to convert XML to HTML

2009-07-20 Thread Chengbin Zheng
It seems that reply doesn't work, so I'll send a new message.
Since the static HTML Wikipedia is not updating (please update it), and the
XML dumps update practically every day, the logical choice is to go with
XML. Is there any way to convert the XML to HTML, like the static HTML
version? I need it in HTML, and I don't want a one-year-old version of
Wikipedia, with all the useless information on user talk, discussions, etc.

I don't have mad computer skills like most of you. I need a simple way
(preferably a GUI) to convert XML to HTML. Also, what does the converted XML
look like compared to the real Wikipedia? I've used BzReader to open it, and
it looks TERRIBLE, without any skin or formatting. Please tell me the
converted XML won't look like this, and will look like the Wikipedia
website.

If the static HTML Wikipedia does update at some point, what is your
preferred method of deleting the user talk, discussion, etc. pages? I tried
using Vista's search function and deleting all of the files with "user" in
the name, etc., but Vista doesn't like deleting millions of files. Even
deleting one file takes minutes (probably due to the sheer number of
folders). Is there a program that can delete more efficiently? Or a program
that deletes while searching (i.e., find a page, delete it, move on to
search for the next file)?
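
A minimal sketch of that delete-as-you-search idea in Python, assuming the
unwanted pages can be recognized by name prefixes (the prefixes and the dump
path below are guesses; the real dump's file naming may differ):

    import os

    # Hypothetical name prefixes for pages to drop; adjust to the real dump.
    UNWANTED = ("User~", "User_talk~", "Talk~", "Wikipedia_talk~")

    root = r"C:\wikipedia-html"  # wherever the HTML dump was extracted
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(UNWANTED):
                # Delete immediately and move on, instead of collecting
                # millions of matches in memory first.
                os.remove(os.path.join(dirpath, name))
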

Thank you SOO much.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Convert XML to HTML?

2009-07-19 Thread Chengbin Zheng
On Sun, Jul 19, 2009 at 5:30 AM, Alexandre Dulaunoy  wrote:

> On Sun, Jul 19, 2009 at 5:23 AM, Chengbin Zheng
> wrote:
> > Since the static HTML Wikipedia is not updating (please update it), and
> > the XML dumps update practically every day, the logical choice is to go
> > with XML. Is there any way to convert the XML to HTML, like the static
> > HTML version? I need it in HTML, and I don't want a one-year-old version
> > of Wikipedia, with all the useless information on user talk, discussions,
> > etc.
> > Thank you.
>
> There are plenty of options for parsing the XML (or just the MediaWiki
> markup) to HTML, like:
>
> - http://sourceforge.net/apps/mediawiki/wikiprep/index.php?title=Main_Page
> (the parser is decent but currently has no real full-featured HTML export)
>
> - http://wiki.laptop.org/go/Wiki_Slice (but not using XML as source,
> just stripping down output using ?action=raw)
>
> - https://projects.fslab.de/projects/wpofflineclient/wiki/Specifications
> (but also using the raw action)
>
> (a nice article on how to build a static version of Wikipedia:
> http://users.softlab.ece.ntua.gr/~ttsiod/buildWikipediaOffline.html)
>
> There is also a nice list of all the parsers available (usually from the
> MediaWiki markup to something else):
>
> http://www.mediawiki.org/wiki/Alternative_parsers
>
> Regarding the XML format, usually you want to seek into the XML and look
> for the start of <page> and the end of </page> to get a page, and then look
> for the <text> element containing the raw page in MediaWiki markup format.
> So you can use any of the existing MediaWiki markup parsers, as long as you
> have extracted the latest revision of the page in MediaWiki format.
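
A minimal sketch of that page/text extraction using only Python's standard
library (the file name is a placeholder, and the export namespace version
should be checked against the dump's own header):

    import bz2
    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # check the dump's <mediawiki> tag

    # Stream pages out of the dump without loading the whole file into memory.
    with bz2.open("enwiki-pages-articles.xml.bz2", "rb") as f:
        for event, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                wikitext = elem.findtext(NS + "revision/" + NS + "text") or ""
                # ...hand the wikitext to whichever markup parser renders HTML...
                elem.clear()  # free memory as we go
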
>
> Hope this helps,
>
> adulau
>
> --
> --   Alexandre Dulaunoy (adulau) -- http://www.foo.be/
> -- http://www.foo.be/cgi-bin/wiki.pl/Diary
> -- "Knowledge can create problems, it is not through ignorance
> --that we can solve them" Isaac Asimov
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

Hi Alexandre,

Thank you so much for your response! Do you have a method (or preferably a
GUI) that doesn't take insane computer skills to convert the XML to HTML? I
am clueless about how to do this. I'm simply a 15-year-old student who wants
a copy of Wikipedia on my Archos 5. I don't have the time to learn it.
Thanks.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Convert XML to HTML?

2009-07-18 Thread Chengbin Zheng
Since the static HTML Wikipedia is not updating (please update it), and the
XML dumps update practically every day, the logical choice is to go with
XML. Is there any way to convert the XML to HTML, like the static HTML
version? I need it in HTML, and I don't want a one-year-old version of
Wikipedia, with all the useless information on user talk, discussions, etc.
Thank you.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l