Hi,
It might be good to keep a private hash in parallel with the MD5 public hash.
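Something like this Python sketch could compute both in one pass (the file
name and secret key are just examples):

import hashlib
import hmac

def hash_dump(path, secret_key):
    # Public MD5 and private HMAC-SHA-256 computed in a single pass.
    md5 = hashlib.md5()
    mac = hmac.new(secret_key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            md5.update(chunk)
            mac.update(chunk)
    return md5.hexdigest(), mac.hexdigest()

# public, private = hash_dump("pages-meta-history.xml.bz2", b"keep-this-secret")

The keyed private digest would let us detect a substituted file even if
someone managed to forge the published MD5.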
cheers,
Jamie
- Original Message -
From: wikitech-l-requ...@lists.wikimedia.org
Date: Sunday, September 18, 2011 3:12 pm
Subject: Wikitech-l Digest, Vol 98, Issue 30
To: wikitech-l@lists.wikimedia.org
>
Date: Tuesday, March 29, 2011 11:43 pm
Subject: Re: [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready
To: Jamie Morken
Cc: xmldatadump...@lists.wikimedia.org, wikitech-l@lists.wikimedia.org
> The individually numbered files change sizes radically because I'm
> moving around
- Original Message -
From: Brian J Mingus
Date: Tuesday, March 29, 2011 7:15 pm
Subject: Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2
files ready
To: Wikimedia developers
Cc: Jamie Morken , "Ariel T. Glenn" ,
xmldatadump...@lists.wikimedia.org
>
Hi all,
Congrats Ariel! :) The totals of the pages-meta-history files for the last
two enwiki dumps are 342.7GB for the 20110115 dump and 353.5GB for the
20110317 dump, so the overall dump grew by about 10.8GB (~3%) over two
months. Seven of the individually numbered pages-meta-history files
reduced i
Hi Ariel,
I don't really understand why the dumps need to be halted, as I thought the
MediaWiki code and the database dump code were basically two separate entities
already*. I guess the 1.17 branch changes the structure of the database,
causing potential errors in the database dump? I also d
Hi,
That is great news; thank you for all the hard work you have done on this,
and most of all Season's Greetings, Merry Christmas, and Happy New Year! :)
best regards,
Jamie
- Original Message -
From: "Ariel T. Glenn"
Date: Friday, December 24, 2010 10:42 am
Subject: Re: [Xmldatadumps-
Congratulations! :) :) I am in awe of the Wikimedia Foundation and its
volunteers, and I am also truly ashamed that I expressed doubt earlier about
the Foundation's intentions.
cheers,
Jamie
- Original Message -
From: Huib Laurens
Date: Wednesday, September 15, 2010 2:42 a
Hi Tomasz,
That is great news, congrats! I am happy they are spending time to archive
wikis, and I am also praying that my post ends up in the correct thread.
cheers,
Jamie
- Original Message -
From: Tomasz Finc
Date: Monday, September 13, 2010 3:09 pm
Subject: Re: [Wikitech-l] Community
Hi all,
We have a beta version of the code for reading the XML dump and
extracting the article names with their associated images. It is in
the Yahoo group wikishare files section, in the folder
"WikiXMLArticleIndexer". Also uploaded to:
http://nekrom.com/red79/WikiXMLArticleIndexer.zip
It uses a zip
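For anyone curious, here is a minimal Python sketch of the same idea (not
the actual WikiXMLArticleIndexer code), assuming the standard
page/title/text layout of the XML dumps:

import bz2
import re
import xml.etree.ElementTree as ET

IMAGE_RE = re.compile(r"\[\[(?:File|Image):([^|\]]+)", re.IGNORECASE)

def articles_with_images(dump_path):
    # Yield (article title, [image names]) pairs from a pages XML dump.
    with bz2.open(dump_path, "rb") as f:
        title = None
        for _, elem in ET.iterparse(f):
            tag = elem.tag.rsplit("}", 1)[-1]  # strip the XML namespace
            if tag == "title":
                title = elem.text
            elif tag == "text" and elem.text:
                images = IMAGE_RE.findall(elem.text)
                if images:
                    yield title, images
            elif tag == "page":
                elem.clear()  # keep memory bounded on huge dumps

# for name, imgs in articles_with_images("enwiki-pages-articles.xml.bz2"):
#     print(name, imgs)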
From: Ryan Kaldari
Date: Mon, 13 Sep 2010 10:44:54 -0700
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Community vs. centralized development
On 9/11/10 2:48 PM, Jamie Morken wrote:
>
Doing the same on my log of the secret channel gives 100903 00:03:40,
meaning it has roughly
On 9/8/2010 10:18 AM, Aryeh Gregor wrote:
Well, this is probably my last post on this subject for now. I think
I've made my points. Those who don't get them yet probably will
continue not to get them, and those who get them but disagree
probably will continue to disagree. It looks like not
Hi,
I did some "testing" on Domas' pagecounts log files:
original file: pagecounts-20100910-04.gz downloaded from:
http://dammit.lt/wikistats/
the original file "pagecounts-20100910-04.gz" was parsed to remove all
lines except those
beginning with "en File". This shows what files wer
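The filtering step can be done with something like this Python sketch (the
output file name is made up):

import gzip

# Keep only the "en File" lines from the hourly pagecounts file.
with gzip.open("pagecounts-20100910-04.gz", "rt", encoding="utf-8",
               errors="replace") as src, \
        open("en-file-hits.txt", "w", encoding="utf-8") as dst:
    for line in src:
        if line.startswith("en File"):
            dst.write(line)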
Hi all,
This is a preliminary list of what needs to be done to generate image dumps.
If anyone can help with #2 by providing the access log of image usage stats,
please send me an email! (A rough sketch of step 2 follows the list.)
1. run wikix to generate a list of images for a given wiki, ie. enwiki
2. sort the image list based on usage frequency
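Step 2 could look something like this in Python; both file layouts are
assumptions (one image name per line from wikix, "name count" pairs in the
usage stats):

def sort_images_by_usage(image_list_path, usage_stats_path):
    # Order the wikix image list by how often each image was requested.
    counts = {}
    with open(usage_stats_path, encoding="utf-8") as stats:
        for line in stats:
            name, _, count = line.rpartition(" ")
            counts[name] = int(count)
    with open(image_list_path, encoding="utf-8") as images:
        names = [line.strip() for line in images if line.strip()]
    return sorted(names, key=lambda n: counts.get(n, 0), reverse=True)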
Hi,
I created a Yahoo group for people interested in continuing the discussion on
"Community vs. centralized development", as well as up-to-date wiki backups.
Please join if you want to help keep the Wikimedia Foundation part of the
community, or if you just like chatting about it! Here is the grou
Hi,
I was involved in an open source project that was usurped by one of the main
developers for the sole purpose of making money, and that project now
continues to take advantage of the community to increase that developer's
profit. I never would have thought such a thing was possible unti
> > So it sounds like
> respect is also centralized in the wikimedia foundation, please include
> me in your email to your underlings Tim, as I would also like to have
> respect, maybe it will mean my request for image dumps is taken
> seriously!? It would be nice if respect was earned,
Hi,
>
> > I can say that despite being a nobody at Mozilla and having gotten
> > only one (rather trivial) patch accepted, I feel like I'm
> taken more
> > seriously by most of their paid developers than by most of
> ours.
>
> I'm sorry to hear that, and I'd like to know (off list) which pai
Hi,
>
> What do you mean by "opening"?
> enwiki pages-meta-history is hard due to its size, not because
> Ariel or
> Tomasz being more stupid than any volunteer.
> I trust them to do it at least as well as a volunteer would.
> Of course, if you can perform better I'm all for giving you a
> she
Hi,
I think it would be a nice gesture if the Wikimedia Foundation decentralized
some of the internal projects that have had little success over the last few
years. Two that come to mind are the enwiki pages-meta-history file creation
(1 successful dump in ~3 years), and apparently very littl
Jamie Morken wrote:
> Also I wonder if it is possible to convert from 7z to bz2 without having
> to make the 5469GB file first? If this can be done then having only 7z
> files would be fine, as the bz2 file could be created with a "normal"
> PC (ie one without a
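For what it's worth, the conversion can be streamed so the uncompressed file
never has to hit disk; a Python sketch, assuming the 7z command-line tool is
installed (file names are examples):

import bz2
import subprocess

def seven_zip_to_bz2(src_7z, dst_bz2):
    # "7z x -so" extracts the archive contents to stdout; recompress on the fly.
    extractor = subprocess.Popen(["7z", "x", "-so", src_7z],
                                 stdout=subprocess.PIPE)
    with bz2.open(dst_bz2, "wb") as out:
        for chunk in iter(lambda: extractor.stdout.read(1 << 20), b""):
            out.write(chunk)
    extractor.wait()

# seven_zip_to_bz2("pages-meta-history.xml.7z", "pages-meta-history.xml.bz2")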
Hi,
I think we should keep at least one version of a recent bz2 enwiki
pages-meta-history file because there are already some programs that use
the bz2 format directly, and I don't know of any program that uses the
7z format natively.
Here are some offline wiki readers that use the bz2 forma
The last successful one was way before 2009 and sadly doesn't exist on the
wikimedia servers.
Trust me .. as soon as this run is done we're going to stamp it, copy
it, put it into a safe, and mirror it everywhere.
We're not going to let that file get away.
--tomasz
Jamie
This is a repost; Tomasz, please get back to me about this.
cheers,
Jamie
> Date: Fri, 19 Feb 2010 18:25:50 +0100
> From: Tomasz Finc
> Subject: Re: [Wikitech-l] enwiki complete page edit history
> Do you mean that the failed runs aren't web linked? If so then I'd
> rather not point people
> Date: Fri, 19 Feb 2010 18:25:50 +0100
> From: Tomasz Finc
> Subject: Re: [Wikitech-l] enwiki complete page edit history
> To: Wikimedia developers
> Message-ID: <4b7ec99e.4040...@wikimedia.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> >
> > The pages-meta-history
Hi,
There hasn't been a successful pages-meta-history.xml.bz2 or
pages-meta-history.xml.7z dump from the http://download.wikimedia.org/enwiki/
site in the last 5 dumps. How is the new dump system coming along for these
large wiki files? I personally am a bit concerned that these files haven't
been weblinked for download lately, I noticed. I think they should be, as
they contain the full Wikipedia history/discussion pages, which have humongous
amounts of useful information that should be available for easy distribution.
What is the reason they aren't weblinked, the bandwidth costs?
cheers,
Hi,
>
> Well, Google's translate service is an example of exactly what
> they were
> *trying* to block, people hotloading Wikipedia for fun and profit.
>
I am sure that the intentions behind what they are doing are good; people
just want to protect Wikipedia (including me). I am most interested i
Message: 7
Date: Wed, 17 Feb 2010 13:47:47 +1100
From: John Vandenberg
Subject: Re: [Wikitech-l] User-Agent:
To: Wikimedia developers
Message-ID:
Content-Type: text/plain; charset=ISO-8859-1
On Wed, Feb 17, 2010 at 1:00 PM, Anthony wrote:
> On Wed, Feb 17, 2010 at 11:57 AM, Domas Mit
Date: Tue, 16 Feb 2010 09:34:41 -0800
From: Brion Vibber
Subject: Re: [Wikitech-l] [mwdumper] new maintainer?
To: wikitech-l@lists.wikimedia.org
Message-ID:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 2/16/10 7:03 AM, Jamie Morken wrote:
> Ok, the simple question: how m
> * You are DB neutral, so you do not need to have a version for
> mysql, for postgres...
The MySQL dump can be converted to Postgres and vice versa directly, so the
XML isn't necessary for this; see:
http://www.mediawiki.org/wiki/Manual:PostgreSQL
> * You may apply filters easily
> * The XML is
Hi,
What is the benefit of the database dumps being archived/distributed in XML
format instead of SQL format? Converting the XML to SQL takes a long time for
big wikis, and people seem to have problems with this step, so why isn't the
SQL format available for download instead of the XML forma
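For reference, that conversion step is usually done with mwdumper, which
streams the XML into SQL INSERTs; a rough Python sketch of driving it (the
database name and user are made up, and auth options are omitted):

import subprocess

# mwdumper converts a pages XML dump into SQL INSERT statements;
# pipe them straight into mysql instead of writing a huge .sql file.
mwdumper = subprocess.Popen(
    ["java", "-jar", "mwdumper.jar", "--format=sql:1.5",
     "enwiki-pages-articles.xml.bz2"],
    stdout=subprocess.PIPE)
mysql = subprocess.Popen(["mysql", "-u", "wikiuser", "wikidb"],
                         stdin=mwdumper.stdout)
mwdumper.stdout.close()  # let mwdumper see SIGPIPE if mysql exits early
mysql.wait()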
Hi,
I was looking at the enwiki dump progress and noticed the file size for the
enwiki pages-meta-history.xml.bz2 has decreased
from 255GB on 20100125 down to 105GB on 20100203. Is it possible that
old page revision edit data is being lost due to the smaller archive file
size?
2009-12-03 12:53:
Hello,
> Is the bandwidth used really a big problem? Bandwidth is pretty
> cheap
> these days, and given Wikipedia's total draw, I suspect the
> occasional
> dump download isn't much of a problem.
I am not sure about the cost of the bandwidth, but the Wikipedia image dumps
are no longer ava
that
Wikipedia's bandwidth costs would be reduced. I think it is important
that people can download Wikipedia for offline use, now and in the
future.
best regards,
Jamie Morken