Re: [Wikitech-l] Wikitech-l Digest, Vol 98, Issue 30

2011-09-19 Thread Jamie Morken
Hi, It might be good to keep a private hash in parallel with the MD5 public hash. cheers, Jamie - Original Message - From: wikitech-l-requ...@lists.wikimedia.org Date: Sunday, September 18, 2011 3:12 pm Subject: Wikitech-l Digest, Vol 98, Issue 30 To: wikitech-l@lists.wikimedia.org >
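The message does not say how a "private hash in parallel with the MD5 public hash" would be computed; a minimal sketch of the idea, assuming the private hash is a keyed HMAC that only the server operator can recompute (function and key names here are illustrative, not from the thread):

```python
import hashlib
import hmac

def public_and_private_hashes(data: bytes, secret_key: bytes):
    """Compute a public MD5 digest and a keyed private digest of the same data.

    The public MD5 can be published alongside the dump as usual; the private
    digest (an HMAC-SHA256 here) can only be recomputed by a holder of the
    secret key, so it still verifies the dump even if the public checksum
    list were tampered with.
    """
    public_md5 = hashlib.md5(data).hexdigest()
    private_mac = hmac.new(secret_key, data, hashlib.sha256).hexdigest()
    return public_md5, private_mac
```

For a real dump file the data would be fed in chunks via `hashlib.md5()`/`hmac.new()` update calls rather than read into memory at once.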

Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready

2011-03-30 Thread Jamie Morken
Date: Tuesday, March 29, 2011 11:43 pm Subject: Re: [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready To: Jamie Morken Cc: xmldatadump...@lists.wikimedia.org, wikitech-l@lists.wikimedia.org > The individually numbered files change sizes radically because I'm > moving around

Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready

2011-03-29 Thread Jamie Morken
- Original Message - From: Brian J Mingus Date: Tuesday, March 29, 2011 7:15 pm Subject: Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready To: Wikimedia developers Cc: Jamie Morken , "Ariel T. Glenn" , xmldatadump...@lists.wikimedia.org   >

Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready

2011-03-29 Thread Jamie Morken
- Original Message - From: Brian J Mingus Date: Tuesday, March 29, 2011 7:15 pm Subject: Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready To: Wikimedia developers Cc: Jamie Morken , "Ariel T. Glenn" , xmldatadump...@lists.wik

Re: [Wikitech-l] [Xmldatadumps-l] March 17 en wikipedia history bz2 files ready

2011-03-29 Thread Jamie Morken
Hi all, Congrats Ariel! :)  The sum of pages-meta-history files for the last two enwiki dumps are 342.7GB for the 20110115 dump and 353.5GB for the 20110317 dump, which shows that the overall dump size grew over 2 months.  Seven of the individually numbered pages-meta-history files reduced i

Re: [Wikitech-l] [Xmldatadumps-l] upcoming 1.17 deployment and the xml dumps

2011-02-09 Thread Jamie Morken
Hi Ariel, I don't really understand why the dumps need to be halted as I thought the mediawiki code and database dump code were basically two separate entities already*.  I guess the 1.17 branch code changes the structure of the database causing potential errors in the database dump?  I also d

Re: [Wikitech-l] [Xmldatadumps-l] dataset1, xml dumps

2010-12-24 Thread Jamie Morken
Hi, That is great news, thank you for all the hard work you have done on this, and most of all Season's Greetings, Merry Christmas, and Happy New Year! :) best regards, Jamie - Original Message - From: "Ariel T. Glenn" Date: Friday, December 24, 2010 10:42 am Subject: Re: [Xmldatadumps-

Re: [Wikitech-l] [Announce]: Mark Bergsma promotion to Operations Engineer Programs Manager

2010-09-15 Thread Jamie Morken
congratulations!  :)  :)  I am in awe of the Wikimedia foundation and volunteers, and I am also truly ashamed that I expressed doubt earlier about the intentions of the Wikimedia foundation. cheers, Jamie - Original Message - From: Huib Laurens Date: Wednesday, September 15, 2010 2:42 a

Re: [Wikitech-l] Community vs. centralized development

2010-09-13 Thread Jamie Morken
Hi Tomasz, That is great news, congrats!  I am happy they are spending time to archive wikis; I am also praying that my post ends up in the correct thread. cheers, Jamie - Original Message - From: Tomasz Finc Date: Monday, September 13, 2010 3:09 pm Subject: Re: [Wikitech-l] Community

[Wikitech-l] WikiXMLArticleIndexer

2010-09-13 Thread Jamie Morken
Hi all, We have a beta version of the code for reading the XML dump and extracting the article names with their associated images.  It is in the yahoo group wikishare files section folder "WikiXMLArticleIndexer".  Also uploaded to: http://nekrom.com/red79/WikiXMLArticleIndexer.zip It uses a zip

Re: [Wikitech-l] Wikitech-l Digest, Vol 86, Issue 32

2010-09-13 Thread Jamie Morken
-From  Ryan Kaldari Date  Mon, 13 Sep 2010 10:44:54 -0700 To  wikitech-l@lists.wikimedia.org Subject  Re: [Wikitech-l] Community vs. centralized development On 9/11/10 2:48 PM, Jamie Morken wrote: > Doing the same on my log of the secret channel gives 100903 00:03:40, meaning it has roughly

Re: [Wikitech-l] Community vs. centralized development

2010-09-11 Thread Jamie Morken
On 9/8/2010 10:18 AM, Aryeh Gregor wrote: Well, this is probably my last post on this subject for now.  I think I've made my points.  Those who don't get them yet probably will continue not to get them, and those who get them but disagree probably will continue to disagree.  It looks like not

[Wikitech-l] image dump status update1

2010-09-10 Thread Jamie Morken
Hi, I did some "testing" on Domas' pagecounts log files: original file: pagecounts-20100910-04.gz downloaded from: http://dammit.lt/wikistats/ the original file "pagecounts-20100910-04.gz" was parsed to remove all lines except those beginning with "en File".  This shows what files wer
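The parsing step described above (keeping only the pagecounts lines that begin with "en File") can be sketched as follows; the function name is illustrative, and the format assumed is the usual pagecounts one of "project title requests bytes" per line:

```python
import gzip

def filter_en_file_lines(path):
    """Yield pagecounts lines for the English Wikipedia 'File:' namespace.

    Each line of the hourly pagecounts files is 'project title requests
    bytes'; lines for enwiki image pages start with 'en File'.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.startswith("en File"):
                yield line.rstrip("\n")
```

This streams the gzip file line by line, so even a large hourly log never has to be fully decompressed to disk.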

[Wikitech-l] list of things to do for image dumps

2010-09-09 Thread Jamie Morken
Hi all, This is a preliminary list of what needs to be done to generate images dumps.  If anyone can help with #2 to provide the access log of image usage stats please send me an email! 1. run wikix to generate list of images for a given wiki ie. enwiki 2. sort the image list based on usage fr
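Step 2 above (sorting the wikix image list by usage from the access-log stats) could look something like this once the counts have been parsed into a mapping; the helper name and data shapes are assumptions, not part of the original plan:

```python
def sort_images_by_usage(image_names, request_counts):
    """Order a wikix image list by request count, most-requested first.

    image_names: iterable of image file titles produced by wikix.
    request_counts: mapping of title -> hit count parsed from pagecounts
    logs; images that never appear in the logs sort last with a count of 0.
    """
    return sorted(image_names,
                  key=lambda name: request_counts.get(name, 0),
                  reverse=True)
```

Sorting by usage first means a partial image dump still covers the most-viewed files.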

[Wikitech-l] Community vs. centralized development

2010-09-08 Thread Jamie Morken
Hi, I created a yahoo group for people interested in continuing the discussion on "Community vs. centralized development" as well as up to date wiki backups.  Please join if you want to help to keep the Wikimedia foundation part of the community or just like chatting about it!  Here is the grou

[Wikitech-l] Community vs. centralized development

2010-09-08 Thread Jamie Morken
Hi, I was involved in an open source project that was usurped by one of the main developers for the sole reason of making money, and that project continues now to take advantage of the community to increase the profit of that developer.  I never would have thought such a thing was possible unti

[Wikitech-l] Community vs. centralized development

2010-09-06 Thread Jamie Morken
> > So it sounds like > respect is also centralized in the wikimedia foundation, please > include > me in your email to your underlings Tim, as I would also like to > have > respect, maybe it will mean my request for image dumps is taken > seriously!? It would be nice if respect was earned,

[Wikitech-l] Community vs. centralized development

2010-09-05 Thread Jamie Morken
Hi, > > > I can say that despite being a nobody at Mozilla and having gotten > > only one (rather trivial) patch accepted, I feel like I'm > taken more > > seriously by most of their paid developers than by most of > ours. > > I'm sorry to hear that, and I'd like to know (off list) which pai

[Wikitech-l] Community vs. centralized development

2010-09-05 Thread Jamie Morken
Hi, > > What do you mean by "opening"? > enwiki pages-meta-history is hard due to its size, not because > Ariel or > Tomasz being more stupid than any volunteer. > I trust them to do it at least as well as a volunteer would. > Of course, if you can perform better I'm all for giving you a > she

[Wikitech-l] Community vs. centralized development

2010-09-05 Thread Jamie Morken
Hi, I think it would be a nice gesture if the wikimedia foundation decentralized some of the internal projects that have had little success over the last few years.  Two that come to mind are the enwiki pages-meta-history file creation (1 successful dump in ~3 years), and apparently very littl

Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

2010-03-17 Thread Jamie Morken
Jamie Morken wrote: > Also I wonder if it is possible to convert from 7z to bz2 without having >   to make the 5469GB file first?  If this can be done then having only 7z >   files would be fine, as the bz2 file could be created with a "normal" > PC (ie one without a
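The question in this message, converting 7z to bz2 without first materializing the ~5469GB decompressed file, comes down to streaming: decompress and recompress a chunk at a time. A minimal sketch using the stdlib `lzma` and `bz2` modules (for a real .7z container you would instead pipe `7z e -so archive.7z` into `bzip2`; the pure-Python version here only illustrates the streaming idea on an .xz/LZMA input):

```python
import bz2
import lzma

def recompress_lzma_to_bz2(src_path, dst_path, chunk_size=1 << 20):
    """Stream-recompress an LZMA/XZ file into bz2.

    Only chunk_size bytes of plaintext are held in memory at a time, so the
    full decompressed dump never has to exist on disk.
    """
    with lzma.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
```

The same pattern works for any pair of streaming codecs; disk needed is just the two compressed files.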

Re: [Wikitech-l] [Xmldatadumps-admin-l] 2010-03-11 01:10:08: enwiki Checksumming pages-meta-history.xml.bz2 :D

2010-03-16 Thread Jamie Morken
Hi, I think we should keep at least one version of a recent bz2 enwiki pages-meta-history file because there are already some programs that use the bz2 format directly, and I don't know of any program that uses the 7z format natively. Here are some offline wiki readers that use the bz2 forma

Re: [Wikitech-l] enwiki complete page edit history

2010-03-05 Thread Jamie Morken
The last successful run was way before 2009 and sadly doesn't exist on the wikimedia servers. Trust me .. as soon as this run is done we're going to stamp it, copy it, put it into a safe, and mirror it everywhere. We're not going to let that file get away. --tomasz Jamie

[Wikitech-l] enwiki complete page edit history

2010-02-24 Thread Jamie Morken
This is a repost, Tomasz please get back to me about this. cheers, Jamie > Date: Fri, 19 Feb 2010 18:25:50 +0100 > From: Tomasz Finc > Subject: Re: [Wikitech-l] enwiki complete page edit history > Do you mean that the failed runs aren't web linked? If so then > I'd > rather not point people

Re: [Wikitech-l] Wikitech-l Digest, Vol 79, Issue 34

2010-02-20 Thread Jamie Morken
  > Date: Fri, 19 Feb 2010 18:25:50 +0100 > From: Tomasz Finc > Subject: Re: [Wikitech-l] enwiki complete page edit history > To: Wikimedia developers > Message-ID: <4b7ec99e.4040...@wikimedia.org> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > > > > The pages-meta-history

[Wikitech-l] enwiki dump problems

2010-02-19 Thread Jamie Morken
Hi, There hasn't been a successful pages-meta-history.xml.bz2 or pages-meta-history.xml.7z dump from the http://download.wikimedia.org/enwiki/ site in the last 5 dumps.  How is the new dump system coming along for these large wiki files?  I personally am a bit concerned that these files haven't

Re: [Wikitech-l] enwiki complete page edit history

2010-02-17 Thread Jamie Morken
ked for download lately I noticed.  I think they should be as they contain the full wikipedia history/discussion pages which have humongous amounts of useful information that should be available for easy distribution.  What is the reason they aren't weblinked, the bandwidth costs? chee

Re: [Wikitech-l] User-Agent:

2010-02-17 Thread Jamie Morken
Hi, > > Well, Google's translate service is an example of exactly what > they were > *trying* to block, people hotloading Wikipedia for fun and profit. > I am sure that the intentions are good for what they are doing, people just want to protect wikipedia (including me).  I am most interested i

Re: [Wikitech-l] User-Agent:

2010-02-16 Thread Jamie Morken
Message: 7 Date: Wed, 17 Feb 2010 13:47:47 +1100 From: John Vandenberg Subject: Re: [Wikitech-l] User-Agent: To: Wikimedia developers Message-ID: Content-Type: text/plain; charset=ISO-8859-1 On Wed, Feb 17, 2010 at 1:00 PM, Anthony wrote: > On Wed, Feb 17, 2010 at 11:57 AM, Domas Mit

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Jamie Morken
Date: Tue, 16 Feb 2010 09:34:41 -0800 From: Brion Vibber Subject: Re: [Wikitech-l] [mwdumper] new maintainer? To: wikitech-l@lists.wikimedia.org Message-ID: Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 2/16/10 7:03 AM, Jamie Morken wrote: > Ok, the simple question: how m

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Jamie Morken
> * You are DB neutral, so you do not need to have a version for > mysql, for postgres... The mysql can be converted to postgres and vice versa directly, so the xml isn't necessary for this, see: http://www.mediawiki.org/wiki/Manual:PostgreSQL > * You may apply filters easily > * The XML is

Re: [Wikitech-l] [mwdumper] new maintainer?

2010-02-16 Thread Jamie Morken
Hi, What is the benefit of the database dumps being archived/distributed in xml format instead of sql format?  Converting the xml to sql takes a long time for big wikis and people seem to have problems with this step, so why isn't the sql format available for download instead of the xml forma

[Wikitech-l] enwiki complete page edit history

2010-02-15 Thread Jamie Morken
Hi, I was looking at the enwiki dump progress and noticed the file size for the enwiki pages-meta-history.xml.bz2 has decreased from 255GB on 20100125 down to 105GB on 20100203.  Is it possible that old page revision edit data is being lost due to the smaller archive file size? 2009-12-03 12:53:

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Jamie Morken
Hello, > Is the bandwidth used really a big problem? Bandwidth is pretty > cheap > these days, and given Wikipedia's total draw, I suspect the > occasional > dump download isn't much of a problem. I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer ava

[Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread Jamie Morken
that wikipedia's bandwidth costs would be reduced.  I think it is important that people can download wikipedia for offline use, now and in the future. best regards, Jamie Morken