Re: [Wikitech-l] How to find the version of a dump

2010-12-17 Thread Monica shu
Finally!

Thank you all very much.

Monica

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi James;

download.wikimedia.org is available again, so you can download that file
(6.2 GB) from
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-articles.xml.bz2

Regards,
emijrp



Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi Monica;

Your dump is this one, dated 2010-03-12 [1][2]:

a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2

There are some old English Wikipedia dumps and md5sum files in a directory
called archive[3].

Regards,
emijrp

[1]
http://download.wikimedia.org/archive/enwiki/20100312/enwiki-20100312-md5sums.txt
[2] http://download.wikimedia.org/archive/enwiki/20100312/
[3] http://download.wikimedia.org/archive/
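
The lookup described above can be sketched in Python. This is only an
illustration, assuming the md5sums file uses the usual md5sum layout (checksum,
whitespace, file name). The function name find_dump_by_md5 is made up for this
example, and the sample text is just the one line quoted above.

```python
def find_dump_by_md5(md5sums_text, wanted_md5):
    """Return the file name listed next to wanted_md5, or None if absent.

    Assumes each line looks like "<md5>  <file name>", as md5sum prints it.
    """
    wanted = wanted_md5.strip().lower()
    for line in md5sums_text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2 and parts[0].lower() == wanted:
            # md5sum marks binary mode with a leading "*" on the file name
            return parts[1].strip().lstrip("*")
    return None


# The single md5sums line quoted in this message.
sample = "a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2\n"
print(find_dump_by_md5(sample, "a3a5ee062abc16a79d111273d4a1a99a"))
```

In practice md5sums_text would be the contents of a downloaded md5sums.txt
such as [1].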



Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
The dumps in the archive are there because they are incomplete, by the
way.

Ariel


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
All? The 2006 one too?


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
I have no idea about the 2006 one; the other ones I know to be
incomplete in one way or another.  Working with the January and March
2010 runs, in conjunction with the earlier dumps, you can get complete
info; see
http://techblog.wikimedia.org/2010/05/
In addition, the September 2010 run
http://dumps.wikimedia.org/enwiki/20100904/
is complete, though the 7z files for it are not available.

We also have various even earlier dumps around; once we are in better
shape with dataset2 (the new server, yet to be set up), these will be
made available to the public.

In related news the new server has arrived and we are waiting for the
arrays to be put together and shipped!

Ariel



Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
A Google search hints that enwiki-20100312-pages-articles.xml.bz2
might be the one with size 6117881141.

Andrew Dunbar (hippietrail)




Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
It should be trivial to add the dump date to the header of each dump
file. Since the date field in the filename itself is often replaced by
"latest", this could be very useful. It could also be useful to include
the revision ID and timestamp of the latest revision, but I assume this
would be a little more difficult. Should I file a feature request?

Andrew Dunbar (hippietrail)



Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread James Linden
I would very much like to get a copy of
enwiki-20101011-pages-articles.xml.bz2 if that's possible?

If you need a server to upload to, message me off-list and I can provide it.

-- James



Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Platonides
Monica shu wrote:
 Hi emijrp,
 
 Here is my dump's info:
 
 *enwiki-latest-pages-articles.xml.bz2 *
 *a3a5ee062abc16a79d111273d4a1a99a*
 
 Thanks~

I can't find that md5 for any dump.

Here are the md5s of the latest enwiki pages-articles:
a9506e8aedd3b830e059b7c8a3c0dbcd  enwiki-20100904-pages-articles.xml.bz2
09ae0db25ae95af53296e812bc67554b  enwiki-20100916-pages-articles.xml.bz2
7a4805475bba1599933b3acd5150bd4d  enwiki-20101011-pages-articles.xml.bz2




Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread emijrp
Hi;

It would be better if you could give us the md5sum of the file. If you are
on Linux, use the command md5sum filename (it is part of GNU coreutils, so
it is normally already installed; otherwise, install it with apt-get). If
you are on Windows, search for a tutorial.

Also, the file size and the project language and family (wikipedia,
wiktionary...) would be nice.

Regards,
emijrp
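
For anyone without the md5sum command at hand, the same checksum can be
computed with Python's standard hashlib module. This is a minimal sketch, not
a replacement for md5sum: the function name md5_of_file and the 1 MiB chunk
size are choices made for this example. The file is read in chunks so that a
multi-gigabyte dump does not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("enwiki-latest-pages-articles.xml.bz2") should give the
same 32-character hex string that md5sum prints for that file.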



Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Monica shu
Thanks Diederik and Waksman,

It seems that I need to parse the dump's article data to get this piece
of information...
Yes, this will be my last resort, but I think there may be some easier
way...

I just got home and checked the dump I downloaded.
It was downloaded on June 10, 2010, and its size is 6117881141 bytes in bz2.
I remember that it was the latest version when I downloaded it.
As the dumps are generated every N months, and the one I have is bigger than
the 2010-01-30 version that Waksman mentioned, my version should be from
between February and June.

Does anybody remember the versions from this period, or happen to have
downloaded the same version as me?

Thanks very much in advance for any related information!


Best regards!
Monica






Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Michael Gurlitz
I grabbed the following files in the days before the server broke, and
I can set up a torrent file if anyone's interested, or I could FTP
them to a server. 2010-10-11 was the last full Wikipedia dump that was
completed.
6652983189 (6.2GB) enwiki-20101011-pages-articles.xml.bz2
12823734687 (12GB) enwiki-20101011-pages-meta-current.xml.bz2
146433984 (140MB) enwikiquote-20101012-pages-meta-history.xml.7z
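
As a side note on the figures above, the rounded sizes appear to use binary
units (1 GB = 2**30 bytes), which matters when comparing them with the exact
byte counts. A small sketch of that arithmetic in Python; the helper name
human_size is made up for this example:

```python
def human_size(size_in_bytes):
    """Format a byte count using binary units (1 GB = 2**30 bytes here)."""
    for unit, factor in (("GB", 2**30), ("MB", 2**20), ("KB", 2**10)):
        if size_in_bytes >= factor:
            return "%.1f%s" % (size_in_bytes / factor, unit)
    return "%dB" % size_in_bytes


# The three byte counts listed above.
for size in (6652983189, 12823734687, 146433984):
    print(size, human_size(size))
```

With one decimal place this gives values that round to the 6.2GB, 12GB and
140MB quoted above.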



Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Monica shu
Hi emijrp,

Here is my dump's info:

enwiki-latest-pages-articles.xml.bz2
a3a5ee062abc16a79d111273d4a1a99a

Thanks~



Re: [Wikitech-l] How to find the version of a dump

2010-12-12 Thread Diederik van Liere
Hi Monica, 

I don't think there is such a place. What you could do is parse the file and
look for the date of the most recent edit. That will give you a fairly accurate
estimate of the date on which the dump was generated.

Best,

Diederik
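
A rough sketch of that idea in Python, under a few assumptions: the dump is a
bz2-compressed MediaWiki XML export, each revision carries a line of the form
<timestamp>2010-03-12T01:02:03Z</timestamp>, and all timestamps share that
UTC format so that plain string comparison orders them correctly. The function
names are made up for this example.

```python
import bz2
import re

# Assumed element layout: <timestamp>YYYY-MM-DDTHH:MM:SSZ</timestamp>
TIMESTAMP = re.compile(r"<timestamp>([^<]+)</timestamp>")


def latest_timestamp(lines):
    """Return the greatest timestamp string found in lines, or None."""
    latest = None
    for line in lines:
        match = TIMESTAMP.search(line)
        # Same-format UTC timestamps sort correctly as plain strings.
        if match and (latest is None or match.group(1) > latest):
            latest = match.group(1)
    return latest


def latest_timestamp_in_dump(path):
    """Stream a .bz2 dump line by line without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return latest_timestamp(handle)
```

Scanning a multi-gigabyte dump this way can take a long time, since the whole
file has to be decompressed once.
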
On 2010-12-12, at 10:45 PM, Monica shu wrote:

 Hi all,
 
 I downloaded a dump several months ago.
 I accidentally lost the version info for this dump, so I don't know when
 this dump was generated.
 Is there any place that lists info about past dumps (such as size...)?
 
 Thanks!
 
 Monica


Re: [Wikitech-l] How to find the version of a dump

2010-12-12 Thread Shaun Waksman
Hi Monica,

The file sizes of the EN pages dumps that are available today are:

5204823166  enwiki-20100312-pages-articles.xml.7z
5983814213  enwiki-20100130-pages-articles.xml.bz2

Note that the former is in 7z and the latter is in bz2.

Does this help?

Shaun
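
Matching a local file against known sizes could be sketched in Python as
follows. The table holds only bz2 byte counts quoted somewhere in this thread
(the 7z size above is left out because it is a different compression format),
and the function names are made up for this example. A size match is only a
hint; an md5 comparison is the stronger check.

```python
import os

# Byte counts of pages-articles .bz2 dumps as quoted in this thread.
KNOWN_SIZES = {
    5983814213: "enwiki-20100130-pages-articles.xml.bz2",
    6117881141: "enwiki-20100312-pages-articles.xml.bz2",
    6652983189: "enwiki-20101011-pages-articles.xml.bz2",
}


def guess_dump_by_size(size_in_bytes, known_sizes=KNOWN_SIZES):
    """Return the dump name with exactly this size, or None if unknown."""
    return known_sizes.get(size_in_bytes)


def guess_dump_for_file(path):
    """Look up the size on disk of the file at path in the table."""
    return guess_dump_by_size(os.path.getsize(path))
```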

