Re: [Wikitech-l] How to find the version of a dump

2010-12-17 Thread Monica shu
Finally!

Thank you all a lot.

Monica

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
I have no idea about the 2006 one; the other ones I know to be
incomplete one way or another.  Working with the Jan and March 2010 runs,
in conjunction with the earlier dumps, you can get complete info; see
http://techblog.wikimedia.org/2010/05/
In addition, the September 2010 run
http://dumps.wikimedia.org/enwiki/20100904/
is complete, though the 7z files for it are not available.

We also have various even earlier dumps around; once we are in better
shape with dataset2, the new server that is yet to be set up, these
will be made available to the public.

In related news the new server has arrived and we are waiting for the
arrays to be put together and shipped!

Ariel


On Thu, 16-12-2010 at 17:06 +0100, emijrp wrote:
> All? The 2006 one too?
> 
> 2010/12/16 Ariel T. Glenn 
> 
> > The dumps in the archive are there because they are incomplete, by the
> > way.
> >
> > Ariel

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
All? The 2006 one too?

2010/12/16 Ariel T. Glenn 

> The dumps in the archive are there because they are incomplete, by the
> way.
>
> Ariel

Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Ariel T. Glenn
The dumps in the archive are there because they are incomplete, by the
way.

Ariel


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi Monica;

Your dump is this one, with date 2010-03-12:[1][2]

a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2

There are some old English Wikipedia dumps and md5sum files in a directory
called "archive"[3].

Regards,
emijrp

[1]
http://download.wikimedia.org/archive/enwiki/20100312/enwiki-20100312-md5sums.txt
[2] http://download.wikimedia.org/archive/enwiki/20100312/
[3] http://download.wikimedia.org/archive/
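
A minimal Python sketch of the lookup described above, assuming the md5sums file uses the usual md5sum layout of one "digest  filename" pair per line. The function name find_dump_by_md5 is made up for illustration, and the listing holds only the single line quoted in this message, not the real contents of the md5sums file.

```python
def find_dump_by_md5(md5sums_text, digest):
    """Return the file names listed with `digest` in md5sum-style text."""
    matches = []
    for line in md5sums_text.splitlines():
        # Each line is expected to look like "<digest>  <filename>".
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == digest.lower():
            # md5sum marks binary-mode entries with a leading "*".
            matches.append(parts[1].strip().lstrip("*"))
    return matches


# Only the one line quoted in this thread; the rest of the file is not reproduced.
listing = "a3a5ee062abc16a79d111273d4a1a99a  enwiki-20100312-pages-articles.xml.bz2\n"

print(find_dump_by_md5(listing, "a3a5ee062abc16a79d111273d4a1a99a"))
# prints ['enwiki-20100312-pages-articles.xml.bz2']
```

The same function can be run over the text of each md5sums file under the archive directory until one of them returns a match.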



Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread emijrp
Hi James;

download.wikimedia.org is available again, so you can download that file
from
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-pages-articles.xml.bz2
(6.2 GB).

Regards,
emijrp

2010/12/14 James Linden 

> On Mon, Dec 13, 2010 at 7:09 PM, Michael Gurlitz
>  wrote:
> > I grabbed the following files in the days before the server broke, and
> > I can set up a torrent file if anyone's interested, or I could FTP
> > them to a server. 2010-10-11 was the last full Wikipedia dump that was
> > completed.
> > 6652983189 (6.2GB) enwiki-20101011-pages-articles.xml.bz2
>
> I would very much like to get a copy of
> enwiki-20101011-pages-articles.xml.bz2 if that's possible?
>
> If you need a server to upload to, message me off-list and I can provide
> it.
>
> -- James


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Monica shu
Totally agree!
I also think an info page listing all past versions would be
helpful :)

Monica

On Tue, Dec 14, 2010 at 5:11 PM, Andrew Dunbar  wrote:

> It should be trivial to add the dump date to the header of each dump
> file. Since in the files themselves the date field of the filename is
> often replaced by "latest" this could be very useful. It could also be
> useful to include the revision ID and timestamp of the latest revision
> but I assume this would be a little more difficult. Should I file a
> feature request?
>
> Andrew Dunbar (hippietrail)


Re: [Wikitech-l] How to find the version of a dump

2010-12-16 Thread Monica shu
Sorry Andrew, I just noticed this reply.
Can you give me the URL of this search page?
Thanks!

Shu

On Tue, Dec 14, 2010 at 5:04 PM, Andrew Dunbar  wrote:

> A Google search hints that enwiki-20100312-pages-articles.xml.bz2
> might be the one with size 6117881141.
>
> Andrew Dunbar (hippietrail)


Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Platonides
Monica shu wrote:
> Hi emijrp,
> 
> Here is my dump's info:
> 
> *enwiki-latest-pages-articles.xml.bz2 *
> *a3a5ee062abc16a79d111273d4a1a99a*
> 
> Thanks~

I can't find that md5 for any dump.

Here are the md5s of the latest enwiki pages-articles:
a9506e8aedd3b830e059b7c8a3c0dbcd  enwiki-20100904-pages-articles.xml.bz2
09ae0db25ae95af53296e812bc67554b  enwiki-20100916-pages-articles.xml.bz2
7a4805475bba1599933b3acd5150bd4d  enwiki-20101011-pages-articles.xml.bz2




Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread James Linden
On Mon, Dec 13, 2010 at 7:09 PM, Michael Gurlitz
 wrote:
> I grabbed the following files in the days before the server broke, and
> I can set up a torrent file if anyone's interested, or I could FTP
> them to a server. 2010-10-11 was the last full Wikipedia dump that was
> completed.
> 6652983189 (6.2GB) enwiki-20101011-pages-articles.xml.bz2

I would very much like to get a copy of
enwiki-20101011-pages-articles.xml.bz2 if that's possible?

If you need a server to upload to, message me off-list and I can provide it.

-- James



Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
It should be trivial to add the dump date to the header of each dump
file. Since in the files themselves the date field of the filename is
often replaced by "latest" this could be very useful. It could also be
useful to include the revision ID and timestamp of the latest revision
but I assume this would be a little more difficult. Should I file a
feature request?

Andrew Dunbar (hippietrail)



Re: [Wikitech-l] How to find the version of a dump

2010-12-14 Thread Andrew Dunbar
On 14 December 2010 01:57, Monica shu  wrote:
> Thanks Diederik and Waksman,
>
> It seems that I need to parse the dump for article data to get this piece
> of information...
> Yes, that will be the last resort, but I think there may be some easier
> way...
>
> I just got home and checked the dump I downloaded.
> It was downloaded on June 10, 2010, and its size is 6117881141 bytes in bz2.
> I remember that when I downloaded it, it was the latest version at that moment.
> As the dumps are generated every N months, and the one I have is bigger than
> the 2010-01-30 version Waksman mentioned, my version should be from between
> February and June.

A Google search hints that enwiki-20100312-pages-articles.xml.bz2
might be the one with size 6117881141.

Andrew Dunbar (hippietrail)




Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Monica shu
Hi emijrp,

Here is my dump's info:

enwiki-latest-pages-articles.xml.bz2
a3a5ee062abc16a79d111273d4a1a99a

Thanks~



Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Michael Gurlitz
I grabbed the following files in the days before the server broke, and
I can set up a torrent file if anyone's interested, or I could FTP
them to a server. 2010-10-11 was the last full Wikipedia dump that was
completed.
6652983189 (6.2GB) enwiki-20101011-pages-articles.xml.bz2
12823734687 (12GB) enwiki-20101011-pages-meta-current.xml.bz2
146433984 (140MB) enwikiquote-20101012-pages-meta-history.xml.7z



Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread Monica shu
Thanks Diederik and Waksman,

It seems that I need to parse the dump for article data to get this piece
of information...
Yes, that will be the last resort, but I think there may be some easier
way...

I just got home and checked the dump I downloaded.
It was downloaded on June 10, 2010, and its size is 6117881141 bytes in bz2.
I remember that when I downloaded it, it was the latest version at that moment.
As the dumps are generated every N months, and the one I have is bigger than
the 2010-01-30 version Waksman mentioned, my version should be from between
February and June.

Does anybody remember the versions from this period, or happen to have
downloaded the same version as me?

Thanks very much in advance for any related information!


Best regards!
Monica






Re: [Wikitech-l] How to find the version of a dump

2010-12-13 Thread emijrp
Hi;

It would be better if you could give us the md5sum of the file. If you are on
Linux, use the command "md5sum filename" (md5sum is part of GNU coreutils, so it
is normally already installed). If you are on Windows, search for a tutorial.

Also, the file size and the project language and family (wikipedia,
wiktionary...) would be nice.
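
For anyone without the md5sum command at hand, here is a minimal Python sketch that produces the same two facts, the MD5 digest and the size in bytes. The function name md5_and_size and the chunk size are arbitrary choices for illustration, not part of any Wikimedia tooling.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1024 * 1024):
    """Return (hex MD5 digest, size in bytes) of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte dump never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)
```

Calling md5_and_size("enwiki-latest-pages-articles.xml.bz2") on a local copy gives the digest in the same hex form that md5sum prints, so it can be compared directly against a published md5sums file.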

Regards,
emijrp



Re: [Wikitech-l] How to find the version of a dump

2010-12-12 Thread Shaun Waksman
Hi Monica,

The file sizes of the EN pages dumps that are available today are:

5204823166  enwiki-20100312-pages-articles.xml.7z
5983814213  enwiki-20100130-pages-articles.xml.bz2

Note that the former is in 7z and the latter is in bz2.

Does this help?

Shaun




Re: [Wikitech-l] How to find the version of a dump

2010-12-12 Thread Diederik van Liere
Hi Monica, 

I don't think there is such a place. What you could do is parse the file and
look for the date of the most recent edit. That will give you a fairly accurate
estimate of the date that the dump was generated.
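
A minimal Python sketch of that approach, assuming each revision in the dump carries a <timestamp> element holding an ISO 8601 UTC value such as 2010-03-12T00:00:00Z on a single line. The function names are made up for illustration, and a regular expression is used instead of a full XML parser to keep the scan simple.

```python
import bz2
import re

TIMESTAMP = re.compile(r"<timestamp>([^<]+)</timestamp>")


def latest_timestamp(lines):
    """Return the greatest <timestamp> value found in an iterable of text lines."""
    latest = None
    for line in lines:
        match = TIMESTAMP.search(line)
        # Timestamps in one fixed ISO 8601 format sort chronologically as strings.
        if match and (latest is None or match.group(1) > latest):
            latest = match.group(1)
    return latest


def latest_timestamp_in_dump(path):
    """Stream a .xml.bz2 dump and return its most recent revision timestamp."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return latest_timestamp(handle)
```

The file is decompressed as a stream, so the whole dump never has to be unpacked to disk, but the scan still reads every line and will take a while on a file of several gigabytes.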

Best,

Diederik
On 2010-12-12, at 10:45 PM, Monica shu wrote:

> Hi all,
> 
> I downloaded a dump several months ago.
> I accidentally lost the version info of this dump, so I don't know when
> this dump was generated.
> Is there any place that lists info about past dumps (such as size...)?
> 
> Thanks!
> 
> Monica