For folks who have not been following the saga on
http://wikitech.wikimedia.org/view/Dataset1
we were able to get the raid array back in service last night on the XML
data dumps server, and we are now busily copying data off of it to
another host. There's about 11T of dumps to copy over; once that [...]
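(The thread doesn't say which tool handled the copy; as a rough sketch, a bulk transfer like this might be done with rsync. Hostname and paths below are hypothetical, not from the thread.)

    # Hypothetical bulk copy of the dump tree to a backup host;
    # -a preserves permissions/timestamps, --partial lets a
    # multi-terabyte transfer resume after an interruption
    rsync -a --partial --progress /data/xmldatadumps/ backuphost:/data/xmldatadumps/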
Great news! Thanks for the update and thanks for all you guys' work getting
it beaten back into shape. Keeping fingers crossed for all going well on the
transfer...
-- brion
On Dec 14, 2010 1:12 AM, "Ariel T. Glenn" wrote:
> For folks who have not been following the saga on
> http://wikitech.wikimedia.org/view/Dataset1
> [...]
+1
Diederik
On 2010-12-14, at 12:02, Brion Vibber wrote:
> [...]
Thanks.
Double good news:
http://lists.wikimedia.org/pipermail/foundation-l/2010-December/063088.html
2010/12/14 Ariel T. Glenn
> [...]
We now have a copy of the dumps on a backup host. Although we are still
resolving hardware issues on the XML dumps server, we think it is safe
enough to serve the existing dumps read-only. DNS was updated to that
effect already; people should see the dumps within the hour.
Ariel
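(A quick way a reader could check whether that DNS change has reached their resolver; the hostname is taken from the download URLs quoted elsewhere in this thread.)

    # Show what the download host currently resolves to
    dig +short download.wikimedia.org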
Good news, but looking at it from a professional point of view, keeping
the dumps on just one array will keep leading to outages like this.
Any plans for a tape backup or a mirror?
masti
On 12/15/2010 08:57 PM, Ariel T. Glenn wrote:
> [...]
Currently the files have been copied off the server onto a backup
host, which is the only reason we feel safe serving them again.
We will be getting a new host (it is due to ship soon) which will
host the live data; the current server will then hold a backup copy.
That is the short-term plan.
On Wed, Dec 15, 2010 at 3:30 PM, Ariel T. Glenn wrote:
> We are interested in other mirrors of the dumps; see
>
> http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps
On the talk page, it says "torrents are useful to save bandwidth,
which is not our problem". If bandwidth is not [...]
On Wed, 15-12-2010, at 15:57 -0500, Anthony wrote:
> [...]
On Wed, Dec 15, 2010 at 10:03 PM, Ariel T. Glenn wrote:
>
> We certainly want people to host it as well. It's not a matter of
> bandwidth but of protection: if someone can't get to our copy for
> whatever reason, another copy is accessible.
>
Is there a copy in Amsterdam? Seems like that would be [...]
On Wed, 15-12-2010, at 22:50 +0100, Bryan Tong Minh wrote:
> [...]
On 12/15/2010 09:30 PM, Ariel T. Glenn wrote:
> [...]
Just as a small-scale experiment, I tried to mirror the
Faroese (fowiki) and Sami (sewiki) language projects.
But "wget -m"
Ariel T. Glenn writes:
> [...]
Have you checked the md5sum?
2010/12/16 Gabriel Weinberg
> [...]
md5sum doesn't match. I get e74170eaaedc65e02249e1a54b1087cb (as
opposed to 7a4805475bba1599933b3acd5150bd4d on
http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-md5sums.txt).
I've downloaded it twice now and have gotten the same md5sum. Can anyone
else confirm?
On Thu, Dec 16, 2010 [...]
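(For anyone wanting to reproduce the check, a minimal sketch; DUMP_FILE is a placeholder, since the quoted messages never name the exact file being verified.)

    # Fetch the published checksum list and compare with the local file
    wget -q http://download.wikimedia.org/enwiki/20101011/enwiki-20101011-md5sums.txt
    md5sum DUMP_FILE
    grep -F "$(md5sum DUMP_FILE | awk '{print $1}')" enwiki-20101011-md5sums.txt \
        && echo "checksum OK" || echo "checksum MISMATCH"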
If the md5s don't match, the files are definitely different; one of
them is corrupt.
What is the size of your local file? I usually download dumps with the
UNIX wget command and I don't get errors. If you are using FAT32, file
sizes are capped at 4 GB and anything larger is silently truncated. Is
that your case?
I've been downloading this file (using wget on Ubuntu or fetch on FreeBSD)
with no issues for years. The current one is 6.2 GB, as it should be.
On Thu, Dec 16, 2010 at 5:53 PM, emijrp wrote:
> [...]
I was able to unzip a copy of the file on another host (taken from the
same location) without problems. On the download host itself I get the
correct md5sum: 7a4805475bba1599933b3acd5150bd4d
Ariel
On Thu, 16-12-2010, at 17:48 -0500, Gabriel Weinberg wrote:
> md5sum doesn't match [...]
Thx--I guess I'll try again--third time's the charm I suppose :)
Sorry to waste your time,
Gabriel
On Thu, Dec 16, 2010 at 6:13 PM, Ariel T. Glenn wrote:
> [...]
Gabriel Weinberg wrote:
> [...]
Google donated storage space for backups of the XML dumps. Accordingly, a
copy of the latest complete dump for each project is being copied over
(public files only). We expect to run similar copies once every two
weeks, keeping the four latest copies as well as one permanent copy
every six months.
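(Not Wikimedia's actual tooling, just a sketch of the retention policy as described above: keep the four most recent dated copies, with the six-month permanent snapshots stored separately. Paths and naming are hypothetical.)

    # Hypothetical rotation on the backup side: delete all but the four
    # newest dated copies (GNU coreutils: head -n -4 prints all but the last 4)
    cd /backups/dumps
    ls -1d 20* | sort | head -n -4 | xargs -r rm -rf
    # permanent six-month snapshots would be copied aside to /backups/permanent
    # before this runs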
On Wed, Dec 15, 2010 at 4:56 PM, Ariel T. Glenn wrote:
> We want people besides us to host it. We expect to put a copy at the
> new data center (at least), as well.
Does anyone know if the Wikipedia XML Data AWS Public Dataset [1] is
being routinely updated? It's showing a last update of "September [...]