I would hazard a guess that your bz2 unzip app does not handle multistream
files in an appropriate way, Wurgl. The multistream files consist of
several bzip2-compressed files concatenated together; see
https://meta.wikimedia.org/wiki/Data_dumps/Dump_format#Multistream_dumps
for details. Try downlo
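As a rough illustration (a sketch of mine, not part of the original
message): Python's bz2 module reads all concatenated streams since version
3.3, so a quick way to check whether a file decompresses fully is something
like the following (the filename is just a placeholder):

    import bz2

    # Placeholder filename; substitute the multistream dump you downloaded.
    path = "enwiki-latest-pages-articles-multistream.xml.bz2"

    total = 0
    with bz2.open(path, "rb") as f:  # bz2.open reads every concatenated stream
        for chunk in iter(lambda: f.read(1 << 20), b""):
            total += len(chunk)
    print("decompressed bytes:", total)

If your tool reports a much smaller size than this, it is probably stopping
after the first stream.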
Hello folks!
For some years now, I've been the main or only point of contact for the
semimonthly Wiki project sql/xml dumps, as well as for a number of
miscellaneous weekly datasets.
This work is now passing to Data Platform Engineering (DPE), and your new
points of contact, starting right away,
Hi Dušan,
The legal team handles all manner of legal issues. You'll need to be
patient. I can't speed up their process for you, nor give you more
information than I already have.
Also, please don't send duplicate messages to the list. That would be
considered spam. Thanks!
Ariel Glenn
dumps co-
I was away from work for the past two days and so unable to reply. My
apologies! Indeed, Dušan, if you want to sort out exactly what to do
with/about the licenses, the legal team is the way to go. Reach them at
legal (at) wikimedia.org. Hope you get it sorted!
Ariel
On Wed, Jul 26, 2023 at 5:57
I'm not sure which text you are relying on. But the legal information for
the licensing of content in the dumps can be found here:
https://dumps.wikimedia.org/legal.html I hope that helps.
Ariel Glenn
dumps co-maintainer
ar...@wikimedia.org
On Fri, Jul 21, 2023 at 12:10 PM Dušan Kreheľ wrote:
Hello Evan,
The Enterprise HTML dumps should be publicly available around the 22nd and
the 3rd of each month, though there can be delays. We don't expect that to
change any time soon. As to their content or the namespaces, I can't answer
to that; someone from Wikimedia Enterprise will have to disc
Due to switch maintenance, this week's dumps of wikidata entities, other
weekly datasets, and today's adds-changes dumps may not be produced.
All datasets should be back on a normal production schedule the following
week.
Apologies for the inconvenience!
Ariel Glenn
ar...@wikimedia.org
Community
My apologies for the duplicate FAQ this month. We recently deployed a new
server and the old one, now retired, still had the FAQ generation job
running on it. We should be back to the usual number of FAQ emails (one)
next month. Thanks!
Ariel Glenn
dumps co-maintainer
Eric,
We don't produce dumps of the revision table in sql format because some of
those revisions may be hidden from public view, and even metadata about
them should not be released. We do however publish so-called Adds/Changes
dumps once a day for each wiki, providing stubs and content files in xm
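For anyone scripting against those daily files, here is a minimal sketch
(mine, not from the original message) that just fetches the directory
listing for one wiki and one day, assuming the usual layout under
https://dumps.wikimedia.org/other/incr/ :

    from urllib.request import urlopen

    # Assumed layout: /other/incr/<wiki>/<YYYYMMDD>/ ; wiki and date are placeholders.
    wiki, day = "enwiki", "20230801"
    url = f"https://dumps.wikimedia.org/other/incr/{wiki}/{day}/"
    print(urlopen(url).read().decode("utf-8"))  # plain HTML directory listing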
There is an issue with retrieving these dumps for publication to the
public. This is being tracked in
https://phabricator.wikimedia.org/T311441 and updates will be posted there.
Ariel Glenn
ar...@wikimedia.org
On Sun, Jul 3, 2022 at 9:37 PM wrote:
> The folder
> https://du
>
> [1] https://www.mediawiki.org/wiki/Manual:Text_table
>
>
> Mitar
>
> On Fri, Feb 4, 2022 at 6:54 AM Ariel Glenn WMF
> wrote:
> >
> > This looks great! If you like, you might add the link and a brief
> description to this page:
> https://meta.wikimed
y to extract data, like dumps in other formats.
>
> [1] https://gitlab.com/tozd/go/mediawiki
>
>
> Mitar
>
> On Thu, Feb 3, 2022 at 9:13 AM Mitar wrote:
> >
> > Hi!
> >
> > I see. Thanks.
> >
> >
> > Mitar
> >
> > On Thu, Feb 3,
The media/file descriptions contained in the dump are the wikitext of the
revisions of pages with the File: prefix, plus the metadata about those
pages and revisions (user that made the edit, timestamp of edit, edit
comment, and so on).
Width and height of the image, the media type, the sha1 of th
You can get the filename listing a couple of other ways:
Check the directory listing for the specific date, i.e.
https://dumps.wikimedia.org/wikidatawiki/20211120/
Get the status file from that or the "latest" directory, i.e.
https://dumps.wikimedia.org/wikidatawiki/20211120/dumpstatus.json
Get one
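As a quick sketch (mine, not from the original message) of pulling
filenames out of dumpstatus.json with Python, assuming the jobs/files
layout that file normally has:

    import json
    from urllib.request import urlopen

    url = "https://dumps.wikimedia.org/wikidatawiki/20211120/dumpstatus.json"
    status = json.load(urlopen(url))

    # Assumed structure: {"jobs": {"<jobname>": {"status": ..., "files": {...}}}}
    for jobname, job in status["jobs"].items():
        for filename in job.get("files", {}):
            print(jobname, filename)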
I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
October 17-18th are available for public download; see
https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
expect to make updated versions of these files available around the 1st/2nd
of the month and the
Not the script itself but we have a permissions problem on some status
files that I'm having trouble stamping out. See
https://phabricator.wikimedia.org/T288192 for updates as they come in.
Ariel
On Mon, Aug 9, 2021 at 10:18 AM griffin tucker <
lmxxlmwikwik3...@griffintucker.id.au> wrote:
> stra
The enwiki run got a later start this month as we switched hosts around for
migration to a more recent version of the OS. But it's currently moving
along nicely. Thanks for the report though!
Ariel
On Wed, Feb 3, 2021 at 1:27 PM Nicolas Vervelle wrote:
> Hi,
>
> Is there a problem with enwiki d
The files are now all available, as has been noted on the task. The bz2
files and 7z files are just fine and can be processed as usual.
Ariel
On Fri, Nov 20, 2020 at 2:37 PM Ariel Glenn WMF wrote:
> Hello folks,
>
> I hope everyone is in good health and staying safe in these troub
Hello folks,
I hope everyone is in good health and staying safe in these troubled times.
Speaking of trouble, in the course of making an improvement to the xml/sql
dumps, I introduced a bug, and so now I am doing the cleanup from that.
The short version:
There will be a 7z file missing from the
The page is in our puppet repo; see
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/dumps/files/web/html/public_mirrors.html
You can submit a patch to gerrit yourself if you like; see
https://www.mediawiki.org/wiki/Gerrit/Tutorial for setting up and working
wi
Thanks for this report! Would you be willing to open a task in phabricator
about the bytemark mirror, and tag it with dumps-generation so that it gets
into the right queue?
https://phabricator.wikimedia.org/maniphest/task/edit/form/1/
The C3SL mirror has technical issues with DNS that are unresolv
labswiki and labtestwiki are copies of Wikitech, which is maintained and
dumped in a special fashion. You can find those dumps here:
https://dumps.wikimedia.org/other/wikitech/dumps/
uk.wikiversity.org does not exist.
ecwikimedia, as you rightly note, is private.
The remaining wikis have all been d
that would do all of this
> automatically, storing to a dedup volume.
>
>
>
> That’s my plan, anyway.
>
>
>
> *From:* Ariel Glenn WMF
> *Sent:* Wednesday, 29 July 2020 4:49 PM
> *To:* Count Count
> *Cc:* griffin tucker ;
> xmldatadumps-l@lists.wikimedi
The basic problem is that the page content dumps are ordered by revision
number within each page, which makes good sense for dumps users but means
that the addition of a single revision to a page will shift all of the
remaining data, resulting in different compressed blocks. That's going to
be true
Dear Rajakumaran Archulan,
Older dumps can often be found on the Internet Archive. The February 2017
full dumps for the English language Wikipedia are here:
https://archive.org/details/enwiki-20170201
A reminder for all new and older members of this list: comprehensive
documentation for dumps use
NOTE: I did not produce the HTML dumps, they are being managed by another
team.
If you are interested in weighing in on the output format, what's missing,
etc, here is the phabricator task: https://phabricator.wikimedia.org/T257480
Your comments and suggestions would be welcome!
Ariel
RDF dumps of structured data from commons are now available at
https://dumps.wikimedia.org/other/wikibase/commonswiki/ They are run on a
weekly basis.
See https://lists.wikimedia.org/pipermail/wikidata/2020-July/014125.html
for more information.
Enjoy!
They aren't, but the rsync copying files to the web server is behind. See
https://phabricator.wikimedia.org/T254856 for that. They'll catch up in the
next day or so.
Ariel
On Wed, Jun 10, 2020 at 7:36 PM Bruce Myers via Xmldatadumps-l <
xmldatadumps-l@lists.wikimedia.org> wrote:
> The wikidatawi
I see files there now so maybe there was a delay in production or the rsync.
Ariel
On Mon, May 4, 2020 at 9:12 AM Mehdi GUIRAUD
wrote:
> Hello,
> If it was something advertised, sorry. Where should I get this kind of
> info. Otherwise, It seems that the dumps had stopped at 1500 for today the
The WikiTaxi software is maintained by a group unaffiliated with the
Wikimedia Foundation, if it is maintained at all. I see that the wiki (
www.wikitaxi.org) has not been updated in years. There is a contact email
listed there which you might try: m...@wikitaxi.org
The parts you highlight are ref
You might check our archives as well as archive.org: see
https://meta.wikimedia.org/wiki/Data_dumps/Finding_older_xml_dumps if you
have not already done so.
Otherwise perhaps someone on the list will have a copy available.
Ariel
On Thu, Apr 30, 2020 at 1:15 PM Katja Schmahl
wrote:
> Hi all,
>
Good morning!
New weekly dumps are available [1], containing the content of the tables
used by the MachineVision extension [2]. For information about these
tables, please see [3].
If you decide to use these tables, as with any other dumps, I would be
interested to know how you use them; feel fre
For the past few years we have not dumped private tables at all; they would
not be accessible to the public in any case, and they do not suffice as a
backup in case of catastrophic failure.
We are therefore removing the feature to dump private tables along with
public tables in a dump run. Anyone
, 2020 at 9:16 AM Ariel Glenn WMF wrote:
> Thanks for this report!
>
> This bug must have been introduced in my recent updates to file listing
> methods.
>
> The multistream file is produced and available for download by changing
> the file name in the download url.
>
> I
Thanks for this report!
This bug must have been introduced in my recent updates to file listing
methods.
The multistream file is produced and available for download by changing the
file name in the download url.
I'll have a look Monday to see about fixing up the index.html output
generation.
Ar
As mentioned earlier on the xmldatadumps-l list, the dumps are running very
slowly this month, since the vslow db hosts they use are also serving live
traffic
during a tables migration. Even manual runs of partial jobs would not help
the situation any, so there will be NO SECOND DUMP RUN THIS MONTH. The
Ma
Hello everybody,
Those of you who follow the dumps closely may have noticed that they are
running slower than usual this month. That is because the db servers on
which they run are also serving live traffic, so that a wikidata-related
migration can complete before the end of the month.
I will try
Happy almost March, everyone!
Certain steps of the kowiki dump jobs now take long enough to run that the
wiki has been moved to the 'big wikis' list. This means that 6 parallel
jobs will produce output for stubs and page content dumps, similarly to
frwiki, dewiki and so on. See [1] for more.
This w
Good morning!
We are a bit delayed due to some code changes that need to go in. We hope
to make the switch in March; I'll send an update with the target date when
all patches have been deployed. My apologies for not updating the list.
You can follow the progress of this changeover on
https://pha
The queries to get page and revision metadata are ordered by page id, and
within each page, by revision id. This is guaranteed.
The behavior of rev_parent_id is not guaranteed however, in certain edge
cases. See e.g. https://phabricator.wikimedia.org/T193211
Anyone who uses this field care to weig
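As an illustration only (my sketch, not from the original message), a
consumer can lean on that page-id/revision-id ordering while streaming a
stubs file; the element names follow the usual export schema, but treat the
details as assumptions:

    import gzip
    import xml.etree.ElementTree as ET

    def pages_with_revisions(path):
        """Yield (page id, [revision ids]) from a stubs dump, relying on the
        guarantee that revisions arrive grouped by page, ordered by rev id."""
        with gzip.open(path, "rb") as stream:      # stubs files are gzipped
            for _, elem in ET.iterparse(stream):
                if elem.tag.rsplit("}", 1)[-1] != "page":
                    continue
                page_id = int(elem.findtext("{*}id"))    # '{*}' needs Python 3.8+
                revs = [int(r.findtext("{*}id"))
                        for r in elem.findall("{*}revision")]
                assert revs == sorted(revs)              # the ordering guarantee
                yield page_id, revs
                elem.clear()                             # keep memory bounded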
a few minutes, cutting out many hours from the dump runs overall.
Please check your tools using the files linked in the previous emails and
make sure that they work.
Thanks!
Ariel
On Thu, Dec 5, 2019 at 12:01 AM Ariel Glenn WMF wrote:
> if you use one of the utilities listed here:
if you use one of the utilities listed here:
https://phabricator.wikimedia.org/T239866
I'd like you to download one of the 'multistream' dumps and see if your
utility decompresses it fully or not (you can compare the md5sum of the
decompressed content to the regular file's decompressed content and
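For anyone who would rather script that comparison, a rough sketch (mine,
with placeholder filenames) that hashes the decompressed content of both
files:

    import bz2
    import hashlib

    def md5_of_decompressed(path):
        """md5 of the fully decompressed content; bz2.open reads every stream."""
        digest = hashlib.md5()
        with bz2.open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Placeholder filenames; use the pair you actually downloaded.
    regular = md5_of_decompressed("enwiki-latest-pages-articles.xml.bz2")
    multi = md5_of_decompressed("enwiki-latest-pages-articles-multistream.xml.bz2")
    print("match" if regular == multi else "mismatch")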
We plan to move to the new schema for xml dumps for the February 1, 2020
run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means
that, for example, the commonswiki dump will contain MediaInfo information
as well as the usual wikitext.
Currently, the abstracts dump for Wikidata consists of 62 million entries,
all of which contain instead of any real
abstract. Instead of this, I am considering producing abstract files that
would contain only the mediawiki header and footer and the usual siteinfo
contents. What do people think abo
All dumps were interrupted for a period of several days due to a MediaWiki
change. See https://phabricator.wikimedia.org/T232268 for details.
Ariel
On Wed, Sep 11, 2019 at 4:43 PM colin johnston
wrote:
> Any news on retention time for backups as well :)
>
> Col
>
>
> > On 11 Sep 2019, at 14:38,
Greetings dumps users, remixers and sharers!
I'm happy to announce that we have another mirror of the last 5 XML dumps,
located in the United States, for your downloading pleasure.
All the information you need is here:
https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_
The production of revision history content files takes between 2.5 and 3
days for each of them; these are the longest to run of the wikis not yet
parallelized.
I plan to switch them over for the August 1st run; please adjust your
scripts accordingly. Follow along if you are interested, at
https://
This dump was incomplete due to a problem with MediaWiki code. It was
removed so that scripts such as yours would not process a file with half
the entities in it.
This week's run should provide a new and complete file. For more
information, you can follow along on the Phabricator task:
https://pha
Hello dumps users and re-users!
As you know, some wikis are large enough that we produce dumps of some
files in 6 pieces in parallel. We'll begin doing this for svwiki starting
on July 1. You can follow along on https://phabricator.wikimedia.org/T226200
if interested. If you have not previously wo
You can find some older dumps at https://dumps.wikimedia.org/archive/ (see
https://meta.wikimedia.org/wiki/Data_dumps/Finding_older_xml_dumps for more
about finding older dumps in general). I didn't see the March 2006 files
but these https://dumps.wikimedia.org/archive/enwiki/20060816/ are later in
The number should be around 19414056, the same number of pages in the
stubs-articles file.
On Tue, May 28, 2019 at 8:35 AM Sigbert Klinke
wrote:
> Hi,
>
> I would be interested to know how many pages in
> enwiki-latest-pages-articles.xml. My own count gives 19.4 million pages.
> Can this be, at le
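For anyone who wants to reproduce such a count, a simple sketch (mine, with
a placeholder filename) that tallies <page> elements while streaming the
bz2 file:

    import bz2
    import xml.etree.ElementTree as ET

    count = 0
    # Placeholder filename; point this at the pages-articles dump you have.
    with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag.rsplit("}", 1)[-1] == "page":
                count += 1
                elem.clear()  # discard the finished page to keep memory flat
    print(count)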
Folks who use the Wikimedia projects may have heard that a url shortener is
now available for them [1]. We now dump a list of the urls and their
shortcuts once a week. The first such dump is available for download
already [2]. It's been a long time coming. Enjoy!
Ariel
[1] https://meta.wikimedia.
That looks like a bug. Could you please report it in phabricator (
https://phabricator.wikimedia.org/maniphest/task/edit/form/1/) and tag the
report with Dumps-Generation? Thanks!
Ariel
On Fri, May 10, 2019 at 11:30 AM Aron Bergman wrote:
> Hi,
> I've recently taken interest in the Wikipedia da
We don't have copies of dumps from those years, but you can find some at
the Internet Archive. Try, for example,
https://archive.org/search.php?query=subject%3A%22data%20dumps%22%20AND%20NOT%20incremental%20AND%20wikipedia%20AND%20english
Good luck!
Ariel
On Thu, Mar 14, 2019 at 10:50 AM 泥 wrot
Those of you watching the xml/sql dumps run this month may have noticed
some dump failures today. These were caused by depooling of the database
server for maintenance while the dump hosts were querying it. The jobs in
question should be rerun automatically over the next few days, and I'll be
keepi
; dumps/mirrored updated to reflect compliance of removal.
>
> Colin
>
>
> On 4 Mar 2019, at 09:24, Ariel Glenn WMF wrote:
>
> All of the information in these mirrored dump files is publicly available
> to any user; no private information is provided. For GDPR-specific issu
red information ?
> How is retention guidelines followed with this mirrored information ?
>
> Colin
>
>
>
> On 4 Mar 2019, at 08:52, Ariel Glenn WMF wrote:
>
> Excuse this very late reply. The index.html page is out of date but the
> mirrored directories for various cur
Excuse this very late reply. The index.html page is out of date but the
mirrored directories for various current runs are there. I'm checking with
a colleague about making sure the index page gets copied over.
Ariel
On Wed, Feb 6, 2019 at 1:14 PM Mariusz "Nikow" Klinikowski <
mariuszklinikow...@g
Hey folks,
We've had a request to reschedule the way the various wikidata entity dumps
are run. Right now they go once a week on set days of the week; we've been
asked about pegging them to specific days of the month, rather as the
xml/sql dumps are run. See https://phabricator.wikimedia.org/T2161
I am happy to announce a new mirror site, located in Canada, which is
hosting the last two good dumps of all projects. Please welcome and put to
good use https://dumps.wikimedia.freemirror.org/ !
I want to thank Adam for volunteering bandwidth and space and for getting
everything set up. More info
Folks may have noticed already that the links presented for download of
pages-articles-multistream dumps are incorrect on the web pages for big
wikis. The files exist for download but the wrong links were created.
I'll be looking into that and fixing it up over the next days, but in the
meantime yo
TL;DR: Don't panic, the single articles multistream bz2 file for big wikis
will be produced shortly after the new smaller files.
Long version: For big wikis which already have split up article files, we
now produce one multistream file per article file. These are now recombined
into a single file l
If you use recompressxml in the mwbzutils package, as of version 0.0.9
(just deployed) it no longer writes bz2 compressed data by default to
stdout; instead it relies on the extension of the output file and will
write either gzipped, bz2 or plain text output, accordingly. This means
that if it is d
The dumps are not blocked but a change in the way stubs dumps are processed
has slowed down the queries considerably. This issue is being tracked here:
https://phabricator.wikimedia.org/T207628
Ariel
On Mon, Oct 22, 2018 at 1:07 PM Nicolas Vervelle
wrote:
> Hi,
>
> The dump for enwiki seems to
If you are a user of the adds-changes (so-called "incremental") dumps, read
on.
All dumps use database servers in our eqiad data center. For the past
month, the wiki projects have used primary database masters out of our
codfw data center; on one of these days, a number of revisions did not
replic
These issues have been cleared up and flow dumps are being produced
properly.
Ariel
On Thu, Sep 6, 2018 at 1:51 PM Ariel Glenn WMF wrote:
> This is being tracked here: https://phabricator.wikimedia.org/T203647
> You probably won't see much in the way of updates until all th
Somehow I committed but did not deploy one of the changes, so local testing
worked great and the production run of course failed. The missing code is
now live (I checked) so everything should be back to normal tomorrow.
Ariel
On Mon, Oct 1, 2018 at 5:26 PM Ariel Glenn WMF wrote:
> The fail
The failure was a side effect of a configuration change that will,
ironically enough, make it easier to test the 'other' dumps, including
eventually these ones, in mediawiki-vagrant; see
https://phabricator.wikimedia.org/T201478 for more information about that.
They should run tomorrow and contain
Hey dumps users and contributors!
This Wednesday, Oct 3 at 2pm PST (21:00 UTC, 23:00 CET) in #wikimedia-office
TechCom will have a discussion about the RFC for the upcoming xml schema
update needed for Multi-Content Revision content.
Phabricator task: https://phabricator.wikimedia.org/T199121
Tech
This is being tracked here: https://phabricator.wikimedia.org/T203647
You probably won't see much in the way of updates until all the jobs have
completed; they are in progress now.
Ariel
On Thu, Sep 6, 2018 at 11:02 AM, Ariel Glenn WMF
wrote:
> Hello dumps users!
>
> You may hav
Hello dumps users!
You may have noticed that a number of wikis have had dumps failures on the
flow dumps step. The cause is known (a cleanup of mediawiki core that
didn't carry over to the extension) and these jobs should be fixed up today
or tomorrow.
Ariel
Starting September 1, huwiki and arwiki, which both take several days to
complete the revision history content dumps, will be moved to the 'big
wikis' list, meaning that they will run jobs in parallel as do frwiki,
ptwiki and others now, for a speedup.
Please update your scripts accordingly. Thank
These jobs did not run today due to a change in how maintenance scripts
handle unknown arguments. The problem has been fixed and the jobs should
run regularly tomorrow.
As many of you may know, MultiContent Revisions are coming soon (October?)
to a wiki near you. This means that we need changes to the XML dumps
schema; these changes will likely NOT be backwards compatible.
Initial discussion will take place here:
https://phabricator.wikimedia.org/T199121
For bac
Good morning!
The pages-meta-history dumps for hewiki take 70 hours these days, the
longest of any wiki not already running with parallel jobs. I plan to add
it to the list of 'big wikis' starting August 1st, meaning that 6 jobs will
run in parallel producing the usual numbered file output; look a
TL;DR:
Scripts that rely on xml files numbered 1 through 4 should be updated to
check for 1 through 6.
Explanation:
A number of wikis have stubs and page content files generated 4 parts at a
time, with the appropriate number added to the filename. I'm going to be
increasing that this month to 6.
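A tiny sketch (mine; the filename pattern is an example, so check it
against the actual directory listing) that generates the part names instead
of hard-coding four of them:

    # Example pattern only; verify against the real filenames for your wiki.
    def part_names(wiki, date, parts=6):
        return [f"{wiki}-{date}-stub-meta-history{n}.xml.gz"
                for n in range(1, parts + 1)]

    print(part_names("frwiki", "20180701"))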
pagecounts-ez sets disappeared from
> dumps.wikimedia.org starting this date. Is that a coincidence ?
> Is it https://phabricator.wikimedia.org/T189283 perhaps ?
>
> DJ
>
> On Thu, Mar 29, 2018 at 2:42 PM, Ariel Glenn WMF
> wrote:
> > Here it comes:
> >
> >
Folks,
As you'll have seen from previous email, we are now using a new beefier
webserver for your dataset downloading needs. And the old server is going
away on TUESDAY April 10th.
This means that if you are using 'dataset1001.wikimedia.org' or the IP
address itself in your scripts, you MUST chan
Those of you that rely on the abstracts dumps will have noticed that the
content for wikidata is pretty much useless. It doesn't look like a
summary of the page because main namespace articles on wikidata aren't
paragraphs of text. And there's really no useful summary to be generated,
even if we w
dumps.
Please forward wherever you deem appropriate. For further updates, don't
forget to check the Phab ticket! https://phabricator.wikimedia.org/T179059
On Mon, Mar 19, 2018 at 2:00 PM, Ariel Glenn WMF
wrote:
> A reprieve! Code's not ready and I need to do some timing tests, s
A reprieve! Code's not ready and I need to do some timing tests, so the
March 20th run will do the standard recombining.
For updates, don't forget to check the Phab ticket!
https://phabricator.wikimedia.org/T179059
On Mon, Mar 5, 2018 at 1:10 PM, Ariel Glenn WMF wrote:
> P
Please forward wherever you think appropriate.
For some time we have provided multiple numbered pages-articles bz2 files
for large wikis, as well as a single file with all of the contents combined
into one. This is consuming enough time for Wikidata that it is no longer
sustainable. For wikis whe
It turns out that this happens for exactly 27 pages, those at the end of
each enwiki-20180220-stub-articlesXX.xml.gz file. Tracking here:
https://phabricator.wikimedia.org/T188388
Ariel
On Tue, Feb 27, 2018 at 10:45 AM, Ryan Hitchman wrote:
> Multiple pages are missing from the enwiki pages-ar
Because the first run of the month was delayed, we need a couple days delay
now for the second run to start, so that the last of the wikis (dewiki) can
finish up the first run. I expect the second monthly run to finish on time
however, once started.
Ariel
I checked the files directly, both the pages.sql.gz and the
categorylinks.sql.gz files for 20170920. The page is listed:
$ zcat enwiki-20170920-page.sql.gz | sed -e 's/),/),\n/g;' | grep
Computational_creativity | more
(16300571,0,'Computational_creativity','',0,0,0,0.718037721126,'20170903222622
Rsync of xml/sql dumps to the web server is now running on a rolling basis
via a script, so you should see updates regularly rather than "every
$random hours". There's more to be done on that front, see
https://phabricator.wikimedia.org/T179857 for what's next.
Ariel
These jobs are currently written uncompressed. Starting with the next run,
I plan to write these as gzip compressed files. This means that we'll save
a lot of space for the larger abstracts dumps. Additionally, only status and
html files will be uncompressed, which is convenient for maintenance
rea
It's possible that some index.html files may contain links to
files which did not get picked up on the rsync. They'll be there sometime
tomorrow after the next rsync.
Ariel
On Mon, Oct 30, 2017 at 5:39 PM, Ariel Glenn WMF
wrote:
> As was previously announced on the xmldatadumps-l list, the sql/xml
As was previously announced on the xmldatadumps-l list, the sql/xml dumps
generated twice a month will be written to an internal server, starting
with the November run. This is in part to reduce load on the web/rsync/nfs
server which has been doing this work also until now. We want separation
of
This issue will be tracked here. https://phabricator.wikimedia.org/T178893
As it says on the ticket, I hope to get this done in time for the Nov 1 run.
Here is what it means for folks who download the dumps:
* First off, the host where the dumps are generated will no longer be the
host that serves
The Wikimedia Foundation does not have an official site for dumps
torrents. It would be nice to add them to
https://meta.wikimedia.org/wiki/Data_dump_torrents however.
Ariel
On Mon, Sep 18, 2017 at 10:16 AM, Federico Leva (Nemo)
wrote:
> Felipe Ewald, 18/09/2017 04:31:
>
>> Is this the officia
Dumps watchers may have noticed that several zh wiki project dumps failed
the abstract dumps step today. This is probably fixed, tracking here:
https://phabricator.wikimedia.org/T174906
I'll be sure it's fixed when a few more wikis have run without problems.
Ariel
Dumps are running again, though the root cause of the nfs incident is still
undetermined.
Ariel
On Wed, Jul 5, 2017 at 5:08 PM, Ariel Glenn WMF wrote:
> Our dumps server is having nfs issues; we're debugging it; debugging is
> slow and tedious. You can follow along here should y
Our dumps server is having nfs issues; we're debugging it; debugging is
slow and tedious. You can follow along here should you wish all the gory
details: https://phabricator.wikimedia.org/T169680
As soon as service is back to normal I'll send an update here to the list.
Ariel
A heads up to anyone who uses these, builds packages for them, etc: after a
bit of tlc they have been moved to their own repo in the 'master' branch:
clone from gerrit:
operations/dumps/import-tools.git
or browse at
https://phabricator.wikimedia.org/diffusion/ODIM/
Patches to gerrit, bug report
I needed to clean up a bunch of tech debt before redoing the page content
dump 'divvy up into small pieces and rerun if necessary' mechanism. I
cleaned up a bit too much and broke stub and article recombine dumps in the
process.
The fix has been deployed, I shot all the dump processes, marked the
Those of you following along will notice that dewiki and wikidatawiki have
more files than usual for the page content dumps (pages-meta-history).
We'll have more of this going forward; if I get the work done in time,
starting April we'll split up these jobs ahead of time into small files
that can b
That's great news, thanks for taking the initiative!
Ariel
On Thu, Mar 16, 2017 at 5:57 AM, Felipe Ewald
wrote:
> Hello everyone!
>
>
>
> For those who like torrent and download dumps files, good news!
>
>
>
> I add the torrent for “enwiki-20170301-pages-meta-current.xml.bz2” and
> “enwiki-2017
Again thanks to Ladsgroup, this is a change to the per-dump index.html
page, and you can see sample screenshots here:
https://phabricator.wikimedia.org/T155697
Please weigh in on the ticket. I'd like to get any issues resolved and
have this in play by the time the next dump run starts on March 20
By now you know the drill: have a look at the changes [1] and weigh in on
the ticket [2]. Silence = consent to merge on Thursday evening. Thanks
again to Ladsgroup for the work. Happy Monday!
Ariel
[1] https://gerrit.wikimedia.org/r/#/c/337264/
[2] https://phabricator.wikimedia.org/T155697