ArielGlenn added a comment.
@fgiunchedi I notice that in some cases phab tasks are autocreated when
systemd units fail. Is that true for systemd jobs on snapshot hosts? Could we
get tagged on those (Dumps-Generation) or could we get emails from those
(ops-dumps@wm.o)?
TASK DETAIL
https
ArielGlenn closed subtask T226093: Capacity planning for Commons Structured
Data as "Resolved".
TASK DETAIL
https://phabricator.wikimedia.org/T68108
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: ArielGlenn
Cc: Mholloway, Ladsgroup, M
ArielGlenn closed this task as "Resolved".
ArielGlenn claimed this task.
ArielGlenn added a comment.
There's no point in having this open for a once-a-year check-in, so I'll go
ahead and close it. When capacity planning needs to be done for dbs in the
regular course of
ArielGlenn added a comment.
In T226093#8512308 <https://phabricator.wikimedia.org/T226093#8512308>,
@LSobanski wrote:
> The task's original intent was to cover planning "over the next 3 years"
starting in 2019. @ArielGlenn is the task still relevant, can it be c
ArielGlenn added a comment.
In T138208#7844298 <https://phabricator.wikimedia.org/T138208#7844298>,
@Ladsgroup wrote:
> It's a bit hard to measure but it's probably fixed.
That would be wonderful if true. Let's leave this open for a while yet just
in case
ArielGlenn added a comment.
Hey, just a note that we saw another failure:
Output of systemd timer for '/usr/local/bin/dumpwikibaserdf.sh -p wikidata
-d truthy -f nt'
SYSTEMDTIMER noreply@snapshot1008.eqiad.wmnet via wikimedia.org
ERROR 2013 (HY
ArielGlenn added a comment.
I am aware of and following this discussion, but right now my responsiveness
on this task will be slow; most of my time needs to go to getting my teammate
who will be dumps co-maintainer up to speed. Please bear with us.
TASK DETAIL
https
ArielGlenn added a comment.
Hm I wonder who we should add that would take on restarting these jobs if
they deem it useful. Uh. Deferring for now since I have no bright ideas, and
noting that here. Thanks again!
TASK DETAIL
https://phabricator.wikimedia.org/T300240
ArielGlenn added a comment.
Uh @dcausse Do you want to add someone to the ops-dumps alias so that you can
be informed in these instances and perhaps schedule a restart of the job(s)? It
would be easy enough. Sorry to ask after the task is closed!
TASK DETAIL
https
ArielGlenn added a comment.
I saw an error from the cron job, it was sent to ops-dumps, which someone
from WMDE should be on as well I think. The error looked to me like it had to
do with a db server being depooled or otherwise unavailable:
ERROR 2013 (HY000): Lost connection to MySQL
ArielGlenn added a comment.
Thanks. I was pretty careful with my testing for the last fix, making sure
that in production the patch redirected to a vslow/dump server. But I may have
overlooked something. :-(
TASK DETAIL
https://phabricator.wikimedia.org/T138208
ArielGlenn added a comment.
I hate to ask but can we capture any queries?
TASK DETAIL
https://phabricator.wikimedia.org/T138208
To: ArielGlenn
Cc: LSobanski, Ladsgroup, Marostegui, Addshore
Restricted Application added a project: wdwb-tech.
TASK DETAIL
https://phabricator.wikimedia.org/T238972
WORKBOARD
https://phabricator.wikimedia.org/project/board/1519/
To: ArielGlenn
Cc: Christian75
ArielGlenn added a comment.
The above patch was deployed with the train everywhere, so the specific set
of queries should no longer be directed to non-vslow/dump db servers. If that's
the case, we are now back to the harder issue of what to do when a db server is
depooled, and I think
ArielGlenn closed this task as "Declined".
ArielGlenn added a comment.
I'm going to go ahead and close this as declined. Feel free to re-open if things
change in the future.
TASK DETAIL
https://phabricator.wikimedia.org/T297470
ArielGlenn added a comment.
The patch at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/747455/ is
tested and ready to go, and in line with the way existing dumps scripts work.
So I'd like to go ahead with it.
TASK DETAIL
https://phabricator.wikimedia.org/T138208
ArielGlenn added a comment.
There is a complicated set of python scripts that coordinate the dump jobs
for each wiki during the two monthly runs.
https://wikitech.wikimedia.org/wiki/Dumps/Current_Architecture gives an
overview.
https://www.mediawiki.org/wiki/SQL/XML_Dumps
ArielGlenn added a comment.
In T138208#7611718 <https://phabricator.wikimedia.org/T138208#7611718>,
@Ladsgroup wrote:
> In T138208#7611712 <https://phabricator.wikimedia.org/T138208#7611712>,
@ArielGlenn wrote:
>
>> Not yet; I need to talk with someone mo
ArielGlenn added a comment.
In T138208#7611708 <https://phabricator.wikimedia.org/T138208#7611708>,
@Marostegui wrote:
> In T138208#7571559 <https://phabricator.wikimedia.org/T138208#7571559>,
@gerritbot wrote:
>
>> Change 747455 had a related patch set
ArielGlenn added a comment.
Note that the checksum files for those dumps are available for download as
well, since they are provided along with the main dump output files to all
mirrors.
Someone from WMCS will probably need to look at this (again) if the
discussion is being re-opened
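For anyone downloading from a mirror, checking a dump file against its checksum entry could be sketched as below. This is only an illustration under assumptions: it guesses that a checksum line has the common `<hex digest>  <filename>` layout and that the digest is SHA-1, so check both against the actual checksum file. The function names and the demo file name are made up.

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large dump files never have to fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_checksum_line(path, line, algorithm="sha1"):
    """Compare a file's digest with the first field of a checksum line.

    Assumes the line looks like '<hex digest>  <filename>' (hypothetical layout).
    """
    expected = line.split(maxsplit=1)[0].lower()
    return file_digest(path, algorithm) == expected


# Demo on a throwaway file; 'example-dump.bz2' is a placeholder name.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example dump contents\n")
    demo_path = tmp.name
try:
    good_line = f"{file_digest(demo_path)}  example-dump.bz2"
    ok = matches_checksum_line(demo_path, good_line)
    bad = matches_checksum_line(demo_path, "0" * 40 + "  example-dump.bz2")
finally:
    os.remove(demo_path)

print(ok, bad)
```

Pass `algorithm="md5"` if the checksum file at hand uses MD5 rather than SHA-1.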
ArielGlenn added a comment.
Thanks for this thought, Daniel. I think it's better if I can pass the
dbgroupdefault parameter to the maintenance script itself, instead of hacking
something into getBlob(). But I do need to check if that's going to work ok.
The longer term fix you men
ArielGlenn added a comment.
As I feared, fetchText.php calls
MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db
replica connection on its own, with no opportunity for us to ask that it be in
the vslow/dump group. We might be able to use the -dbgroupdefaul
ArielGlenn added a comment.
The above is happening from pages-meta-history dumps, and I will look into it
later today. The snapshot1008 (wikidata entity) dumps will be harder.
TASK DETAIL
https://phabricator.wikimedia.org/T138208
ArielGlenn added a comment.
The reason only those two snapshot hosts are involved is undoubtedly because
dumps on the others have finished for this run.
TASK DETAIL
https://phabricator.wikimedia.org/T138208
ArielGlenn added a comment.
We don't provide torrent files from here because this is something that can
be done by members of the community. I would get in touch with one of the
people maintaining any of the torrents listed here:
https://meta.wikimedia.org/wiki/Data_dump_torrents and s
ArielGlenn added a comment.
In T222985#7164049 <https://phabricator.wikimedia.org/T222985#7164049>,
@Mitar wrote:
> Are you saying that existing wikidata json dumps can be decompressed in
parallel if using lbzip2, but not pbzip2?
lbzip2 is format-compatible with bzip2 and
ArielGlenn added a comment.
lbzip2 decompresses in parallel as well. We use that for compression of the
SQL/XML dumps.
TASK DETAIL
https://phabricator.wikimedia.org/T222985
To: ArielGlenn
Cc: Mitar
ArielGlenn added a comment.
What are the next steps on this? Should I be tweaking a manifest someplace?
TASK DETAIL
https://phabricator.wikimedia.org/T281267
To: jbond, ArielGlenn
Cc: Addshore
ArielGlenn added a subscriber: hoo.
ArielGlenn added a comment.
I am proactively adding @hoo as he can provide some insight and perhaps tag
others as well.
TASK DETAIL
https://phabricator.wikimedia.org/T209390
ArielGlenn added a comment.
In T279518#6981710 <https://phabricator.wikimedia.org/T279518#6981710>, @hoo
wrote:
>> Icinga sends alerts, and those would come to me I guess, which is probably
not the best outcome :-)
>
> We could use the `wikidata` cont
ArielGlenn added a comment.
Icinga sends alerts, and those would come to me I guess, which is probably
not the best outcome :-)
I believe that we use MAILTO for everything in the dumpsgen crontab, but the
question is whether there's a nice alias to send emails to, or whether we wan
ArielGlenn added a project: Dumps-Generation.
Restricted Application added a project: wdwb-tech.
TASK DETAIL
https://phabricator.wikimedia.org/T279518
To: ArielGlenn
Cc: Lydia_Pintscher, ArielGlenn
ArielGlenn added a comment.
This is now deployed and will be in effect for next week's lexeme run.
TASK DETAIL
https://phabricator.wikimedia.org/T277300
To: ArielGlenn
Cc: ArielGlenn
ArielGlenn added a project: Dumps-Generation.
Restricted Application added a project: wdwb-tech.
TASK DETAIL
https://phabricator.wikimedia.org/T278031
To: ArielGlenn
Cc: Mitar, Aklapper, Invadibot
ArielGlenn closed this task as "Resolved".
ArielGlenn added a comment.
Since @hoo validated the dump from the past week, verifying that the current
dump generation process is fixed, we can now close this task. Thanks everyone!
TASK DETAIL
https://phabricator.wikimedia.org/T276
ArielGlenn added a comment.
I'll leave this open until the run is complete and folks have had time to try
to use them, so probably through the coming weekend.
TASK DETAIL
https://phabricator.wikimedia.org/T276643
ArielGlenn added a comment.
In T276643#6890308 <https://phabricator.wikimedia.org/T276643#6890308>,
@Ash20001 wrote:
> Will this patch be included in the next dump or can be put back in the last
two dumps (regenerate dump)
This should be in time for the dump that will be
ArielGlenn added a comment.
These look fine to me from today, and I've done all the buster-side testing
so that's ok too. Closing this! Ah, do we want to announce it anywhere though?
Maybe I won't close it pending that answer. Places it could be announced:
xmldatadump
ArielGlenn added a comment.
I am doing some prep work before I try to test this on buster. Getting close!
TASK DETAIL
https://phabricator.wikimedia.org/T264883
To: hoo, ArielGlenn
Cc: noarave
ArielGlenn added a comment.
mysql.php, used for wikidata entity dumps, apparently does not correctly
handle the --group flag. It's unclear to me what it does do; I need to check
into this sometime later. The queries run by it are extremely short so the
impact is minimal, but it still
ArielGlenn added a comment.
In T138208#6811418 <https://phabricator.wikimedia.org/T138208#6811418>,
@Addshore wrote:
> In T138208#6809784 <https://phabricator.wikimedia.org/T138208#6809784>,
@ArielGlenn wrote:
>
>> This is because the maintenance scripts tha
ArielGlenn added a comment.
These are for the weekly wikidata "entity dumps", and so separate from the
main xml/sql dumps implicated in the other task.
TASK DETAIL
https://phabricator.wikimedia.org/T147169
ArielGlenn added a comment.
This is because the maintenance scripts that do "small" page ranges take
several hours to complete. I will keep this in mind for when we can go to
multiple bz2 streams in the page content history dumps; I'll be able to dump
much smaller ranges
ArielGlenn added a comment.
All set. We should check on these again in the middle of next week, as the
run starts on Monday at ridiculous-o-clock when we are all sleeping.
TASK DETAIL
https://phabricator.wikimedia.org/T264883
ArielGlenn added a comment.
In T264883#6786811 <https://phabricator.wikimedia.org/T264883#6786811>,
@Lucas_Werkmeister_WMDE wrote:
> Are you sure they ran? That directory only contains RDF dumps as far as I
can tell (Turtle and NTriples), we’ve been generating those fo
ArielGlenn added a comment.
These ran and are available at
https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20210122/
How do they look?
TASK DETAIL
https://phabricator.wikimedia.org/T264883
ArielGlenn added a comment.
Following up on this, has there been any more discussion about making the
JSON a little less wordy/disk-filly? I don't see any other path forward on this
in the short to medium term.
TASK DETAIL
https://phabricator.wikimedia.org/T221504
ArielGlenn added a project: User-ArielGlenn.
TASK DETAIL
https://phabricator.wikimedia.org/T246415
To: Michael, ArielGlenn
Cc: ArielGlenn, Michael, Marostegui, Ladsgroup, WMDE-leszek, Aklapper,
Addshore
ArielGlenn added a comment.
All of those tables are there: see
https://gerrit.wikimedia.org/r/c/operations/puppet/+/527505 and current
https://github.com/wikimedia/puppet/blob/production/modules/snapshot/files/dumps/table_jobs.yaml#L142
Is there anything else needed
ArielGlenn added a comment.
In T264298#6511634 <https://phabricator.wikimedia.org/T264298#6511634>,
@Lucas_Werkmeister_WMDE wrote:
> We also realized that the `tablejobs.yaml` file didn’t mention the new
tables (the replacement for `wb_terms`: `wbt_{item,property}_terms`,
`
ArielGlenn removed projects: Wikidata, Wikidata-Query-Service, Analytics.
TASK DETAIL
https://phabricator.wikimedia.org/T264850
To: JAllemandou, ArielGlenn
Cc: Lucas_Werkmeister_WMDE, ArielGlenn, Milimetric
ArielGlenn added a comment.
In T264850#6531377 <https://phabricator.wikimedia.org/T264850#6531377>,
@Milimetric wrote:
> @ArielGlenn is this something you'd know about or know who to point me to?
I think the wdqs folks are going to be your best bet, I've added
ArielGlenn added a comment.
echo -n ânești | od -t x1
000 c3 a2 6e 65 c8 99 74 69
You appear to be seeing a string representation of the non-ascii characters
as hex bytes, i.e. xc3 xa2 ne xc8 x99 ti. What command are you using to
display the text in the file, and on what
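The byte sequence from the `od` output above can be reproduced with a short sketch. It assumes the string is encoded as UTF-8; the helper names are made up for illustration, and `bytes.hex` with a separator needs Python 3.8 or later.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return text.encode("utf-8").hex(" ")


def escaped_form(text: str) -> str:
    """Keep ASCII bytes as characters and show every other byte as \\xNN."""
    return "".join(
        chr(byte) if byte < 0x80 else f"\\x{byte:02x}"
        for byte in text.encode("utf-8")
    )


print(utf8_hex("ânești"))      # c3 a2 6e 65 c8 99 74 69
print(escaped_form("ânești"))  # \xc3\xa2ne\xc8\x99ti
```

A tool that prints the second form is showing the raw bytes rather than decoding them, which does not by itself mean the file contents are wrong.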
ArielGlenn added projects: Wikidata-Query-Service, Dumps-Generation.
Restricted Application added a project: Wikidata.
TASK DETAIL
https://phabricator.wikimedia.org/T264850
To: JAllemandou, ArielGlenn
Cc
ArielGlenn added a comment.
They are indeed gone from dumpsdata1002; we keep fewer back issues there,
since we're not serving them anywhere but only rsyncing them off. We keep the
last 3 wikibase dumps, see
https://github.com/wikimedia/puppet/blob/production/modules/dumps/manifest
ArielGlenn added a comment.
No impact. Only tables actually in the database are dumped; a check of each
table in the list is done beforehand. The code can be cleaned up anyway, just
to be nice.
TASK DETAIL
https://phabricator.wikimedia.org/T264298
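The kind of check described in this comment (dump only the configured tables that actually exist) could look roughly like the sketch below. It is not the dumps code: an in-memory SQLite database stands in for the wiki database, and the function names are hypothetical. The table names are the ones mentioned in this thread.

```python
import sqlite3


def existing_tables(conn):
    """Return the set of table names present in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


def tables_to_dump(conn, configured):
    """Keep only configured tables that exist, preserving the configured order."""
    present = existing_tables(conn)
    return [table for table in configured if table in present]


# Stand-in database: the old wb_terms table is gone, the new ones exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wbt_item_terms (id INTEGER)")
conn.execute("CREATE TABLE wbt_property_terms (id INTEGER)")

configured = ["wb_terms", "wbt_item_terms", "wbt_property_terms"]
print(tables_to_dump(conn, configured))  # ['wbt_item_terms', 'wbt_property_terms']
```

With a check like this, a stale entry in the configured list is skipped without error, which is why leaving it in place has no impact.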
ArielGlenn added subscribers: Gehel, ArielGlenn.
ArielGlenn added a comment.
@Gehel was just asking about these yesterday and whether he should clean them
up. The procedure is: delete first from the appropriate dumpsdata host
(dumpsdata1002) where they are first written. Then delete them
ArielGlenn added a comment.
I renew my question above in T220883#5185999
<https://phabricator.wikimedia.org/T220883#5185999> and if someone can answer
this, I can work with them to make these go live.
TASK DETAIL
https://phabricator.wikimedia.org/T220883
ArielGlenn closed this task as "Resolved".
ArielGlenn claimed this task.
ArielGlenn added a comment.
Re-enabled, checked daily runs, they look good, so I'm resolving this.
Thanks, everybody!
TASK DETAIL
https://phabricator.wikimedia.org/T260232
ArielGlenn added a comment.
Updated (ouch!) F32352585: commons_slots.png
<https://phabricator.wikimedia.org/F32352585>
TASK DETAIL
https://phabricator.wikimedia.org/T226093
To: ArielGlenn
Cc: Lad
ArielGlenn added a comment.
In T260232#6448382 <https://phabricator.wikimedia.org/T260232#6448382>,
@gerritbot wrote:
> Change 625642 **merged** by jenkins-bot:
> [mediawiki/core@master] don't pass null page id to page related queries for
category change rdf du
ArielGlenn added a comment.
In T260232#6390706 <https://phabricator.wikimedia.org/T260232#6390706>,
@gerritbot wrote:
> Change 620775 had a related patch set uploaded (by ArielGlenn; owner:
ArielGlenn):
> [mediawiki/core@master] don't include null page ids in query l
ArielGlenn created this task.
ArielGlenn added projects: Wikidata, Dumps-Generation.
TASK DESCRIPTION
This change: P12492 <https://phabricator.wikimedia.org/P12492> left the dump
db group empty, and so any attempts to run wikidata entity dumps failed. The
host was added back in ea
ArielGlenn added a comment.
I think we can just move this through and keep our eyes on it.
TASK DETAIL
https://phabricator.wikimedia.org/T261204
To: ArielGlenn
Cc: ArielGlenn, dcausse, Alter-paule
ArielGlenn added a comment.
I took the brute force approach of writing all queries to a log file by
adding the appropriate fopen/fputs/fclose in Database::select (live on
snapshot1010, testbed host). I then ran:
dumpsgen@snapshot1010:/srv/mediawiki$ /usr/bin/php7.2
/srv/mediawiki
ArielGlenn added a comment.
Just for completeness, on db2073 I also ran the original query with the
crap entry, the show explain showed use of a filesort as above, and the
execution time was... well it's still going, 330 seconds in. I killed it.
TASK DETAIL
ArielGlenn added a comment.
I saw multiple queries with this string in them while camping on the
production vslow and looking at the processlist. I don't know how many of the
queries have this issue.
TASK DETAIL
https://phabricator.wikimedia.org/T260232
ArielGlenn added a comment.
When I ran the above query on db2073 (codfw dumps and vslow host) without the
crap ' ' field in there, it returned in 0.00 seconds. Maybe the bad entries are
a new development?
TASK DETAIL
https://phabricator.wikimedia.org/T260232
ArielGlenn added a comment.
SELECT /* BatchRowIterator::next */ cl_from,cl_to FROM `categorylinks`
WHERE cl_type = 'subcat' AND
ArielGlenn added a comment.
Daily rdf dumps are probably broken until this is resolved, just an FYI for
folks importing these for search purposes.
TASK DETAIL
https://phabricator.wikimedia.org/T260232
ArielGlenn added a project: Wikidata.
TASK DETAIL
https://phabricator.wikimedia.org/T257876
To: ArielGlenn
Cc: Alicezou26, jannee_e, Akuckartz, darthmon_wmde, Nandana, Jony, Lahi, Gq86,
NoohNaeem
ArielGlenn added a comment.
Links latest-full.ttl.bz2 -> 20200116/commons-20200116-full.ttl.bz2 and
latest-full.ttl.gz -> 20200116/commons-20200116-full.ttl.gz have been cleaned
up. Thanks for the suggestion!
TASK DETAIL
https://phabricator.wikimedia.org/T221917
ArielGlenn added a comment.
It's linked off the 'other datasets' page near the top. But here's the direct
link: https://dumps.wikimedia.org/other/wikibase/commonswiki/
TASK DETAIL
https://phabricator.wikimedia.org/T221917
ArielGlenn added a comment.
Updated. F31919691: commons_slots_new.png
<https://phabricator.wikimedia.org/F31919691>
TASK DETAIL
https://phabricator.wikimedia.org/T226093
To: ArielGlenn
Cc: Lad
ArielGlenn added a comment.
@dcausse what's your time frame?
TASK DETAIL
https://phabricator.wikimedia.org/T221917
To: ArielGlenn
Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit,
ArielGlenn added a comment.
Unless folks want to keep it open to work on speeding it up in the future?
TASK DETAIL
https://phabricator.wikimedia.org/T238199
To: ArielGlenn
Cc: SilentSpike, WMDE-leszek
ArielGlenn added a comment.
I see that we're no longer blocked. Does this mean that we're good to go for
weekly runs?
TASK DETAIL
https://phabricator.wikimedia.org/T221917
To: Ari
ArielGlenn added a comment.
In T238199#6135018 <https://phabricator.wikimedia.org/T238199#6135018>,
@Ladsgroup wrote:
>
...
> Anyway, Lydia said it's fine to do it tomorrow when it gets announced by
our communication manager. Does that work for you?
A
ArielGlenn added a comment.
Can we do this temporarily while the query is being fixed up? It looks like
it had to be killed in Nov, Feb, Apr, May, so I'd rather temp disable than
require folks to shoot it (and anything else hung as a side effect).
TASK DETAIL
ArielGlenn added a comment.
Can we just skip the updateSpecialPages.php wikidatawiki --override
--only=Fewestrevisions script altogether, instead of shooting it every month?
TASK DETAIL
https://phabricator.wikimedia.org/T238199
ArielGlenn added a comment.
As I understand it the long running query comes from a monthly cron job.
TASK DETAIL
https://phabricator.wikimedia.org/T252632
To: ArielGlenn
Cc: hoo, ArielGlenn, jannee_e
ArielGlenn created this task.
ArielGlenn added projects: Dumps-Generation, Wikidata.
TASK DESCRIPTION
The weekly run was shot this morning when vslow db connections stalled due to
an unrelated long-running query, see T238199
<https://phabricator.wikimedia.org/T238199>
It can be res
ArielGlenn added a comment.
Hi, just checking in: any progress on investigating the 'extra' dumps content?
TASK DETAIL
https://phabricator.wikimedia.org/T221917
To: ArielGlenn
Cc: nettrom_WMF
ArielGlenn added subscribers: hoo, ArielGlenn.
ArielGlenn added a comment.
See T248612 <https://phabricator.wikimedia.org/T248612> for that, I believe
@hoo is planning to deploy and restart the week's run today.
TASK DETAIL
https://phabricator.wikimedia.org/T248857
ArielGlenn added a comment.
@Cparle, No blocks on your side, the ball is now in @dcausse 's court. :-)
TASK DETAIL
https://phabricator.wikimedia.org/T221917
To: ArielGlenn
Cc: nettrom_WMF, Mah
ArielGlenn added a comment.
This is pending https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/
and related patches, so we're looking at March 1 if all goes well.
TASK DETAIL
https://phabricator.wikimedia.org/T238972
ArielGlenn added a comment.
In T243701#5855352 <https://phabricator.wikimedia.org/T243701#5855352>,
@Lea_Lacroix_WMDE wrote:
> Over the past weeks, we noticed a huge increase of content in Wikidata.
Maybe that's something worth looking at?
Wikidata content is growing
ArielGlenn added a comment.
Some unexpected (?) triples popping up that @dcausse is looking into, so the
dumps will not be turned on in cron until we have the thumbs up on that. See
T243292 <https://phabricator.wikimedia.org/T243292>
If it turns out the data is all ok, we ca
ArielGlenn added a subtask: T243292: Fix the munger to support commons RDF dump.
TASK DETAIL
https://phabricator.wikimedia.org/T221917
To: ArielGlenn
Cc: Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel
ArielGlenn added a parent task: T221917: Create RDF dump of structured data on
Commons.
TASK DETAIL
https://phabricator.wikimedia.org/T243292
To: ArielGlenn
Cc: ArielGlenn, Aklapper, dcausse, darthmon_wmde
ArielGlenn added a subscriber: dcausse.
ArielGlenn added a comment.
@dcausse is going to check over the ttl dump and let me know if it looks ok;
if so then I'll flip the switch for generation weekly and make sure there's
cleanup too.
TASK DETAIL
https://phabricator.wikimedia.o
ArielGlenn added a comment.
In https://dumps.wikimedia.org/other/wikibase/commonswiki/ there are two ttl
files, gz and bz2 compressed. Please have a look!
The bash script producing them complained that
/usr/local/bin/dumpwikibaserdf.sh: line 224: setDcatConfig: command not
found
ArielGlenn added a comment.
I found a ticket that mentions use of ttl files so I'll run
/usr/local/bin/dumpwikibaserdf.sh commons full ttl
and keep an eye on it. Running on snapshot1008 in a screen session. Here we
go!
TASK DETAIL
https://phabricator.wikimedia.org/T2
ArielGlenn added a comment.
I plan to try running
/usr/local/bin/dumpwikibaserdf.sh commons full nt
on Thursday morning and see how long it takes with the 8 shards that are
currently configured. @Abit is the nt format the one needed for WDQS testing?
TASK DETAIL
https
ArielGlenn added a comment.
Ran
php /srv/mediawiki/multiversion/MWScript.php
extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki
--batch-size 500 --format nt --flavor full-dump --entity-type mediainfo
--no-cache --dbgroupdefault dump --ignore-missing --first-page-id
ArielGlenn added a comment.
Ran
php /srv/mediawiki/multiversion/MWScript.php
extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki
--batch-size 1000 --format nt --flavor full-dump --entity-type mediainfo
--no-cache --dbgroupdefault dump --ignore-missing --first-page-id
ArielGlenn added a comment.
Note to self that a run of
php /srv/mediawiki/multiversion/MWScript.php
extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki
--batch-size 250 --format nt --flavor full-dump --entity-type mediainfo
--no-cache --dbgroupdefault dump --ignore
ArielGlenn added a comment.
This morning the job was terminated by the oom killer:
[4288057.417443] Out of memory: Kill process 117265 (php) score 868 or
sacrifice child
[4288057.425084] Killed process 117265 (php) total-vm:58241128kB,
anon-rss:56901636kB, file-rss:0kB, shmem-rss
ArielGlenn added a comment.
A batchsize of 50k turned out to be too large. Same with 5k. I'm now running
with a batchsize of 500, which will surely be too small, but at least I am
getting output. I'll check on it tomorrow and see how it's doing.
TASK
ArielGlenn added a comment.
Because I've gotten a nice run in beta with the --ignore-missing flag, I'm
trying a test run on snapshot1008 in a screen session:
php /srv/mediawiki/multiversion/MWScript.php
extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki
--