hoo created this task. hoo added projects: Datasets-General-or-Unknown, Wikidata.
TASK DESCRIPTION
Given that Wikidata currently grows at 3-10% a week, we need to make the Wikidata entity dumpers keep up with that.
The changes in batch size (4eedfb48e9fdc93eea13d9fd3bd341e66c1abfbc) and https://github.com/wmde/WikibaseDataModel/pull/762 will already ease some of the pain, but given the immense growth, they will probably barely offset four weeks of Wikidata growth.
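To put the growth figure in perspective, here is a minimal back-of-the-envelope sketch. It assumes the 3-10% weekly rate quoted above compounds week over week; the function name is illustrative and not part of any Wikibase code.

```python
def compound_growth(weekly_rate: float, weeks: int) -> float:
    """Return the total growth factor after compounding weekly_rate for the given weeks."""
    return (1.0 + weekly_rate) ** weeks


# Assumption: growth compounds weekly at the low and high ends of the quoted range.
for rate in (0.03, 0.10):
    factor = compound_growth(rate, 4)
    print(f"{rate:.0%} per week over 4 weeks: +{factor - 1:.1%}")
# 3% per week over 4 weeks: +12.6%
# 10% per week over 4 weeks: +46.4%
```

Under that assumption, a one-off speed-up would need to be roughly 13-46% just to cancel out four weeks of growth.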
Possible things to do:
- Create a "master dump" (or some such) from which all other dumps can be derived (this would ease the load on the DBs, but would hardly reduce CPU time)
- Increase the number of runners further (from 5 currently)
- Try to derive old dumps from new ones (not easy to do, and it is unclear how much there is to gain)
- Do more profiling and try to find more low-hanging fruit (like the examples above, or T157013)
- Switch away from PHP5 to PHP7 or HHVM
- …
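The "master dump" idea from the list above could look roughly like the following sketch: each entity is read once from a single source and handed to several format writers. Everything here is hypothetical (the function names, the line-delimited JSON input, and the simplified entity shape); it is not how the Wikibase dump scripts are actually structured.

```python
import io
import json
from typing import Callable, Dict, Iterable, TextIO

EntityWriter = Callable[[dict], None]


def derive_dumps(master_lines: Iterable[str], writers: Dict[str, EntityWriter]) -> int:
    """Read each entity once and pass it to every derived-format writer."""
    count = 0
    for line in master_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the master dump
        entity = json.loads(line)
        for write in writers.values():
            write(entity)
        count += 1
    return count


def make_json_writer(out: TextIO) -> EntityWriter:
    """Hypothetical writer: one normalized JSON object per line."""
    def write(entity: dict) -> None:
        out.write(json.dumps(entity, sort_keys=True) + "\n")
    return write


def make_label_tsv_writer(out: TextIO) -> EntityWriter:
    """Hypothetical writer: entity id and English label, tab-separated."""
    def write(entity: dict) -> None:
        label = entity.get("labels", {}).get("en", "")
        out.write(f"{entity['id']}\t{label}\n")
    return write


# Toy input; real entities are far larger and shaped differently.
master = ['{"id": "Q1", "labels": {"en": "universe"}}', '{"id": "Q2", "labels": {}}']
json_out, tsv_out = io.StringIO(), io.StringIO()
n = derive_dumps(master, {"json": make_json_writer(json_out),
                          "tsv": make_label_tsv_writer(tsv_out)})
print(n, "entities written to", 2, "derived dumps")
```

This only removes repeated reads from the source; the per-format serialization work still runs once per output, which matches the caveat about CPU time.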
To: hoo
Cc: Aklapper, ezachte, daniel, Lydia_Pintscher, mark, ArielGlenn, bd808, Liuxinyu970226, aude, JanZerebecki, Jimkont, Denis.bykov, Ricordisamoa, PokestarFan, hoo, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, Svick, Mbch331, jeremyb
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs