[Wikidata-bugs] [Maniphest] [Changed Subscribers] T226093: Capacity planning for Commons Structured Data
ArielGlenn added a subscriber: jcrespo.
ArielGlenn added a comment.

@jcrespo I'm adding you too; please remove yourself if you're already covered by other tasks.

@MarkTraceur The number of new revisions to Wikidata in a day varies between about 550k and 850k. Of these, only about 12k are new pages in ns 0 (which holds the vast majority of pages), and about 51% of the revisions to existing pages (over the last 3 months) are made by bots. In other words, much of the activity is bots adding claims to existing entities. Admittedly, Commons will take some time to ramp up to that, but I'd prefer to plan for it sooner rather than later. I definitely don't want us to be in the position of telling people we can't accommodate their edits, and/or throttling them severely.

For external storage, core DBs, and dumps hosts, we'll need to make the appropriate projections. Do you have a link to an overview of what the statements per file might be (the 10-20 you mentioned)? Is there a release roadmap you can point us at?

TASK DETAIL https://phabricator.wikimedia.org/T226093
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: ArielGlenn
Cc: jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, darthmon_wmde, Legado_Shulgin, Nandana, JKSTNK, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, PDrouin-WMF, Gq86, E1presidente, Ramsey-WMF, Cparle, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, Th3d3v1ls, Hfbn0, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, Susannaanas, Wong128hk, gnosygnu, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, Steinsplitter, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
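To make the Wikidata comparison concrete, the daily figures quoted in the comment above can be turned into an average edit rate and a rough bot-edit count. This is only a back-of-the-envelope sketch under stated assumptions: the helper name `daily_profile` is hypothetical, the ~12k new ns-0 pages are treated as the only page creations that day, and the 3-month 51% bot share is assumed to apply to a single day's totals.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_profile(revisions_per_day, new_ns0_pages=12_000, bot_share=0.51):
    """Hypothetical helper: rough breakdown of one day's Wikidata revisions.

    Assumes the ~12k new ns-0 pages are the only page creations, and that
    the 51% bot share (measured over 3 months) holds for a single day.
    """
    existing_page_revs = revisions_per_day - new_ns0_pages
    return {
        # Average only; real traffic is bursty, so peaks will be higher.
        "avg_revs_per_sec": revisions_per_day / SECONDS_PER_DAY,
        "existing_page_revs": existing_page_revs,
        "approx_bot_revs": round(existing_page_revs * bot_share),
    }


for daily in (550_000, 850_000):
    p = daily_profile(daily)
    print(f"{daily:,}/day: {p['avg_revs_per_sec']:.1f} rev/s on average, "
          f"~{p['approx_bot_revs']:,} bot edits to existing pages")
```

An average of a few to ten revisions per second understates what bots can produce in bursts, which is why any Commons projection would need headroom well above these averages.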
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T226093: Capacity planning for Commons Structured Data
ArielGlenn added a subscriber: MarkTraceur.
ArielGlenn added a comment.

From email from @MarkTraceur:

Database needs
- 54 million files on Commons
- Estimated average of 10-20 statements per file
- Estimated 1 revision per statement
- Therefore, (very) roughly 1 billion estimated rows added to the revision table

External storage needs
- Each file will have its own MediaInfo entity, analogous to a Wikidata item
- Wikidata has about 57 million items, so the storage needs should be about the same
- This would be additional storage, on top of the existing wikitext

Rates
- We expect multiple bots to run over Commons very shortly after release (within the next few months)
- We don't anticipate these will be drastically faster than normal bot runs
- See Multichill's bots for examples; I believe he has rate-limited them aggressively
- There will likely be micro-contributions as well
- Think Magnus's "Wikidata game" style, likely at similar rates
- Also sanctioned on-wiki machine-aided work (for depicts statements)
- By the end of the calendar year, we expect at least 5 million files to have structured data
- We're currently sitting in the low six figures (100-300k)
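The database estimate in the email is a straight product of files, statements per file, and revisions per statement. A minimal sketch of that arithmetic follows; the function name `revision_rows` is hypothetical, and it assumes, as the email does, exactly one revision per statement. It shows that "roughly 1 billion" is the upper end of a 540 million to 1.08 billion range for a full backfill, and what the end-of-year target of 5 million files would imply.

```python
def revision_rows(files, statements_per_file, revisions_per_statement=1):
    """Hypothetical helper: estimated new rows in the revision table.

    Assumes every statement is added in its own revision, as in the email.
    """
    return files * statements_per_file * revisions_per_statement


COMMONS_FILES = 54_000_000     # files on Commons, per the email
EOY_TARGET_FILES = 5_000_000   # files expected to have structured data by year end

for label, files in (("Full backfill", COMMONS_FILES),
                     ("End-of-year target", EOY_TARGET_FILES)):
    low = revision_rows(files, 10)
    high = revision_rows(files, 20)
    print(f"{label}: {low:,} to {high:,} revision rows")
```

If bots batch several statements into one edit, `revisions_per_statement` drops below 1 and the row count shrinks accordingly; the one-revision-per-statement figure is therefore a pessimistic bound for the revision table.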