[Wikidata-bugs] [Maniphest] [Changed Subscribers] T226093: Capacity planning for Commons Structured Data

2019-06-20 Thread ArielGlenn
ArielGlenn added a subscriber: jcrespo.
ArielGlenn added a comment.


  @jcrespo I'm adding you too, please remove yourself if you're already covered 
by other tasks.
  
  @MarkTraceur The number of new revisions on Wikidata per day varies between 
about 550k and 850k. Of these, only about 12k create new pages in ns 0 (which 
holds the vast majority of pages), and about 51% of the new revisions to 
existing pages (over the last 3 months) were made by bots.
  
  What that means is a lot of bot activity adding claims to existing entities.
  
  Admittedly, commons will take some time to ramp up to that, but I'd prefer to 
plan for it sooner rather than later. I definitely don't want us to be in the 
position of telling people we can't accommodate their edits, and/or throttling 
them severely.
  
  For external storage, core dbs, and dumps hosts, we'll need to make the 
appropriate projections.
  
  Do you have a link to an overview of what the various statements per file 
might be (the 10-20 you mentioned)?
  
  Is there a road map for release you can point us at?

TASK DETAIL
  https://phabricator.wikimedia.org/T226093

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: ArielGlenn
Cc: jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, darthmon_wmde, 
Legado_Shulgin, Nandana, JKSTNK, thifranc, AndyTan, Davinaclare77, Qtn1293, 
Lahi, PDrouin-WMF, Gq86, E1presidente, Ramsey-WMF, Cparle, Anooprao, 
SandraF_WMF, GoranSMilovanovic, Lunewa, Th3d3v1ls, Hfbn0, QZanden, Tramullas, 
Acer, LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, 
Susannaanas, Wong128hk, gnosygnu, Jane023, Wikidata-bugs, Base, matthiasmullie, 
aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, 
Steinsplitter, Mbch331, Jay8g, fgiunchedi
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Subscribers] T226093: Capacity planning for Commons Structured Data

2019-06-19 Thread ArielGlenn
ArielGlenn added a subscriber: MarkTraceur.
ArielGlenn added a comment.


  From an email from @MarkTraceur:
  
  Database needs
  --------------
  
  - 54 million files on Commons
  - Estimated average of 10-20 statements per file
  - Estimated 1 revision per statement
  - Therefore, (very) roughly 1 billion estimated rows added to revisions table
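
  As a quick check of the arithmetic in the list above, here is a minimal 
sketch. The inputs are the figures quoted in the list; the variable and 
function names are illustrative and do not come from the task.

```python
# Back-of-envelope check of the revision-row estimate.
# Inputs are the figures quoted in the email; names are illustrative only.
FILES_ON_COMMONS = 54_000_000
STATEMENTS_PER_FILE = (10, 20)  # estimated low and high averages
REVISIONS_PER_STATEMENT = 1


def estimated_revision_rows(files, statements_per_file, revisions_per_statement):
    """Rows added to the revision table if every file gets its statements."""
    return files * statements_per_file * revisions_per_statement


low, high = (
    estimated_revision_rows(FILES_ON_COMMONS, s, REVISIONS_PER_STATEMENT)
    for s in STATEMENTS_PER_FILE
)
print(f"{low:,} to {high:,} rows")
```

  The 10-statement case gives 540 million rows and the 20-statement case about 
1.08 billion, so "roughly 1 billion" sits at the upper end of the range.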
  
  External storage needs
  ----------------------
  
  - Each file will have its own MediaInfo entity, which will be analogous to 
Wikidata items
  - So, given Wikidata has about 57 million items, the storage needs should be 
about the same
    - Obviously that would need to be additional storage, not including the 
existing Wikitext
  
  Rates
  -----
  
  - We expect multiple bots to run over Commons very shortly after release 
(within the next few months)
    - Don't anticipate these will be drastically faster than normal bot runs
    - Could see Multichill's bots for examples - I believe he's rate-limited 
them aggressively
  - There will likely be micro-contributions as well
    - Think Magnus's "Wikidata game" style, likely similar rates
    - Also sanctioned on-wiki machine-aided work (for depicts statements)
  - By the end of the calendar year, we expect at least 5 million files to have 
structured data
  - We're currently sitting in the low six figures (100-300k)
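
  To put the end-of-year target in terms of a daily rate, the sketch below 
makes several assumptions that are not stated in the email: that "end of the 
calendar year" means 2019-12-31, that the count starts on the date of this 
comment (2019-06-19), that growth is linear, and that the 10-20 statements per 
file at 1 revision each from the database estimate apply to these files. All 
names are illustrative.

```python
from datetime import date

# Figures quoted in the email.
TARGET_FILES = 5_000_000
CURRENT_FILES = (100_000, 300_000)   # "low six figures"
STATEMENTS_PER_FILE = (10, 20)       # 1 revision per statement

# Assumptions, not from the email: start date, end date, linear ramp.
START = date(2019, 6, 19)
END = date(2019, 12, 31)

days = (END - START).days

# Fewer files already covered means more files to add per day.
files_per_day_low = (TARGET_FILES - max(CURRENT_FILES)) / days
files_per_day_high = (TARGET_FILES - min(CURRENT_FILES)) / days

revisions_per_day_low = files_per_day_low * min(STATEMENTS_PER_FILE)
revisions_per_day_high = files_per_day_high * max(STATEMENTS_PER_FILE)

print(f"{days} days remaining")
print(f"{files_per_day_low:,.0f} to {files_per_day_high:,.0f} files/day")
print(f"{revisions_per_day_low:,.0f} to {revisions_per_day_high:,.0f} revisions/day")
```

  Under these assumptions the target works out to roughly 24k to 25k files per 
day, or on the order of 240k to 500k revisions per day. Since the target is "at 
least" 5 million files, this is a floor for planning rather than a forecast.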

To: ArielGlenn
Cc: MarkTraceur, ArielGlenn, Aklapper, darthmon_wmde, Legado_Shulgin, Nandana, 
JKSTNK, thifranc, AndyTan, Davinaclare77, Qtn1293, Lahi, PDrouin-WMF, Gq86, 
E1presidente, Ramsey-WMF, Cparle, Anooprao, SandraF_WMF, GoranSMilovanovic, 
Lunewa, Th3d3v1ls, Hfbn0, QZanden, Tramullas, Acer, LawExplorer, Salgo60, 
Zppix, Silverfish, _jensen, rosalieper, Susannaanas, Wong128hk, gnosygnu, 
Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Ricordisamoa, Wesalius, 
Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, Steinsplitter, Mbch331, 
Jay8g, fgiunchedi