Nuria added a comment.

  The work done by @mpopov
  
  The wbc_entity_usage table is supposed to hold info on Wikidata usage for the 
pages For example, here's a random file I added some structured data to a few 
days ago: 
https://commons.wikimedia.org/wiki/File:P%C3%B3voa_de_Varzim_-i---i-_(25379025808).jpg
  
  When you look for it the commonswiki replica, it has a page ID of 68860692. 
Looking for it in the wbc_entity_usage table we only see that it has a caption 
in English, which I added at basically the same time as several statements:
  
  Screen Shot 2019-11-19 at 5.58.57 PM.png
  eu_aspect column does have other values like "O" (statements) and "D" (not 
documented, but from a brief investigation looks like it's specifically for 
linking categories on Commons to Wikidata Q-items). There are some records of 
files with "O" aspects (as the MW page notes, it can refer to a variety to 
entity usages but typically it's statements) but then it gets weird because the 
language of the label isn't recorded and there's a bunch of seemingly 
unnecessary info? Take for example the MediaWiki DB data for 
https://commons.wikimedia.org/wiki/File:Jodrell_Bank_Mark_II_5.jpg
  
  Screen Shot 2019-11-19 at 6.36.04 PM.png
  Screen Shot 2019-11-19 at 6.36.33 PM.png
  Woof! That's…not great. So, uh, clearly there's something funky going on with 
the Wikibase client extension? Or maybe that's data that was recorded by an 
earlier version of the extension before it knew to append language codes to 
labels? I don't know enough about the nitty-gritty there, so these are just 
vaguely educated guesses.
  
  So, with the caveat that we're dealing with imperfect data and that some data 
is definitely missing, here are SOME lower bounds on files which have 
structured data:
  
  SELECT
  
    COUNT(DISTINCT eu_page_id) AS n_files_with_statements,
    COUNT(1) AS n_statements
  
  FROM wbc_entity_usage
  INNER JOIN page ON wbc_entity_usage.eu_page_id = page.page_id
  WHERE page_namespace = 6
  
    AND eu_aspect = 'O';
  
  n_files_with_statements: 5 212 680
  n_statements: 8 677 888
  
  Files with labels:
  
  SELECT
  
    COUNT(DISTINCT eu_page_id) AS n_files_with_captions
  
  FROM wbc_entity_usage
  INNER JOIN page ON wbc_entity_usage.eu_page_id = page.page_id
  WHERE page_namespace = 6
  
    AND eu_aspect RLIKE '^L';
  
  n_files_with_captions: 7 857 520
  
  Files with captions and/or statements:
  
  SELECT
  
    COUNT(DISTINCT eu_page_id) AS n_files_with_structured_data
  
  FROM wbc_entity_usage
  INNER JOIN page ON wbc_entity_usage.eu_page_id = page.page_id
  WHERE page_namespace = 6
  
    AND (eu_aspect RLIKE '^L' OR eu_aspect = 'O');
  
  n_files_with_structured_data: 7 861 887
  
  Total files (at the moment I ran these queries, so we have a record of % w/ 
structured data):
  
  SELECT COUNT(1) AS n_files_total
  FROM page WHERE page_namespace = 6;
  
  n_files_total: 58 698 512
  
  Thanks for reading! Feel free to correct me if I made any mistakes with the 
queries or my thought process.

TASK DETAIL
  https://phabricator.wikimedia.org/T238878

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Nuria
Cc: kzimmerman, mpopov, Ramsey-WMF, Abit, Nuria, 4748kitoko, darthmon_wmde, 
DannyS712, Nandana, JKSTNK, Akovalyov, Lahi, PDrouin-WMF, Gq86, E1presidente, 
Cparle, Anooprao, SandraF_WMF, GoranSMilovanovic, QZanden, Tramullas, Acer, 
LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, 
JAllemandou, Jane023, terrrydactyl, Wikidata-bugs, Base, matthiasmullie, aude, 
Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, 
Steinsplitter, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to