dcausse moved this task from In Progress to Needs review on the 
Discovery-Search (Current work) board.
dcausse added a comment.


  > percentage, number of WDQS queries per month that involve Lexemes
  >
  >> percentage, number of the above queries that only involve Lexemes (i.e. 
doesn't require anything from the larger Wikidata graph)
  
  Using very naive heuristics on a single day of queries, I extracted 529097 
queries involving lexemes.
  Of these, 357917 seemed to require data from Wikidata, but I would not trust 
this number too much: since the language is a Wikidata item, a query requesting 
labels in a language using its language code rather than its QID falls into the 
category of queries requiring the Wikidata graph.
  I did not run the analysis on the full month because it is rather slow and, 
given the precision of the heuristics I chose, I would not trust these numbers 
anyway.
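
  For a rough sense of proportions, the two one-day counts above can be turned 
into shares. This is only arithmetic on the reported numbers and inherits all 
the imprecision of the heuristics; the variable names are illustrative.

```python
# One-day counts reported above (both come from naive heuristics).
lexeme_queries = 529097   # queries matching the lexeme heuristic
need_wikidata = 357917    # of those, queries that seemed to need the Wikidata graph

# Queries that appear to touch lexeme data only.
lexeme_only = lexeme_queries - need_wikidata

share_need_wikidata = need_wikidata / lexeme_queries
share_lexeme_only = lexeme_only / lexeme_queries

print(f"lexeme-only queries:     {lexeme_only}")
print(f"share needing Wikidata:  {share_need_wikidata:.1%}")
print(f"share lexeme-only:       {share_lexeme_only:.1%}")
```

  The share of all WDQS queries that involve lexemes cannot be derived this way, 
since the total number of queries for that day is not given here.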
  
  If we need more precise numbers the analysis will have to be more involved.
  
  For reference, here is the list of predicates I used to detect a `lexeme` 
query: `wikibase:lemma`, `ontolex:lexicalForm`, `ontolex:representation`, 
`ontolex:LexicalEntry`, `ontolex:sense`, `dct:language`, 
`wikibase:lexicalCategory`, `wikibase:grammaticalFeature`.
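
  A minimal sketch of what such a detection could look like, assuming plain 
text matching of the prefixed names against the raw query string. This is not 
the code used for the analysis; `is_lexeme_query` and the matching rules are 
hypothetical, and full IRIs or custom prefix declarations are not handled.

```python
import re

# Predicates listed above; a query mentioning any of them counts as a lexeme query.
LEXEME_PREDICATES = [
    "wikibase:lemma",
    "ontolex:lexicalForm",
    "ontolex:representation",
    "ontolex:LexicalEntry",
    "ontolex:sense",
    "dct:language",
    "wikibase:lexicalCategory",
    "wikibase:grammaticalFeature",
]

# Assumption: match prefixed names only, and require that the name is not part
# of a longer identifier (so `ontolex:sense` does not match `ontolex:senseless`).
_LEXEME_PATTERN = re.compile(
    "|".join(r"(?<![\w-])" + re.escape(p) + r"(?![\w-])" for p in LEXEME_PREDICATES)
)


def is_lexeme_query(sparql: str) -> bool:
    """Return True if the raw SPARQL text mentions any of the listed predicates."""
    return _LEXEME_PATTERN.search(sparql) is not None


print(is_lexeme_query("SELECT ?l WHERE { ?l wikibase:lemma ?lemma }"))
print(is_lexeme_query("SELECT ?i WHERE { ?i wdt:P31 wd:Q5 }"))
```

  Text matching like this also fires on predicates that appear in comments or 
string literals, which is one reason the resulting counts are imprecise.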
  
  > given the current rate of growth of Wikidata, approximately how much time 
it would take for non-Lexeme Wikidata to grow back to its current size
  
  The lexemes RDF dataset is about 77M triples (0.6% of the total size of the 
graph).
  If we were to remove lexemes from the main graph, at the current growth rate 
it would take ~10 days for Wikidata to grow back to the equivalent size.
  Note that in the current graph "only" 29316 distinct Wikidata items are 
referenced from the lexemes.
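
  The figures above imply a total graph size and a daily growth rate. The 
back-of-the-envelope calculation below derives both from the rounded numbers 
in this comment, so the results are approximations and not measured values.

```python
# Rounded figures quoted above.
lexeme_triples = 77_000_000   # size of the lexemes RDF dataset
lexeme_share = 0.006          # 0.6% of the total graph
days_to_regrow = 10           # time for the graph to regain the removed triples

# Implied total size of the graph.
total_triples = lexeme_triples / lexeme_share

# Implied growth rate: the removed triples are regained in about 10 days.
growth_per_day = lexeme_triples / days_to_regrow

print(f"implied total graph size: {total_triples / 1e9:.1f}B triples")
print(f"implied growth rate:      {growth_per_day / 1e6:.1f}M triples/day")
```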

TASK DETAIL
  https://phabricator.wikimedia.org/T275068

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/
