[Wikidata-bugs] [Maniphest] T285458: Generate inputs for 1st sensemaking session about ORES quality score distributions across the Wikidata classes

GoranSMilovanovic Tue, 06 Jul 2021 00:10:03 -0700

GoranSMilovanovic added a comment.


  @Manuel
  
  Here is my current take on
  
  > ideas about possible next steps (towards a better understanding of the 
current distribution of the ORES quality scores across Wikidata’s classes)
  
  
  
  - Gather potential explanatory variables and model their influence upon ORES 
scores, e.g. number of edits per WD class, proportion of human vs bot edits, 
how frequently the class items were revised...
  
  - Separate quality assessment for (a) classes that were predominantly edited 
by bots vs classes that were predominantly edited by human editors, and maybe 
(b) classes that predominantly result from mass imports vs 
"spontaneously grown" classes?
  
  - Describe clusters of Wikidata classes by higher level classes in the 
ontology (i.e. what is found in their `P31/P279` paths towards `entity`), but 
this might be tricky to obtain
  
  - I was never able to figure out precisely the size and characteristics of 
the ORES training set for Wikidata; however, I wonder if training separate 
quality models (via boosted trees, as in ORES, or otherwise) for different 
large Wikidata classes would make more sense than training a single model to 
predict the quality of any item in one general framework.
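
  To make the third idea above a bit more concrete, here is a minimal sketch 
of collecting the higher level classes found on a class's upward path towards 
`entity`. It runs on a tiny hand-made "subclass of" graph; the class labels, 
the `SUBCLASS_OF` table and both function names are assumptions made up for 
illustration, not real Wikidata identifiers or an existing API. On real data 
the edges would have to come from the `P279` statements, which is the part 
that might be tricky to obtain.

```python
from collections import deque

# Toy "subclass of" (P279) edges: child class -> list of parent classes.
# All labels are hypothetical placeholders, not real Wikidata QIDs.
SUBCLASS_OF = {
    "painting": ["visual artwork"],
    "visual artwork": ["work of art"],
    "work of art": ["artificial entity"],
    "artificial entity": ["entity"],
    "scholarly article": ["publication"],
    "publication": ["work"],
    "work": ["artificial entity"],
}


def ancestors(cls, edges):
    """Return {ancestor: distance} for every class reachable upward from cls."""
    seen = {}
    queue = deque([(cls, 0)])
    while queue:
        node, dist = queue.popleft()
        for parent in edges.get(node, []):
            if parent not in seen:  # also guards against cycles in the graph
                seen[parent] = dist + 1
                queue.append((parent, dist + 1))
    return seen


def shared_ancestors(a, b, edges):
    """Return the higher level classes that two classes have in common."""
    return set(ancestors(a, edges)) & set(ancestors(b, edges))
```

  Two classes could then be described (or grouped) by the ancestors they 
share, e.g. `shared_ancestors("painting", "scholarly article", SUBCLASS_OF)`.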

TASK DETAIL
  https://phabricator.wikimedia.org/T285458

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, 
Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331

_______________________________________________
Wikidata-bugs mailing list -- [email protected]
To unsubscribe send an email to [email protected]
