GoranSMilovanovic added a comment.
@Manuel Here is my current take on

> ideas about possible next steps (towards a better understanding of the current distribution of the ORES quality scores across Wikidata’s classes)

- Gather potential explanatory variables and model their influence on ORES scores, e.g. the number of edits per WD class, the proportion of human vs. bot edits, how frequently the items in a class were revised...
- Assess quality separately for (a) classes that were predominantly edited by bots vs. classes that were predominantly edited by human editors, and maybe (b) classes that predominantly result from mass imports vs. "spontaneously grown" classes?
- Describe clusters of Wikidata classes by the higher-level classes in the ontology (i.e. what is found in their `P31/P279` paths towards `entity`), but this might be tricky to obtain.
- I was never able to figure out precisely the size and characteristics of the ORES training set for Wikidata; however, I wonder if training separate quality models (via boosted trees, as in ORES, or otherwise) for different large Wikidata classes would make more sense than training one model to predict the quality of any item in some general framework.

TASK DETAIL
  https://phabricator.wikimedia.org/T285458

To: GoranSMilovanovic
Cc: Ladsgroup, Lydia_Pintscher, Tobi_WMDE_SW, Manuel, GoranSMilovanovic, Aklapper, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
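To make the bot vs. human comparison from the list of ideas a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the record fields (`wd_class`, `bot_edit_share`, `mean_ores_score`), the 0.5 threshold, the idea of summarising ORES quality as a single number per class, and the toy values themselves, which are made up and not taken from any real Wikidata or ORES data.

```python
from statistics import mean

# Hypothetical per-class records. All identifiers and numbers are invented
# for illustration; they do not come from Wikidata or ORES.
classes = [
    {"wd_class": "class_a", "bot_edit_share": 0.92, "mean_ores_score": 2.0},
    {"wd_class": "class_b", "bot_edit_share": 0.75, "mean_ores_score": 3.0},
    {"wd_class": "class_c", "bot_edit_share": 0.10, "mean_ores_score": 4.0},
    {"wd_class": "class_d", "bot_edit_share": 0.30, "mean_ores_score": 3.0},
]


def split_by_editor_type(records, threshold=0.5):
    """Partition classes into bot-dominated and human-dominated groups.

    A class counts as bot-dominated when its share of bot edits is
    strictly above `threshold` (an assumed cut-off, not an established one).
    """
    bot = [r for r in records if r["bot_edit_share"] > threshold]
    human = [r for r in records if r["bot_edit_share"] <= threshold]
    return bot, human


def group_summary(records):
    """Return the number of classes and their mean score, or None if empty."""
    if not records:
        return None
    return {
        "n_classes": len(records),
        "mean_score": mean(r["mean_ores_score"] for r in records),
    }


bot_classes, human_classes = split_by_editor_type(classes)
bot_summary = group_summary(bot_classes)
human_summary = group_summary(human_classes)
print("bot-dominated:", bot_summary)
print("human-dominated:", human_summary)
```

With real data, the same split could be the first step before modelling the other explanatory variables (edit counts, revision frequency) within each group; a plain group mean is only a starting point and says nothing about causes.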
