Ladsgroup added a comment.

To make the dataset (sorta) balanced, we automatically mark edits made by users with more than 1K edits as trusted, so they don't need review (look at the Makefile). Wikidata is also populated by bots more than any other wiki. So if we want to end up with a dataset to review, we need to either:

1- Sample 500K edits, autolabel most of them using the edit count restriction, and pick out the 2K for users to review, or
2- Apply the edit count restriction in the sampling itself.

We don't use --pop-rate in the Wikidata models, so getting the proportion right doesn't matter.
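The two options can be sketched roughly as follows. This is only an illustration, not what the Makefile actually runs: the edit records, the `user_editcount` field, and both function names are hypothetical, and it assumes the 2K review set is drawn from the edits that were not autolabeled.

```python
import random

# "More than 1K edits" from the comment; the constant name is hypothetical.
TRUSTED_EDIT_COUNT = 1000


def is_trusted(edit, threshold=TRUSTED_EDIT_COUNT):
    """True if the editing user has more than `threshold` edits."""
    return edit["user_editcount"] > threshold


def sample_then_autolabel(edits, sample_size, review_size, seed=0):
    """Option 1: sample first, autolabel trusted edits, review some of the rest."""
    rng = random.Random(seed)
    sample = rng.sample(edits, min(sample_size, len(edits)))
    autolabeled = [e for e in sample if is_trusted(e)]
    needs_review = [e for e in sample if not is_trusted(e)]
    review = rng.sample(needs_review, min(review_size, len(needs_review)))
    return autolabeled, review


def sample_with_restriction(edits, review_size, seed=0):
    """Option 2: apply the edit count restriction during sampling."""
    rng = random.Random(seed)
    candidates = [e for e in edits if not is_trusted(e)]
    return rng.sample(candidates, min(review_size, len(candidates)))
```

Option 1 keeps the autolabeled edits around as extra training data, while option 2 only yields the review set. Since --pop-rate isn't used, neither has to preserve the real proportion of trusted edits.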


TASK DETAIL
https://phabricator.wikimedia.org/T195701

To: Ladsgroup
Cc: Halfak, Ladsgroup, matej_suchanek, Aklapper, Lydia_Pintscher, Lahi, Gq86, Vacio, bkowshik, GoranSMilovanovic, QZanden, LawExplorer, Avner, Mkdw, srodlund, Wikidata-bugs, aude, Alchimista, He7d3r, Mbch331, Rxy