Ladsgroup added a comment.
To make the dataset (sorta) balanced, we automatically mark edits made by users with more than 1K edits as trusted and not needing review (look at the Makefile). Wikidata is populated by bots more than any other wiki, so to get a dataset worth reviewing we need to either:
1- Sample 500K edits, autolabel most of them using the edit-count restriction, and pick the 2K for users to label, or
2- Apply the edit-count restriction during sampling.
We don't use --pop-rate in Wikidata models, so getting the proportion right doesn't matter.
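A minimal Python sketch of the two options, assuming each edit record carries its editor's total edit count. Only the "more than 1K edits means trusted" rule comes from the comment above; the `Edit` record, its `user_edit_count` field, and both function names are hypothetical, and the real labelling logic lives in the Makefile and is not reproduced here.

```python
import random
from dataclasses import dataclass

# From the comment: users with more than 1K edits are treated as trusted.
TRUSTED_EDIT_COUNT = 1000


@dataclass
class Edit:
    rev_id: int
    user_edit_count: int  # hypothetical field: the editor's total edit count


def is_trusted(edit):
    """An edit is auto-labelled as trusted if its author has more than 1K edits."""
    return edit.user_edit_count > TRUSTED_EDIT_COUNT


def autolabel_then_sample(sample, n_review, seed=0):
    """Option 1: take an already-drawn (large) sample, auto-label the trusted
    edits, and pick up to n_review of the rest for human labelling."""
    trusted = [e for e in sample if is_trusted(e)]
    untrusted = [e for e in sample if not is_trusted(e)]
    rng = random.Random(seed)
    review = rng.sample(untrusted, min(n_review, len(untrusted)))
    return trusted, review


def sample_with_restriction(edits, n, seed=0):
    """Option 2: apply the edit-count restriction before sampling, so every
    sampled edit is one that still needs review."""
    candidates = [e for e in edits if not is_trusted(e)]
    rng = random.Random(seed)
    return rng.sample(candidates, min(n, len(candidates)))
```

Option 2 avoids drawing and then discarding a large bot-heavy sample, at the cost of losing the auto-labelled trusted edits that option 1 produces as a by-product.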
To: Ladsgroup
Cc: Halfak, Ladsgroup, matej_suchanek, Aklapper, Lydia_Pintscher, Lahi, Gq86, Vacio, bkowshik, GoranSMilovanovic, QZanden, LawExplorer, Avner, Mkdw, srodlund, Wikidata-bugs, aude, Alchimista, He7d3r, Mbch331, Rxy
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs