[Wikidata-bugs] [Maniphest] T297347: Draw sample of Wikidata revisions for ORES training data

2022-01-25 Thread Manuel
Manuel closed this task as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T297347 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, Manuel Cc: Ladsgroup, Lucas_Werkmeister_WMDE, Michael, Manuel, Aklapper, Lydia_Pintscher, Invadibot, maa


2022-01-25 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment. > @Lucas_Werkmeister_WMDE: Is it fair to assume that the randomization is in this case random enough for our purposes? I think so, yes.


2022-01-25 Thread Manuel
Manuel added a comment. Thank you! @Lucas_Werkmeister_WMDE: Is it fair to assume that the randomization is in this case random enough for our purposes? According to the conversation, I would suggest the following: 1. Let's do the manual evaluation only for edits that were done w


2022-01-25 Thread hoo
hoo added a comment. In T297347#7648387, @Lucas_Werkmeister_WMDE wrote: > The script looks alright to me – I remember reading something about how `ORDER BY RAND()` isn’t an ideal way to shuffle a collection (especially depending on the


2022-01-25 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE moved this task from Peer Review to Product Verification on the Wikidata-Campsite (Team A Hearth 🏰🔥) board. Lucas_Werkmeister_WMDE added a comment. The script looks alright to me – I remember reading something about how `ORDER BY RAND()` isn’t an ideal way to shuffle a
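
The concern about shuffling with `ORDER BY RAND()` can be sidestepped by sampling on the client side. The following is only a sketch of one such alternative (reservoir sampling, Algorithm R), not the script discussed in this thread; the function name `reservoir_sample` and its parameters are hypothetical and chosen for illustration.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(rev_ids: Iterable[int], k: int,
                     seed: Optional[int] = None) -> List[int]:
    """Draw a uniform random sample of up to k items from a stream.

    Hypothetical helper: it reads the revision IDs once, in any order,
    and never needs the whole collection sorted or held in memory.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    sample: List[int] = []
    for i, rev_id in enumerate(rev_ids):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(rev_id)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rev_id
    return sample
```

Calling `reservoir_sample(ids, 10_000, seed=42)` on an iterator of revision IDs would return at most 10,000 of them, each chosen with equal probability.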


2022-01-24 Thread hoo
hoo added a comment. I wrote a one-off Python script for this, which I ran on stat1007. It used a SQL query to get 10k random revisions from the last year (only those on non-redirect items). The "classifying" is a little hacky here, but probably good enough (also this is impossible to get full


2022-01-11 Thread hoo
hoo claimed this task.


2021-12-27 Thread Michael
Michael removed Michael as the assignee of this task.


2021-12-10 Thread Ladsgroup
Ladsgroup added a comment. Yup, but just make sure you add bot edits to the original training model (while labeling them as good automatically). Otherwise, a model that hasn't seen those edits would think they are also vandalism. As an analogy, if you exclude all basketball pictures in a
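
The suggestion above (keep bot edits in the training set, labeled as good automatically, and spend human review only on the rest) could look roughly like this. It is a minimal sketch under stated assumptions: the `Revision` class, its fields, and `build_training_labels` are hypothetical names, not part of ORES or of the script from this task.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Revision:
    """Hypothetical record for one sampled revision."""
    rev_id: int
    is_bot: bool
    human_label: Optional[bool] = None  # True = good edit, None = not reviewed


def build_training_labels(revisions: Iterable[Revision]) -> Dict[int, bool]:
    """Merge automatic and human labels into one mapping of rev_id -> is_good."""
    labels: Dict[int, bool] = {}
    for rev in revisions:
        if rev.is_bot:
            # Bot edits are labeled good automatically, so the model still sees them.
            labels[rev.rev_id] = True
        elif rev.human_label is not None:
            # Non-bot edits use the label from manual review.
            labels[rev.rev_id] = rev.human_label
        # Non-bot edits without a label are left out until someone reviews them.
    return labels
```

Revisions that are neither bot edits nor reviewed yet are simply skipped, which makes it easy to see how much manual work remains.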


2021-12-09 Thread Michael
Michael added a subscriber: Ladsgroup. Michael added a comment. > What would be the optimal way to use these 2,000 human-categorized revisions? Can we focus on hard and useful stuff without jeopardizing the balance of ORES? Could we for example treat bot edits and massive classes like "acade


2021-12-09 Thread Michael
Michael claimed this task. Restricted Application added a project: User-Michael.


2021-12-09 Thread Manuel
Manuel updated the task description.


2021-12-09 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE updated the task description.


2021-12-09 Thread Manuel
Manuel updated the task description.


2021-12-09 Thread Manuel
Manuel updated the task description.


2021-12-09 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE updated the task description.


2021-12-09 Thread Manuel
Manuel updated the task description.


2021-12-09 Thread Maintenance_bot
Maintenance_bot added a project: Wikidata.