Hey,

This is the 21st weekly update from the revision scoring team that we have
sent to this mailing list.

New developments

   - We received a request to get moving on Spanish Wikibooks support, so
   we dug in:

      - We deployed a new Wiki labels campaign[1]

      - We fixed an issue in Wiki labels that prevented requests from
      *.wikibooks.org[2]

      - We trained a basic "revert" detection model that seems to be pretty
      effective[3]

   - We also generated a dataset of article quality scores for English
   Wikipedia[4].  You can download it here: [5]


This week, we invested in some long-term tasks.  If you review our
Phabricator board, you'll see substantial progress in improving our damage
detection models with hashing vectorization strategies[6, 7], implementing
a more robust model testing strategy[8], and implementing some advanced
natural language processing strategies[9, 10].  Stay tuned for the
completion of these activities in the coming weeks.

1. https://phabricator.wikimedia.org/T143962 -- Add uniqueness constraints
to ores_classification
2. https://phabricator.wikimedia.org/T145406 -- Fix CORS for wikibooks
3. https://phabricator.wikimedia.org/T145428 -- Train/test reverted model
for Spanish Wikibooks
4. https://phabricator.wikimedia.org/T135684 -- Generate recent article
quality scores for English Wikipedia
5. https://datasets.wikimedia.org/public-datasets/enwiki/article_quality/wp10-scores-enwiki-20160820.tsv.bz2
6. https://phabricator.wikimedia.org/T128087 -- [Spike] Investigate
HashingVectorizer
7. https://en.wikipedia.org/wiki/Feature_hashing
8. https://phabricator.wikimedia.org/T142953 -- Train on all data, Report
test statistics on cross-validation
9. https://phabricator.wikimedia.org/T144636 -- Implement PCFG features
10. https://en.wikipedia.org/wiki/Stochastic_context-free_grammar

Sincerely,
Aaron from the Revision Scoring team
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
