Hi all,

I’m happy to announce the outcome of an Outreachy internship <https://phabricator.wikimedia.org/T233707> that I’m finishing up. It is a new tool and public dataset named Citation Detective, which tool developers and researchers can now use in their projects.

Citation Detective <https://meta.wikimedia.org/wiki/Citation_Detective> contains sentences that have been identified as needing a citation by a machine learning-based classifier published last year <https://arxiv.org/pdf/1902.11116.pdf> by WMF researchers and collaborators. As part of Outreachy, I developed a tool <https://github.com/AikoChou/citationdetective> (hosted on Toolforge <https://tools.wmflabs.org>) that runs through Wikipedia and extracts high-scoring sentences along with contextual information.

As an example use case for this data, I also created a proof of concept that integrates Citation Detective with Citation Hunt <https://tools.wmflabs.org/citationhunt>. Check out my prototype Citation Hunt <https://tools.wmflabs.org/aiko-citationhunt>, which uses Citation Detective to import sentences that would not normally be featured in Citation Hunt. The repository for the prototype is here <https://github.com/AikoChou/citationhunt>.

The dataset currently includes sentences from ~120,000 randomly selected articles from the English Wikipedia. In future work, we hope to expand it to more Wikipedia language editions and a greater number of articles. A future version of the database could also contain more fields, depending on feedback from tool developers and researchers.

More use cases for this type of data were identified in a design research project <https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research> conducted last year by Jonathan Morgan. You can find more information in our Wiki Workshop submission <https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020.pdf> and in my blog <https://rollingmist.home.blog/>, which documents the whole journey.

Thank you very much!

Kind regards,
Aiko

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l