Thank you, Aiko! This is excellent work. Thank you for helping us offer this valuable new data service to the Wikimedia Movement.

Best,
Jonathan

On Sat, Mar 7, 2020 at 6:03 AM Ai-Jou Chou <qwanqwa...@gmail.com> wrote:

> Hi all,
>
> I’m happy to announce the outcome of an Outreachy internship
> <https://phabricator.wikimedia.org/T233707> that I’m finishing up. It is a
> new tool and public dataset named Citation Detective, which tool developers
> and researchers can now use for their projects.
>
> Citation Detective <https://meta.wikimedia.org/wiki/Citation_Detective>
> contains sentences that have been identified as needing a citation using a
> machine learning-based classifier published last year
> <https://arxiv.org/pdf/1902.11116.pdf> by WMF researchers and
> collaborators. As part of Outreachy, I developed a tool
> <https://github.com/AikoChou/citationdetective> (hosted on Toolforge
> <https://tools.wmflabs.org>) to run through Wikipedia and extract
> high-scoring sentences along with contextual information.
>
> As an example use case for this data, I also created a proof of concept for
> integrating Citation Detective and Citation Hunt
> <https://tools.wmflabs.org/citationhunt>. Check out my prototype Citation
> Hunt <https://tools.wmflabs.org/aiko-citationhunt>, which uses Citation
> Detective to import sentences that would not normally be featured in
> Citation Hunt. The repository for that is here
> <https://github.com/AikoChou/citationhunt>.
>
> This dataset currently includes sentences from ~120,000 randomly selected
> articles from the English Wikipedia. In future work, we hope to expand this
> to more Wikipedia language editions and a greater number of articles. It is
> also possible to expand the database to contain more fields in a future
> version, according to feedback from tool developers and researchers. More
> use cases for this type of data were identified in a design research
> project
> <https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research>
> conducted last year by Jonathan Morgan.
>
> You can find more information in our Wiki Workshop submission
> <https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020.pdf>
> and in my blog <https://rollingmist.home.blog/>, which documented the whole
> journey.
>
> Thank you very much!
>
> Kind regards,
> Aiko
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
(Uses He/Him)

*Please note that I do not expect a response from you on evenings or weekends*