Re: [Wiki-research-l] Announcing Citation Detective, a public dataset of sentences missing citations

2020-03-10 Thread Jonathan Morgan
Thank you Aiko! This is excellent work. Thank you for helping us offer this
valuable new data service to the Wikimedia Movement.

Best,
Jonathan

On Sat, Mar 7, 2020 at 6:03 AM Ai-Jou Chou wrote:

> Hi all,
>
> I’m happy to announce the outcome of an Outreachy internship that I’m
> finishing up. It is a new tool and public dataset named Citation Detective,
> which tool developers and researchers can now use for their projects.
>
> Citation Detective contains sentences that have been identified as needing
> a citation by a machine learning-based classifier published last year by
> WMF researchers and collaborators. As part of Outreachy, I developed a tool
> (hosted on Toolforge) that runs through Wikipedia and extracts high-scoring
> sentences along with contextual information.
>
> As an example use case for this data, I also created a proof of concept for
> integrating Citation Detective and Citation Hunt. Check out my prototype
> Citation Hunt, which uses Citation Detective to import sentences that would
> not normally be featured in Citation Hunt. The repository for that is here.
>
> This dataset currently includes sentences from ~120,000 randomly selected
> articles from the English Wikipedia. In future work, we hope to expand this
> to more Wikipedia language editions and a greater number of articles. The
> database could also gain more fields in a future version, based on feedback
> from tool developers and researchers. More use cases for this type of data
> were identified in a design research project
> <https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research>
> conducted last year by Jonathan Morgan.
>
> You can find more information in our Wiki Workshop submission
> <https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020.pdf>
> and in my blog, which documents the whole journey.
>
> Thank you very much!
>
> Kind regards,
> Aiko
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>


-- 
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) 
(Uses He/Him)

*Please note that I do not expect a response from you on evenings or
weekends*


[Wiki-research-l] Announcing Citation Detective, a public dataset of sentences missing citations

2020-03-07 Thread Ai-Jou Chou
Hi all,

I’m happy to announce the outcome of an Outreachy internship that I’m
finishing up. It is a new tool and public dataset named Citation Detective,
which tool developers and researchers can now use for their projects.

Citation Detective contains sentences that have been identified as needing
a citation by a machine learning-based classifier published last year by
WMF researchers and collaborators. As part of Outreachy, I developed a tool
(hosted on Toolforge) that runs through Wikipedia and extracts high-scoring
sentences along with contextual information.

As an example use case for this data, I also created a proof of concept for
integrating Citation Detective and Citation Hunt. Check out my prototype
Citation Hunt, which uses Citation Detective to import sentences that would
not normally be featured in Citation Hunt. The repository for that is here.

This dataset currently includes sentences from ~120,000 randomly selected
articles from the English Wikipedia. In future work, we hope to expand this
to more Wikipedia language editions and a greater number of articles. The
database could also gain more fields in a future version, based on feedback
from tool developers and researchers. More use cases for this type of data
were identified in a design research project
<https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research>
conducted last year by Jonathan Morgan.

You can find more information in our Wiki Workshop submission
<https://commons.wikimedia.org/wiki/File:Citation_Detective_WikiWorkshop2020.pdf>
and in my blog, which documents the whole journey.

Thank you very much!

Kind regards,
Aiko