Re: [Wiki-research-l] Questions about SuggestBot

2019-06-26 Thread Kerry Raymond
I am familiar with the Citation Hunt tool, but am not much of a fan of it.
Basically, there is a tracking category built into {{citation needed}}, and
Citation Hunt returns you a random article in that category, showing you a few
lines of text preceding the (I think) first needed citation in the article;
you can choose to accept it or skip it (whereupon it offers you another one).
It is intended as a lightweight way to engage librarians during 1Lib1Ref,
sending them scurrying into their collections to find a citation and add it.
It sounds superficially like a great idea, unless you do outreach with
librarians as I do, and then you see how flawed it is. It fails in at least
three ways.
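
To make this concrete, here is a rough Python sketch of what the tool is
doing, as I understand it. This is my approximation, not Citation Hunt’s
actual code, and the tracking category named is the English Wikipedia one.

import random
import requests

API = "https://en.wikipedia.org/w/api.php"
# Tracking category populated by {{citation needed}} on English Wikipedia.
TRACKING_CATEGORY = "Category:All articles with unsourced statements"

def random_unsourced_article():
    """Return the title of one article currently carrying {{citation needed}}."""
    resp = requests.get(API, params={
        "action": "query",
        "list": "categorymembers",
        "cmtitle": TRACKING_CATEGORY,
        "cmnamespace": 0,  # articles only
        "cmlimit": 500,    # one batch; the real category is vastly larger
        "format": "json",
    })
    members = resp.json()["query"]["categorymembers"]
    # Choosing from a single batch is only approximately random over the
    # whole category, which is fine for illustration.
    return random.choice(members)["title"]

print(random_unsourced_article())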

The first is that just because there is a citation needed template present,
it doesn’t follow that a citation exists (the information may be untrue), so
searching is often a waste of time. An inherent problem with that template is
that many people (librarians or others) may see it and try to find a
citation. When they fail to find one, what do they do? Answer: nothing; they
move on. Very few people feel confident enough to say “there is no citation,
I will remove all the text associated with the template”, firstly because
they realise there may be a source that they failed to find, and secondly
because they are uncertain how much preceding text to remove in any case.
It’s a hard call even for an experienced Wikipedian; it’s not an entry-level
task. The template does not have a field to record how many attempts have
been made, which might be accompanied by a (say) “three strikes and it’s out”
policy or a time-based deletion criterion. Once added, {{citation needed}}
tends to linger for years, wasting people’s time trying to resolve it.

Problem 2. A librarian will be asked to find a citation for content unlikely
to be held in their library; there are not a lot of books on baseball players
in an Australian library. You can skip through an awful lot of suggestions
before hitting one that might be in your collection. My personal observation,
though, is that most librarians don’t look in their collection; they look for
a quick win with a simple Google search, which tends to fail (if it were that
easy, it would already be cited).

Problem 3. It is fundamentally sexist, and some librarians notice this and
comment on it to others. Who writes Wikipedia? Well, we know it’s
predominantly men. Who are librarians? Predominantly women. So the idea of
using Citation Hunt for 1Lib1Ref amounts to a woman being asked to do a lot
of work (scouring her collection) to clean up after a lazy man who didn’t
bother to do the job right in the first place. Why does this scenario seem
familiar to many women? When you put it like that, would you ever suggest it
again? I wouldn’t, but WMF does. Instead, I have come up with 1Lib1Ref tasks
that add content with citations on topics relevant to the librarians I work
with. The librarians see this as a very positive activity. The task is almost
always doable and adds content on a topic they perceive as relevant to their
library’s focus (no baseball tasks).

You can fix problem 2, though. If you use PetScan to compile some
whole-of-category trees, you can run Citation Hunt over that smaller list of
articles and keep the topics relevant to a particular group of librarians. I
did this manually one year; now it’s built into the tool, I think.
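
(For anyone wanting to reproduce the manual version of that, a rough Python
sketch of the PetScan step follows. The parameter names and the JSON layout
are from memory, so check them against the PetScan documentation before
relying on this; the category is just an example.)

import requests

PETSCAN = "https://petscan.wmflabs.org/"

def articles_in_category_tree(category, depth=3):
    """Return article titles in `category` and its subcategories to `depth`."""
    resp = requests.get(PETSCAN, params={
        "language": "en",
        "project": "wikipedia",
        "categories": category,  # e.g. "History of Queensland"
        "depth": depth,          # how far down the subcategory tree to go
        "ns[0]": 1,              # main (article) namespace only
        "format": "json",
        "doit": 1,               # actually run the query
    })
    pages = resp.json()["*"][0]["a"]["*"]
    # Titles come back with underscores rather than spaces.
    return [p["title"].replace("_", " ") for p in pages]

titles = articles_in_category_tree("History of Queensland")
print(len(titles), titles[:5])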

So, in terms of suggesting tasks to users: as much as resolving {{citation
needed}} is a high-value result when it succeeds, the success rate of
actually doing so is low and the task can be frustrating. The other risk with
new people doing it is that they don’t understand what a reliable citation
is, so they may link to a Facebook post, a tweet, or a webpage that on close
inspection is a mirror of Wikipedia. Even as an experienced Wikipedian, I
struggle to determine whether a random webpage that is more or less identical
to a Wikipedia article is a copy of Wikipedia (and hence not a reliable
source) or whether the Wikipedia article is a copyvio of that webpage.

I’d have to say that putting a time limit on {{citation needed}} would be a
very good thing, as it would limit the time questionable content exists
without citation, and we could use imminent deadlines as the basis for a
Suggest A Task tool, on a “cite it or delete it” basis. This would empower
people to delete. I’d go further and suggest that {{citation needed}} should
have short default time limits when set on edits made by new users, and the
higher the importance or readership of the article, the shorter the expiry
time should be. I think this could be done with a bot, so the expiry is not
set manually by the person who adds the tag. It would be really great if
Twinkle allowed you to add the citation-needed tag and automatically set the
expiry time according to whatever policies exist.
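
(If anyone wanted to prototype that, a first pass is straightforward. Below
is a minimal sketch using pywikibot and mwparserfromhell that only reports
over-age tags for human review; the six-month cutoff is an invented example,
since no such expiry policy exists today.)

from datetime import datetime

import mwparserfromhell
import pywikibot

CUTOFF_MONTHS = 6  # hypothetical expiry window; no such policy exists yet

site = pywikibot.Site("en", "wikipedia")
category = pywikibot.Category(
    site, "Category:All articles with unsourced statements")

def tag_age_months(date_value):
    """Parse a template date like 'June 2019'; return its age in months."""
    try:
        tagged = datetime.strptime(date_value.strip(), "%B %Y")
    except ValueError:
        return None  # malformed or missing date
    now = datetime.utcnow()
    return (now.year - tagged.year) * 12 + (now.month - tagged.month)

for page in category.articles(namespaces=0, total=50):  # small sample only
    code = mwparserfromhell.parse(page.text)
    for tpl in code.filter_templates():
        if not tpl.name.matches("citation needed") or not tpl.has("date"):
            continue
        age = tag_age_months(str(tpl.get("date").value))
        if age is not None and age >= CUTOFF_MONTHS:
            print(f"{page.title()}: tagged {tpl.get('date').value.strip()} "
                  f"({age} months ago)")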

Sent from my iPad

> On 27 Jun 2019, at 1:49 am, Morten Wang  wrote:
> 
> As St

Re: [Wiki-research-l] [Analytics] [Wikimedia Research Showcase] June 26, 2019 at 11:30 AM PST, 19:30 UTC

2019-06-26 Thread RhinosF1 Wikipedia
For those who couldn't make it, is there a summary of what was said?

Thanks in advance,
RhinosF1

On Wed, 26 Jun 2019 at 18:58, Janna Layton  wrote:

> Hello everyone,
>
> Just a reminder that this event will be happening in about half an hour!
> Here's the Youtube link again: https://www.youtube.com/watch?v=WiUfpmeJG7E
>
> On Tue, Jun 25, 2019 at 9:14 AM Janna Layton 
> wrote:
>
>> Time correction:
>>
>> The next Research Showcase will be live-streamed next Wednesday, June 26,
>> at *11:30 AM PDT/18:30 UTC*.
>>
>> On Mon, Jun 24, 2019 at 4:11 PM Janna Layton 
>> wrote:
>>
>>> Hi all,
>>>
>>> The next Research Showcase will be live-streamed this Wednesday, June
>>> 26, at 11:30 AM PST/19:30 UTC. We will have three presentations this
>>> showcase, all relating to Wikipedia blocks.
>>>
>>> YouTube stream: https://www.youtube.com/watch?v=WiUfpmeJG7E
>>>
>>> As usual, you can join the conversation on IRC at #wikimedia-research.
>>> You can also watch our past research showcases here:
>>> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
>>>
>>> This month's presentations:
>>>
>>> Trajectories of Blocked Community Members: Redemption, Recidivism and
>>> Departure
>>>
>>> By Jonathan Chang, Cornell University
>>>
>>> Community norm violations can impair constructive communication and
>>> collaboration online. As a defense mechanism, community moderators often
>>> address such transgressions by temporarily blocking the perpetrator. Such
>>> actions, however, come with the cost of potentially alienating community
>>> members. Given this tradeoff, it is essential to understand to what extent,
>>> and in which situations, this common moderation practice is effective in
>>> reinforcing community rules. In this work, we introduce a computational
>>> framework for studying the future behavior of blocked users on Wikipedia.
>>> After their block expires, they can take several distinct paths: they can
>>> reform and adhere to the rules, but they can also recidivate, or
>>> straight-out abandon the community. We reveal that these trajectories are
>>> tied to factors rooted both in the characteristics of the blocked
>>> individual and in whether they perceived the block to be fair and
>>> justified. Based on these insights, we formulate a series of prediction
>>> tasks aiming to determine which of these paths a user is likely to take
>>> after being blocked for their first offense, and demonstrate the
>>> feasibility of these new tasks. Overall, this work builds towards a more
>>> nuanced approach to moderation by highlighting the tradeoffs that are in
>>> play.
>>>
>>>
>>> Automatic Detection of Online Abuse in Wikipedia
>>>
>>> By Lane Rasberry, University of Virginia
>>>
>>> Researchers analyzed all English Wikipedia blocks prior to 2018 using
>>> machine learning. With insights gained, the researchers examined all
>>> English Wikipedia users who are not blocked against the identified
>>> characteristics of blocked users. The results were a ranked set of
>>> predictions of users who are not blocked, but who have a history of conduct
>>> similar to that of blocked users. This research and process models a system
>>> for the use of computing to aid human moderators in identifying conduct on
>>> English Wikipedia which merits a block.
>>>
>>> Project page:
>>> https://meta.wikimedia.org/wiki/University_of_Virginia/Automatic_Detection_of_Online_Abuse
>>>
>>> Video: https://www.youtube.com/watch?v=AIhdb4-hKBo
>>>
>>>
>>> First Insights from Partial Blocks in Wikimedia Wikis
>>>
>>> By Morten Warncke-Wang, Wikimedia Foundation
>>>
>>> The Anti-Harassment Tools team at the Wikimedia Foundation released the
>>> partial block feature in early 2019. Where previously blocks on Wikimedia
>>> wikis were sitewide (users were blocked from editing an entire wiki),
>>> partial blocks make it possible to block users from editing specific pages
>>> and/or namespaces. The Italian Wikipedia was the first wiki to start using
>>> this feature, and it has since been rolled out to other wikis as well. In
>>> this presentation, we will look at how this feature has been used in the
>>> first few months since release.
>>>
>>>
>>> --
>>> Janna Layton (she, her)
>>> Administrative Assistant - Audiences & Technology
>>> Wikimedia Foundation 
>>>
>>
>>
>> --
>> Janna Layton (she, her)
>> Administrative Assistant - Audiences & Technology
>> Wikimedia Foundation 
>>
>
>
> --
> Janna Layton (she, her)
> Administrative Assistant - Audiences & Technology
> Wikimedia Foundation 
> ___
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] [Wikimedia Research Showcase] June 26, 2019 at 11:30 AM PST, 19:30 UTC

2019-06-26 Thread Janna Layton
Hello everyone,

Just a reminder that this event will be happening in about half an hour!
Here's the Youtube link again: https://www.youtube.com/watch?v=WiUfpmeJG7E

On Tue, Jun 25, 2019 at 9:14 AM Janna Layton  wrote:

> Time correction:
>
> The next Research Showcase will be live-streamed next Wednesday, June 26,
> at *11:30 AM PDT/18:30 UTC*.
>
> On Mon, Jun 24, 2019 at 4:11 PM Janna Layton 
> wrote:
>
>> Hi all,
>>
>> The next Research Showcase will be live-streamed this Wednesday, June 26,
>> at 11:30 AM PST/19:30 UTC. We will have three presentations this showcase,
>> all relating to Wikipedia blocks.
>>
>> YouTube stream: https://www.youtube.com/watch?v=WiUfpmeJG7E
>>
>> As usual, you can join the conversation on IRC at #wikimedia-research.
>> You can also watch our past research showcases here:
>> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
>>
>> This month's presentations:
>>
>> Trajectories of Blocked Community Members: Redemption, Recidivism and
>> Departure
>>
>> By Jonathan Chang, Cornell University
>>
>> Community norm violations can impair constructive communication and
>> collaboration online. As a defense mechanism, community moderators often
>> address such transgressions by temporarily blocking the perpetrator. Such
>> actions, however, come with the cost of potentially alienating community
>> members. Given this tradeoff, it is essential to understand to what extent,
>> and in which situations, this common moderation practice is effective in
>> reinforcing community rules. In this work, we introduce a computational
>> framework for studying the future behavior of blocked users on Wikipedia.
>> After their block expires, they can take several distinct paths: they can
>> reform and adhere to the rules, but they can also recidivate, or
>> straight-out abandon the community. We reveal that these trajectories are
>> tied to factors rooted both in the characteristics of the blocked
>> individual and in whether they perceived the block to be fair and
>> justified. Based on these insights, we formulate a series of prediction
>> tasks aiming to determine which of these paths a user is likely to take
>> after being blocked for their first offense, and demonstrate the
>> feasibility of these new tasks. Overall, this work builds towards a more
>> nuanced approach to moderation by highlighting the tradeoffs that are in
>> play.
>>
>>
>> Automatic Detection of Online Abuse in Wikipedia
>>
>> By Lane Rasberry, University of Virginia
>>
>> Researchers analyzed all English Wikipedia blocks prior to 2018 using
>> machine learning. With insights gained, the researchers examined all
>> English Wikipedia users who are not blocked against the identified
>> characteristics of blocked users. The results were a ranked set of
>> predictions of users who are not blocked, but who have a history of conduct
>> similar to that of blocked users. This research and process models a system
>> for the use of computing to aid human moderators in identifying conduct on
>> English Wikipedia which merits a block.
>>
>> Project page:
>> https://meta.wikimedia.org/wiki/University_of_Virginia/Automatic_Detection_of_Online_Abuse
>>
>> Video: https://www.youtube.com/watch?v=AIhdb4-hKBo
>>
>>
>> First Insights from Partial Blocks in Wikimedia Wikis
>>
>> By Morten Warncke-Wang, Wikimedia Foundation
>>
>> The Anti-Harassment Tools team at the Wikimedia Foundation released the
>> partial block feature in early 2019. Where previously blocks on Wikimedia
>> wikis were sitewide (users were blocked from editing an entire wiki),
>> partial blocks make it possible to block users from editing specific pages
>> and/or namespaces. The Italian Wikipedia was the first wiki to start using
>> this feature, and it has since been rolled out to other wikis as well. In
>> this presentation, we will look at how this feature has been used in the
>> first few months since release.
>>
>>
>> --
>> Janna Layton (she, her)
>> Administrative Assistant - Audiences & Technology
>> Wikimedia Foundation 
>>
>
>
> --
> Janna Layton (she, her)
> Administrative Assistant - Audiences & Technology
> Wikimedia Foundation 
>


-- 
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation 
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Questions about SuggestBot

2019-06-26 Thread Morten Wang
As Stuart Yates kindly pointed out, SuggestBot is alive and well! (And in
case it wasn't obvious, I know this because I'm the one maintaining it :)
It's currently serving up article suggestions in seven languages. It also
updates the list of open tasks (e.g. the one shown on the English Community
Portal) in a few languages (those task list updates pick a random selection
of articles from a given set of categories; they're not personalized
recommendations).

There is currently not, as far as I know, any similar tool that does
personalized recommendations. Stuart mentioned some ways that Wikipedias
organize work lists and keep track of things that need to be done. There
are also some tools that provide topical suggestions for things to do
(e.g. Citation Hunt). I haven't dug into learning how those work.

When it comes to published research on how Wikipedia contributors work with
tasks, in addition to the two papers that have been published about
SuggestBot, there's also this one: Krieger, M., Stark, E. M., & Klemmer, S.
R., "Coordinating tasks on the commons: designing for personal goals,
expertise and serendipity", CHI 2009.

Happy to answer any other questions you (or others) might have about
SuggestBot, of course!


Cheers,
Morten


On Mon, 24 Jun 2019 at 18:21, Haifeng Zhang  wrote:

> Thanks so much for answering my questions, Stuart.
>
> It seems redlinks are related to article creation only.
>
> Could you give me some detail about how "administrative groups" work in
> terms of task routing?
>
> I also found the following TASK CENTER page (
> https://en.wikipedia.org/wiki/Wikipedia:Task_Center).
>
> Are the links/lists (under "Do it!") used frequently by editors as routing
> tools?
>
>
> Thanks,
>
> Haifeng Zhang
> 
> From: Wiki-research-l  on
> behalf of Stuart A. Yeates 
> Sent: Sunday, June 23, 2019 11:37:38 PM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Questions about SuggestBot
>
> (a) SuggestBot visited me in the last week.
>
> https://en.wikipedia.org/w/index.php?title=User_talk%3AStuartyeates&type=revision&diff=902456290&oldid=901462765
>
> (b) There are lots of different task routing approaches: lists of
> redlinks, administrative groups, etc.
>
> (c) Sentences containing the words 'bot' and 'documented' appear to
> mainly exist for comedic value. Bots are typically even less
> documented than usual.
>
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
> On Mon, 24 Jun 2019 at 15:24, Haifeng Zhang 
> wrote:
> >
> > Hi all,
> >
> > Is the SuggestBot still in use in Wikipedia?
> >
> > Are there similar task routing tools that have been deployed in
> Wikipedia?
> >
> > Where in Wikipedia is the use of such tools or bots documented?
> >
> >
> > Thanks,
> >
> > Haifeng Zhang
> >
> > ___
> > Wiki-research-l mailing list
> > Wiki-research-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] [CfP] 1st International Workshop on Approaches for Making Data Interoperable (AMAR 2019)

2019-06-26 Thread Lucie-Aimée Kaffee
Call For Papers

1st International Workshop on Approaches for Making Data Interoperable
(AMAR 2019)

https://events.tib.eu/amar2019/
co-located with SEMANTiCS 2019

September 9–12, 2019, Karlsruhe, Germany

Submission deadline: July 9, 2019



Overview



Recently, there has been rapid growth in the amount of data available on the
Web. Data is produced by different communities working in a wide range of
domains, using several techniques, so a large volume of data in different
formats and languages is generated. The accessibility of such heterogeneous
and multilingual data becomes an obstacle to reuse due to the incompatibility
of data formats and the language gap: incompatible formats keep data sources
from reaching the right community. For instance, most open-domain question
answering systems are built to be effective when data is represented in RDF;
they cannot operate on data in the very common CSV format, or on data in
unstructured formats. Usually the data they draw from is in English,
rendering them unable to answer questions in, say, Spanish. Conversely, NLP
applications in Spanish cannot make use of a knowledge graph in English.
Different communities have different requirements in terms of data
representation and modeling, so it is crucial to make data interoperable in
order to make it accessible to a variety of applications.
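
As a toy illustration of that gap, consider lifting the rows of a CSV file
into RDF triples so that RDF-based tools can consume them. A minimal Python
sketch with rdflib follows; every name in it (namespace, columns, values) is
invented for the example.

import csv
import io

from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")

# Stand-in for a CSV file a data provider might publish.
csv_data = io.StringIO(
    "id,name,capital\n"
    "1,Germany,Berlin\n"
    "2,Spain,Madrid\n"
)

g = Graph()
g.bind("ex", EX)
for row in csv.DictReader(csv_data):
    country = EX[f"country/{row['id']}"]
    g.add((country, EX.name, Literal(row["name"])))
    g.add((country, EX.capital, Literal(row["capital"])))

# Serialize as Turtle, a common RDF surface syntax.
print(g.serialize(format="turtle"))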



Topics of Interest



We invite paper submissions from two communities: (i) data consumers and
(ii) data providers. This includes practitioners, such as data scientists,
who have experience in fitting the available data to their use case; Semantic
Web researchers who have been investigating the reuse of heterogeneous data
in tools; researchers in the field of data linking and translation; and other
researchers working in the general field of data integration.

We invite submissions from the following communities:

   - Data Integration
   - Multilingual Data
   - Data Linking
   - Ontology and Knowledge Engineering
We welcome original contributions on all topics related to data
interoperability, including but not limited to:

   - Approaches to convert data between formats, languages, and schemas
   - Best practices for processing heterogeneous data
   - Translation of data between different languages
   - Cross-lingual applications
   - Recommendations for language modeling in linked data
   - Labeling of data with natural language information
   - Datasets for different communities’ data needs
   - Tools reusing different data formats
   - Converting datasets between different formats
   - Applications in different domains, e.g., Life Sciences, Scholarly,
     Industry 4.0, Humanities


Author Instructions


Paper submission for this workshop will be via EasyChair
(https://easychair.org/conferences/?conf=amar2019). Papers should follow the
Springer LNCS format and be submitted as PDF on or before July 9, 2019
(midnight Hawaii time).

We accept papers of the following formats:

   - Full research papers (8–12 pages)
   - Short research papers (3–5 pages)
   - Position papers (6–8 pages)
   - Resource papers (8–12 pages, including the publication of the dataset)
   - In-Use papers (6–8 pages)

Accepted papers will be published as CEUR workshop proceedings. We target
the creation of a special issue including the best papers of the workshop.


Important Dates


Submission: July 9, 2019
Notification: July 30, 2019
Workshop: September 9, 2019


Workshop Organizers


Lucie-Aimée Kaffee, University of Southampton, UK & TIB Leibniz Information
Centre for Science and Technology, Hannover, Germany
Kemele M. Endris, TIB Leibniz Information Centre for Science and Technology
and L