Re: [Wikimedia-l] Catching copy and pasting early

2014-07-21 Thread Pine W
It should be relatively easy to catch a significant percentage of those
copyright violations with the assistance of automated search tools. The
trick is to do it at a large scale in near real time, which might require
some computationally intensive and bandwidth-intensive work. James, can I
suggest that you take this discussion to Wiki-Research-l? There are a
number of ways that the copyright violation problem could be addressed and
I think this would be a good subject for discussion on that list, or at
Wikimania. Depending on how the discussion on Research goes, it might be
good to invite some dev or tech ops people to participate in the discussion
as well.

Pine


On Sun, Jul 20, 2014 at 7:05 PM, Leigh Thelmadatter osama...@hotmail.com
wrote:

 This is one of the best ideas I've read on here!


  Date: Sun, 20 Jul 2014 20:00:28 -0600
  From: jmh...@gmail.com
  To: wikimedia-l@lists.wikimedia.org; eloque...@gmail.com;
 fschulenb...@wikimedia.org; ladsgr...@gmail.com; jorlow...@gmail.com;
 madman.enw...@gmail.com; west.andre...@gmail.com
  Subject: [Wikimedia-l] Catching copy and pasting early
 
  Came across another few thousand edits of copy and paste violations again
  today. These have occurred over more than a year. It is wearing me out.
  Really, what is the point of collaborating on Wikipedia if it is simply a
  copyright violation? We need a solution, and one was proposed here a
  couple of years ago: https://en.wikipedia.org/wiki/Wikipedia:Turnitin
 
  We now need programmers to carry it out. The Wiki Education Foundation has
  expressed interest. We will need support from the foundation, as this
  software will likely need to mesh closely with edits as they come in. I am
  willing to offer $5,000 Canadian (almost the same as American) for
  a working solution that tags potential copyright issues in near real time
  with greater than 90% accuracy. It is to function on at least all medical
  and pharmacology articles, but I would not complain if it worked on all of
  Wikipedia. The WMF is free to apply.
 
  --
  James Heilman
  MD, CCFP-EM, Wikipedian
 
  The Wikipedia Open Textbook of Medicine
  www.opentextbookofmedicine.com
  ___
  Wikimedia-l mailing list, guidelines at:
 https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
  Wikimedia-l@lists.wikimedia.org
  Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe


Re: [Wikimedia-l] Catching copy and pasting early

2014-07-21 Thread Jane Darnell
Isn't that what Corenbot does/did? I always found it very confusing, though,
whenever I ran into it, and there are a huge number of false positives (so
many sites copy Wikimedia content these days).
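Jane's point about mirrors suggests one mitigation: discard web-search hits from sites known to republish Wikimedia content before flagging an edit. Below is a minimal Python sketch, assuming a list of search-result URLs is already available. `KNOWN_MIRRORS`, `host_matches`, `drop_mirror_hits` and the example domains are hypothetical, and a real tool would need a maintained list of mirrors.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative list; a real tool would need a maintained mirror list.
KNOWN_MIRRORS = {"wikipedia.org", "example-mirror.com"}


def host_matches(host: str, domains: set[str]) -> bool:
    """True if host equals a listed domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def drop_mirror_hits(urls: list[str], mirrors: set[str] = KNOWN_MIRRORS) -> list[str]:
    """Remove search hits hosted on sites known to copy Wikimedia content."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if not host_matches(host, mirrors):
            kept.append(url)
    return kept
```

Filtering by domain only removes known mirrors. A site that copied the article but is not on the list would still produce a false positive, so remaining hits are better sent to a human reviewer than used to trigger an automatic revert.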


On Mon, Jul 21, 2014 at 9:11 AM, Pine W wiki.p...@gmail.com wrote:


Re: [Wikimedia-l] Catching copy and pasting early

2014-07-21 Thread Federico Leva (Nemo)
This has nothing to do with wiki-research-l, it's mostly about who can
pay for the API keys. https://wikitech.wikimedia.org/wiki/Web_search

Nemo


Re: [Wikimedia-l] Catching copy and pasting early

2014-07-21 Thread rupert THURNER
hi james, this sounds interesting. the only obstacle i see is that neither
the service nor the software nor the database is open. by using turnitin
we would give up on the principle of open content, at least in this
area. this might be perceived as bad for open content, editor retention,
and gaining new editors.

from a turnitin standpoint this might be marketing until they are well
known. as soon as their goals are reached, there is a danger that this will be
discontinued or the terms changed so that they are no longer acceptable.

technical and license obstacles do exist. to check properly, wikipedia
editors would need access to restricted content: either the content is copied
somewhere in the open, or permission is granted to everyone, or, in the worst
case, to a restricted number of editors.

also not to be forgotten is potential link spam. turnitin would put links to a
site not under the movement's control all over the place. they could easily
put other content behind the links at any time.

but i do think one could find a less risky and more open way to check
for plagiarism. one might find better ways to access wikipedia dumps,
better access to recent changes, permission to automatically report back
issues, and notification of contributors and admins. all of this in a
standardized way (i.e. a well-defined interface) and, most importantly, usable
by everybody, for free.

rupert

Re: [Wikimedia-l] Catching copy and pasting early

2014-07-21 Thread Leigh Thelmadatter
I don't know if we have to work with Turnitin per se. Given that most 
copy-paste is from digital sources, we could start with a program/tool to 
search the web, which by itself could significantly lessen the burden on editors. 
(Speaking as a writing teacher, I can tell you this is how I find proof 
of the majority of plagiarism instances... people are often not too smart when 
they are cheating.)

If this works, I bet we could then work with some of the databases to allow the 
access needed to check, as it would require only excerpts from the text, not the 
entire text... enough to justify a revert.

Granted, this would not catch everything (no system does), but it would be a very 
good step in the right direction.
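The excerpt-based check Leigh describes can be sketched in two steps. First, pick a few short excerpts of newly added text to use as web-search queries. Second, measure how much of the added text appears in each page the search returns. Below is a minimal Python sketch, assuming the candidate page text has already been fetched. The search call itself, which Nemo notes depends on paid API keys, is not shown. `pick_query_excerpts`, `containment`, `looks_copied`, the 8-word shingle size and the 0.5 threshold are illustrative assumptions, not part of Turnitin, Corenbot or any existing tool.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped so small edits don't hide a copy."""
    return re.findall(r"\w+", text.lower())


def shingles(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All overlapping n-word sequences ("shingles") in the text."""
    words = tokenize(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(added: str, source: str, n: int = 8) -> float:
    """Fraction of the added text's shingles that also occur in the source page.

    Containment is used instead of symmetric Jaccard similarity because a short
    pasted passage inside a long web page should still score high.
    """
    added_sh = shingles(added, n)
    if not added_sh:
        return 0.0  # text shorter than n words: too little to judge
    return len(added_sh & shingles(source, n)) / len(added_sh)


def pick_query_excerpts(added: str, k: int = 3, length: int = 12) -> list[str]:
    """Up to k evenly spaced excerpts of `length` words to use as search queries."""
    words = tokenize(added)
    if len(words) <= length:
        return [" ".join(words)] if words else []
    step = max(1, (len(words) - length) // max(1, k - 1))
    starts = list(range(0, len(words) - length + 1, step))[:k]
    return [" ".join(words[s:s + length]) for s in starts]


def looks_copied(added: str, source: str, threshold: float = 0.5) -> bool:
    """Flag the edit for human review when enough of it matches the source."""
    return containment(added, source) >= threshold
```

Only short excerpts are sent to the search engine. This follows Leigh's point that excerpts, not entire texts, are enough to justify a revert.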



 Date: Mon, 21 Jul 2014 10:15:44 +0200
 From: rupert.thur...@gmail.com
 To: wikimedia-l@lists.wikimedia.org
 Subject: Re: [Wikimedia-l] Catching copy and pasting early
 

[Wikimedia-l] [Wikimedia Announcements] Wikimedia Nederland in June

2014-07-21 Thread Sandra Rientjes Wikimedia Nederland
The report on Wikimedia Nederland activities in June is available:

https://meta.wikimedia.org/wiki/Wikimedia_chapters/Reports/Wikimedia_Nederland/201406

It is also included as text in this message.


Sandra Rientjes
Directeur/Executive Director Wikimedia Nederland





*COMMUNITY: supporting and mobilising volunteers and editors*

· Wiki-Saturdays

The WMNL office was open for community members on June 7 and 21. The event
on the 7th was visited by 10 members. The first ideas for the 2015 annual
plan were collected by the 12 participants on the 21st (see below).



*WORK: content, collaboration and activity development*


· Wiki Loves Earth events

In total, the Wiki Loves Earth competition resulted in 1349 images of Dutch
nature, wildlife and landscapes uploaded to Wikimedia Commons. The
Netherlands Institute for Sound and Vision donated 500 videos of Dutch birds
(see below under content donations for more info) and launched a competition
to integrate the films into the relevant Wikipedia articles. We also met with
Naturalis, the museum of natural history, to discuss content donations and
an editathon, and are approaching volunteer organisations working on nature
and wildlife for future cooperation.

· Education Programme

WMNL has commissioned a small consultancy firm to carry out a study into
the feasibility of starting an education programme in the Netherlands. The
consultants did desk research and interviewed relevant stakeholders. In
their report, they conclude that there is definitely potential for an
education programme focussing on higher education (universities and
colleges). They recommended carrying out three pilot projects, exploring
improving the content of articles, translating articles from other language
versions of Wikipedia, and improving the language and style of articles. More
information can be found on the project page.

· World War Two

Volunteers working on the World War Two project made an inventory of topics
whose content could be expanded and improved. This will serve as a
framework for content donations by the National Institute for War Research
(NIOD) and the Netherlands Institute for Sound and Vision. We also decided to
approach local history groups and museums. More information can be found on
the project page. We are also still looking for more volunteers to work on
this activity!

· Wikipedia workshops at libraries

Two workshops were organised: a monthly open workshop in Amersfoort (June
13) and a workshop for 9 employees of the public library in Amsterdam (OBA,
June 24). The workshop in Amersfoort attracted people who came to the library
specifically for the workshop, as well as the director of the archive (the
archive is located in the same building).

· Music Edit-a-thon

On Saturday June 14th an Edit-a-thon was organized at the Netherlands
Institute for Sound and Vision. About 30 people showed up to write about
Dutch music from the first half of the twentieth century. There were 15
first-time editors present. About twenty articles were written (see the
course page for all the contributions of the day). Photos of the day.

· Content donations

*The Netherlands Institute for Sound and Vision: 500+ videos of birds*

The Netherlands Institute for Sound and Vision has just finished uploading
over 500 videos of birds filmed in the Netherlands by a professional film
producer. This donation was made possible by the Foundation for Nature
Footage (Stichting Natuurbeelden) and the Netherlands Institute for Sound
and Vision. The entire collection can be found in the Commons category
Stichting Natuurbeelden on Open Images
(commons:Category:Stichting_Natuurbeelden_on_Open_Images).


*Atlas of Mutual Heritage on Commons: 2,500 old maps, prints and drawings*

2,500 images related to the overseas history of the Netherlands have been
uploaded to Wikimedia Commons by Hay Kranen. More information about this
upload can be found on the project page and in Hay's email to the GLAM
mailing list.

· Re-source fashion conference

Sebastiaan ter Burg was a panellist at the Re-source fashion conference in
Arnhem.



*WMNL*

· Newsletter

The newsletter was published on June 30. Sixty-nine people signed up for the
WMNL newsletter, mainly as a result of the Wiki Loves Earth events.

· Linux-group meeting

Board members Ad Huikeshoven and Justus de Bruijn attended the annual
summer meeting of the Netherlands Linux Users Group. They presented WMNL and
explored possibilities for synergy and cooperation.



*GLOBAL*



· Free Culture Weekend

Sebastiaan ter Burg attended the Free Culture Weekend in London (notes).

· Wikimania Scholarships

WMNL provided Wikimania scholarships to 11 members of the Dutch Wikimedia
community. In addition, several staff and board members will also attend the
event in London.





*GOVERNANCE*

· Board

The WMNL Board met on June 12.

· Annual plan 2015

We started drafting the 2015 annual plan with a brainstorming meeting on
June 21,