Re: [Wikimedia-l] Catching copy and pasting early
It should be relatively easy to catch a significant percentage of those copyright violations with the assistance of automated search tools. The trick is to do it at a large scale in near-realtime, which might require some computationally intensive and bandwidth-intensive work.

James, can I suggest that you take this discussion to Wiki-Research-l? There are a number of ways that the copyright violation problem could be addressed, and I think this would be a good subject for discussion on that list, or at Wikimania. Depending on how the discussion on Research goes, it might be good to invite some dev or tech ops people to participate in the discussion as well.

Pine

On Sun, Jul 20, 2014 at 7:05 PM, Leigh Thelmadatter osama...@hotmail.com wrote:

This is one of the best ideas I've read on here!

Date: Sun, 20 Jul 2014 20:00:28 -0600
From: jmh...@gmail.com
To: wikimedia-l@lists.wikimedia.org; eloque...@gmail.com; fschulenb...@wikimedia.org; ladsgr...@gmail.com; jorlow...@gmail.com; madman.enw...@gmail.com; west.andre...@gmail.com
Subject: [Wikimedia-l] Catching copy and pasting early

I came across another few thousand edits of copy-and-paste violations again today. These have occurred over more than a year. It is wearing me out. Really, what is the point of collaborating on Wikipedia if it is simply a copyright violation?

We need a solution, and one was proposed here a couple of years ago: https://en.wikipedia.org/wiki/Wikipedia:Turnitin
We now need programmers to carry it out. The Wiki Education Foundation has expressed interest. We will need support from the foundation, as this software will likely need to mesh closely with edits as they come in.

I am willing to offer $5,000 Canadian (almost the same as American) for a working solution that tags potential copyright issues in near real time with greater than 90% accuracy. It is to function on at least all medical and pharmacology articles, but I would not complain if it worked on all of Wikipedia. The WMF is free to apply.

--
James Heilman
MD, CCFP-EM, Wikipedian
The Wikipedia Open Textbook of Medicine
www.opentextbookofmedicine.com

___
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Catching copy and pasting early
Isn't that what Corenbot does/did? I always found it very confusing whenever I ran into it, and there are a huge number of false positives (so many sites copy Wikimedia content these days).

On Mon, Jul 21, 2014 at 9:11 AM, Pine W wiki.p...@gmail.com wrote:
It should be relatively easy to catch a significant percentage of those copyright violations with the assistance of automated search tools. [...]
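Any Corenbot-style check has to decide how much of an edit matches a candidate source page. One common way to put a number on that, offered here only as a hedged illustration and not as how Corenbot actually works, is word n-gram ("shingle") containment: the fraction of the edit's n-word runs that also appear verbatim in the source. The function names and the default n = 5 below are illustrative assumptions.

```python
import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of lowercased word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(added: str, source: str, n: int = 5) -> float:
    """Fraction of the added text's shingles that also occur in the source.

    1.0 means every n-word run in the edit appears verbatim in the source;
    0.0 means none do. Edits shorter than n words score 0.0.
    """
    a = shingles(added, n)
    if not a:
        return 0.0
    return len(a & shingles(source, n)) / len(a)
```

For example, `containment("the quick brown fox jumps over the lazy dog", "A quick brown fox jumps over the lazy dog today")` shares 4 of the edit's 5 shingles and returns 0.8. Note that a high score only shows that the two texts overlap, not which one was copied from the other; a mirror site that copied Wikipedia scores just as high, which is exactly the false-positive problem described above.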
Re: [Wikimedia-l] Catching copy and pasting early
This has nothing to do with wiki-research-l; it's mostly about who can pay for the API keys. https://wikitech.wikimedia.org/wiki/Web_search

Nemo
Re: [Wikimedia-l] Catching copy and pasting early
Hi James,

This sounds interesting. The only obstacle I see is that neither the service nor the software nor the database is open. By using Turnitin we would give up on the principle of open content, at least in this area. This might be perceived as bad for open content, editor retention, and gaining new editors. From Turnitin's standpoint this might be marketing until they are well known. As soon as their goals are reached, there is a danger that the service will be discontinued or the terms changed so that they are no longer acceptable.

Technical and license obstacles do exist. To properly check Wikipedia, editors would need access to restricted content. Either the content is copied somewhere in the open, permission is granted to everyone, or, in the worst case, permission is granted to a restricted number of editors.

Also not to be forgotten is potential link spam. Turnitin would put links to a site not controlled by the movement all over the place, and they could put other content behind those links at any time without difficulty.

But I do think one could find a less risky and more open way to check for plagiarism. One might find better ways to access Wikipedia dumps, better access to recent changes, permission to automatically report issues back, and notification of contributors and admins. All of this in a standardized way (i.e. a well-defined interface) and, most importantly, usable by everybody, for free.

rupert

On 21.07.2014 04:05, Leigh Thelmadatter osama...@hotmail.com wrote:
This is one of the best ideas I've read on here!
Re: [Wikimedia-l] Catching copy and pasting early
I don't know if we have to work with Turnitin per se. Given that most copy-paste is from digital sources, we could start with a program/tool to search the web, which by itself could lessen the burden on editors significantly. (Speaking as a writing teacher, I can tell you this is how I find proof in the majority of plagiarism instances... people are often not too smart when they are cheating.) If this works, I bet we could then work with some of the databases to allow the access needed to check, as it would require only excerpts from the text, not the entire text... enough to justify a revert. Granted, this would not catch everything (no system does), but it would be a very good step in the right direction.

Date: Mon, 21 Jul 2014 10:15:44 +0200
From: rupert.thur...@gmail.com
To: wikimedia-l@lists.wikimedia.org
Subject: Re: [Wikimedia-l] Catching copy and pasting early

Hi James, this sounds interesting. The only obstacle I see is that neither the service nor the software nor the database is open. [...]
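Leigh's suggestion of searching the web with short excerpts of an edit, rather than the whole text, could be sketched as follows. This is a hypothetical heuristic, not an established method: it assumes longer sentences are more distinctive, picks the k longest sentences of the added text, trims each to max_words words, and wraps it in quotes for an exact-phrase query. The function name and parameters are illustrative, and the actual web search call is deliberately left out, since which search API to use and who pays for the keys is still open in this thread.

```python
import re


def pick_query_excerpts(added_text: str, k: int = 3, max_words: int = 12) -> list[str]:
    """Choose up to k excerpts from an edit to use as exact-phrase web queries.

    Assumed heuristic: longer sentences tend to be more distinctive, so take
    the k longest sentences and trim each to its first max_words words.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", added_text) if s.strip()]
    # Longest first; sorted() stays stable with reverse=True, so ties keep text order.
    ranked = sorted(sentences, key=lambda s: len(s.split()), reverse=True)
    excerpts = []
    for sentence in ranked[:k]:
        words = sentence.split()[:max_words]
        excerpts.append('"' + " ".join(words) + '"')  # quoted for exact-phrase search
    return excerpts
```

Each returned string would then be sent to whatever search service is eventually chosen, and any hits could be scored against the edit before an editor is asked to review it. Sending only a few short quoted excerpts also fits Leigh's point that checking needs only excerpts, not the entire text.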
[Wikimedia-l] [Wikimedia Announcements] Wikimedia Nederland in June
The report on Wikimedia Nederland activities in June is available: https://meta.wikimedia.org/wiki/Wikimedia_chapters/Reports/Wikimedia_Nederland/201406
It is also included as text in this message.

Sandra Rientjes
Directeur/Executive Director Wikimedia Nederland

*COMMUNITY: supporting and mobilising volunteers and editors*

· Wiki-Saturdays. The WMNL office was open for community members on June 7 and 21. The event on the 7th was visited by 10 members. The first ideas for the annual plan 2015 were collected by the 12 participants on the 21st (see below).

*WORK: content, collaboration and activity development*

· Wiki Loves Earth events. In total, the Wiki Loves Earth competition resulted in 1349 images of Dutch nature, wildlife and landscapes uploaded to Wikimedia Commons. The Netherlands Institute for Sound and Vision donated 500 videos of Dutch birds (see below under content donations for more info) and launched a competition to integrate the films into the relevant Wikipedia articles. We also met with Naturalis, the museum of natural history, to discuss content donations and an editathon, and are approaching volunteer organisations working on nature and wildlife for future cooperation.

· Education Programme. WMNL has commissioned a small consultancy firm to carry out a study into the feasibility of starting an education programme in the Netherlands. The consultants did desk research and interviewed relevant stakeholders. In their report, they conclude that there is definitely potential for an education programme focussing on higher education (universities and colleges). They recommended carrying out three pilot projects, exploring improving the content of articles, translating articles from other language versions of Wikipedia, and improving the language and style of articles. More information can be found on the project page.

· World War Two. Volunteers working on the World War Two project made an inventory of topics whose content could be expanded and improved.
This will serve as a framework for content donations by the National Institute for War Research (NIOD) and the Netherlands Institute for Sound and Vision. We also decided to approach local history groups and museums. More information is available on the project page. We are also still looking for more volunteers to work on this activity!

· Wikipedia workshops at libraries. Two workshops were organised: a monthly open workshop in Amersfoort (June 13) and a workshop for 9 employees of the public library in Amsterdam (OBA, June 24). The workshop in Amersfoort attracted people who came to the library specifically for the workshop, as well as the director of the archive (which is located in the same building).

· Music Edit-a-thon. On Saturday June 14th an edit-a-thon was organised at the Netherlands Institute for Sound and Vision. About 30 people showed up to write about Dutch music from the first half of the twentieth century. There were 15 first-time editors present. About twenty articles were written (see the course page for all the contributions of the day). Photos of the day are also available.

· Content donations

*The Netherlands Institute for Sound and Vision: 500+ videos of birds*
The Netherlands Institute for Sound and Vision has just finished uploading over 500 videos of birds filmed in the Netherlands by a professional film producer. This donation was made possible by the Foundation for Nature Footage (Stichting Natuurbeelden) and the Netherlands Institute for Sound and Vision. The entire collection can be found in the category commons:Category:Stichting_Natuurbeelden_on_Open_Images.

*Atlas of Mutual Heritage on Commons: 2,500 old maps, prints and drawings*
A total of 2,500 images related to the overseas history of the Netherlands have been uploaded to Wikimedia Commons by Hay Kranen. More information about this upload can be found on the project page and in Hay's email to the GLAM mailing list.
· Re-source fashion conference. Sebastiaan ter Burg was a panellist at the Re-source fashion conference in Arnhem.

*WMNL*

· Newsletter. The newsletter was published on June 30. In total, 69 people signed up for the WMNL newsletter, mainly as a result of the Wiki Loves Earth events.

· Linux group meeting. Board members Ad Huikeshoven and Justus de Bruijn attended the annual summer meeting of the Netherlands Linux Users Group. They presented WMNL and explored possibilities for synergy and cooperation.

*GLOBAL*

· Free Culture Weekend. Sebastiaan ter Burg attended the Free Culture Weekend in London.

· Wikimania Scholarships. WMNL provided Wikimania scholarships to 11 members of the Dutch Wikimedia community. In addition, several staff and board members will also attend the event in London.

*GOVERNANCE*

· Board. The WMNL Board met on June 12.

· Annual plan 2015. We started drafting the annual plan 2015 with a brainstorming meeting on June 21,