Thanks, Charles.

A great deal of scholarship is still available only through subscription or
per-article toll access. In a prior position licensing electronic resources for
many libraries, I recall considerable discussion a few years ago about
publishers refusing to allow the large-scale downloading that text mining
requires. Some treated such downloading as a breach of the license and cut off
the library's access; others set up DRM to prevent automated downloading. Over
the years, libraries began adding text mining clauses to model license
agreements. Some publishers, even those already making high profits, see this
as a new use they have a right to charge more for. I think we can agree that
this blocks the advancement of knowledge and needs to change.

To avoid blocking these advances, researchers need to be able to make working
copies (cached, or stored permanently on their hard drives, but not for
redistribution) of the entire corpus of scholarly works, going back to the
beginning of scholarship.

This requires changes in copyright law and in publishing practice for toll
access works, at both the buying and the selling end, in addition to
dissemination of OA works through repositories and publications that
facilitate text mining.

This principle matters beyond research using scholarly materials. For example,
researchers in the social sciences, humanities and arts need to be able to do
this with news sources, works of art and literature, social media and so forth.

With respect to copyright: if publishers are seeking to expand their rights in
the EU, and may have legitimate reasons to protect their profits from other
commercial entities (that is, others who would take content and use it to sell
advertising for profit in competition with the original publisher, not
researchers doing research as part of their jobs), one option would be to push
for a broad-based exception for research. This would be beneficial in the EU,
where I understand fair dealing rights are not always part of local copyright
law and interpretations of user rights under the Berne Convention are
conservative in some countries.

Research rights need to be broad-based because others need these rights, too.
Journalists do research; newspaper publishers who are pushing for an expansion
of rights may appreciate the benefits of research exceptions that include them.

For open access, permitting the downloading, storage and manipulation of
documents to facilitate the creation of new knowledge is a basic goal that I
agree we should all be striving towards. This isn't just about text mining:
individuals need to be able to add their own notes and comments to working
copies, copy and paste text, and easily keep track of citation information
while reorganizing material in preparation for writing. Ideally, works should
also be in electronic forms that print-disabled readers can easily convert to
formats that work for them.

Getting there requires:

- work on publication formats, as the currently popular ones are not designed
for this

- user education about this potential, starting at the reader end; it is easier
to see why this should be allowed if you think about it as a reader

- education, including within the Libre access camp, about the reading needs and
challenges of people with disabilities, e.g. that our tendency to move towards
more visual presentation of data increases the challenges for them

- general education about plagiarism and legitimate copyright restrictions
(copy and paste facilitates legitimate uses, but also plagiarism and violations
of trademarks and other neighbouring rights; logos, for example, are popular
items for re-use)

- no longer conflating this potential with CC licenses, which tie arguments for
OA for research to non-research downstream uses, such as selling works or
selling advertising for profit, that have negative implications and nothing to
do with advancing knowledge

- understanding that this will take time. Even people who completely agree with
this are likely not in a position to require every OA deposit in an
institutional repository (IR) to meet the licensing and formatting requirements
for optimal downstream research use. PDF is a popular reading format. Libre OA
for new and emerging research does not address back issues, or the non-research
works that researchers use in conducting research.

My two bits. It would be helpful to hear about others' experiences and the
latest text mining provisions in subscription licenses.

Heather




-------- Original message --------
From: CHARLES OPPENHEIM <c.oppenh...@btinternet.com>
Date: 2017-01-26 4:39 AM (GMT-05:00)
To: pm...@cam.ac.uk, Heather Morrison <heather.morri...@uottawa.ca>,
goal@eprints.org
Subject: Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?

To do automated text and data mining (TDM), one needs to copy the entire table,
irrespective of which parts are subsequently analysed, and so there is a
potential breach of ©. Whilst this MAY be acceptable under an exception to ©,
such as fair dealing/fair use, that would generally only apply if it were for
"non-commercial" research purposes, whatever that term might mean in different
jurisdictions. So researchers (and their librarians) will be understandably
cautious and risk-averse regarding TDM, and this, in turn, is currently
inhibiting the use of TDM techniques.

Charles


Professor Charles Oppenheim
----Original message----
From : pm...@cam.ac.uk
Date : 24/01/2017 - 15:10 (GMT)
To : goal@eprints.org
Subject : Re: [GOAL] How much of the content in open repositories is able to 
meet the definition of open access?



On Tue, Jan 24, 2017 at 2:10 PM, Heather Morrison
<heather.morri...@uottawa.ca> wrote:
> Another critique that may be more relevant to this argument: I challenge PMR's
> contention that it is necessary to limit this kind of research to works that
> are licensed CC-BY. If you gather data from a great many different tables and
> analyze it, what you will be publishing is your own work.
>
> This is no different from doing a great deal of reading and thinking and
> writing a new work that draws on this knowledge, with appropriate citations to
> the works that you have read.
>
> Copyright is only invoked if you want to actually copy an original table for
> inclusion in a publication. If you are drawing on data from thousands of tables
> it is not clear how often this will happen. If what you want to copy is an
> insubstantial amount this would be covered under fair dealing. If the work is
> free-to-read, whether All Rights Reserved or under an open license, you can
> point readers to the original. At worst, this is a minor inconvenience.

This is completely wrong. The problem is that this is a legal issue, and
copyright law, by default, covers all aspects of copying. Copying material into
a machine for the purpose of mining is an act covered by copyright. Whether it
seems reasonable or fair is irrelevant. If you carry out mining, you should be
prepared to answer for it in court.

The problem is compounded by:
* It is jurisdiction-dependent. Fair use exists only in certain jurisdictions.
It is not the same as fair dealing, which is generally weaker. What is
permissible in the US may not be in the UK, and vice versa.
* It is extremely complex. Guessing the law will not be useful.
* Much of the law has not been tested in court. "Non-commercial" is not what
you or I would like it to mean. It is whatever a court decides when I or others
are summoned before it.

I have been involved in this for over four years in the UK and in Europe (the
European Parliament and Commission). There is no consensus on what should be
allowed, or on what the Commission and Member States will ultimately decide. I
have taken legal opinion on some of this and consulted other experts, and the
answers are often unclear.

The legality of Text and Data Mining is formally unrelated to whether the miner 
publishes the results or not.


> If you prefer to limit your research to works that are CC-BY licensed, it is
> your right to make this choice. Many other researchers, myself included, work
> with a wide range of data and do not choose to limit what we gather to works
> that are licensed CC-BY. One example from my own research: if a publisher has a
> table listing APCs, I screen scrape the table, pop the data into a spreadsheet,
> and work with it.

The primary issue for Text and Data Mining is the automated analysis of many
tables. This is an inconsistency in the law that we are trying to get
legislators to change.

> Even publishers that use CC-BY for articles usually have All Rights Reserved
> for pages that contain this type of information.

Do you have metrics for this? This is incompatible with the licence and should
be challenged, as I frequently do.

> If I limited myself to data sources that are CC-BY I could not do this kind of
> research.

I agree that this is limiting, and that is why it would be useful for
scientific material to be licensed CC-BY.

In summary, this is a complex legal question, and the answers have to be based
on law, not guesswork.




--
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069





_______________________________________________
GOAL mailing list
GOAL@eprints.org
http://mailman.ecs.soton.ac.uk/mailman/listinfo/goal
