Re: dataset descriptions

Joachim Baran Thu, 13 Feb 2014 16:08:10 -0800

I have added a few sentences under 6.5 (Statistics) as discussed during the 
last conf call.


Joachim

On February 12, 2014 at 8:55:50 PM, Michel Dumontier 
(michel.dumont...@gmail.com) wrote:

Hmmm... maybe every dataset, whether a proper subset or not, should be seen as 
its own dataset.  That way we keep our focus on the versions and distributions 
of any dataset.

m.

Michel Dumontier
Associate Professor of Medicine (Biomedical Informatics), Stanford University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com


On Mon, Feb 10, 2014 at 3:37 PM, Freimuth, Robert R., Ph.D. 
<freimuth.rob...@mayo.edu> wrote:
Hi Michel,

 

As you know, I don’t attend this particular call.  However, if I understand the 
question properly, I’d like to risk weighing in.  Feel free to tell me that I’m 
off-base and I’ll go back to lurking. :)  Two points:

 

1. If arbitrary collections are supported, it must be assumed that (eventually) 
   collections of collections will be created.  In addition, subsets of subsets 
   will be created.  I assume this is supported.

2. Would the subsets be of the same type as the parent?  If not, problems may 
   arise when one person’s set is another person’s subset.
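
The typing question above can be made concrete with a minimal sketch. It assumes a single hypothetical `Dataset` class used for datasets, subsets, and collections alike; the class, its fields, and the names `chembl-rdf-targets` and `hypothetical-release` are illustrative only and do not come from the group's draft. Because every node has the same type, subsets of subsets and collections of collections need no special handling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical node: a dataset, a subset, or a collection, all one type."""
    name: str
    version: Optional[str] = None  # None means "no specific version"
    parts: List["Dataset"] = field(default_factory=list)

    def add(self, part: "Dataset") -> "Dataset":
        """Attach a subset (itself a Dataset) and return it for chaining."""
        self.parts.append(part)
        return part

    def depth(self) -> int:
        """Number of nesting levels at and below this node."""
        return 1 + max((p.depth() for p in self.parts), default=0)


# Illustrative names only: an unversioned collection, one release, one subset of it.
chembl = Dataset("chembl-rdf")
release = chembl.add(Dataset("chembl-rdf", version="hypothetical-release"))
subset = release.add(Dataset("chembl-rdf-targets", version="hypothetical-release"))
```

With one type throughout, one person's set can serve as another person's subset without any conversion; the cost is that the model no longer distinguishes the levels by type.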

 

These comments are based on my experiences developing the LS DAM, where we ran 
into this issue in a couple of places, especially when we tried to incorporate 
the ISA (Investigation Study Assay) model.  The distinction between the levels 
was somewhat arbitrary, which created difficulties as it was up to the user to 
decide (arbitrarily) how to model a given thing.

 

I hope this helps.

 

Thanks,

Bob

 

From: Michel Dumontier [mailto:michel.dumont...@gmail.com]
Sent: Monday, February 10, 2014 2:34 PM
To: w3c semweb hcls
Cc: Alasdair Gray
Subject: dataset descriptions

 

Hi all,

On today's call we got some feedback from Chris Mungall, Melissa Haendel, and 
Harry Hochheiser. Chris asked whether (and how) we could make arbitrary 
collections, for instance, chembl-rdf as a dataset (without necessarily 
specifying the version). I wondered if perhaps we could generalize our "version 
level" to a "subset level", which could very well include version subsets. 

 

https://docs.google.com/drawings/d/136kVhd2ffx8qauyT2qMJKgKcWu7O-uvZ2tuH6DejCQ4/edit

 

I also wondered whether this subset level description could point to the 
distribution level descriptions as the sources used in creating it, which would 
be more abstract than our previous distribution-to-distribution case.

 

https://docs.google.com/drawings/d/1qCG2Gl2ZtwuAO2clcya5q067FxPFs7UAHiIk18xzEcY/edit
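
One way to picture a subset level description that points to distribution level descriptions as its sources is the sketch below. It is only an illustration under stated assumptions: `Distribution`, `SubsetDescription`, their fields, and the version labels `v-a` and `v-b` are hypothetical and are not taken from the linked drawings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Distribution:
    """Hypothetical distribution-level description: one concrete serialization."""
    dataset: str
    version: str
    media_type: str


@dataclass
class SubsetDescription:
    """Hypothetical subset-level description that records its source distributions."""
    title: str
    sources: List[Distribution] = field(default_factory=list)

    def source_versions(self) -> List[str]:
        """Distinct versions of the distributions this subset was built from."""
        return sorted({d.version for d in self.sources})


# Illustrative values only: a version-independent subset built from two releases.
dist_a = Distribution("chembl-rdf", "v-a", "text/turtle")
dist_b = Distribution("chembl-rdf", "v-b", "text/turtle")
all_versions = SubsetDescription("chembl-rdf (any version)", [dist_a, dist_b])
```

Here the subset carries no version of its own; the versions are recoverable only through the distributions it lists as sources.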

 

What do you think?

 

m.

 



Michel Dumontier

Associate Professor of Medicine (Biomedical Informatics), Stanford University

Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group

http://dumontierlab.com
