Re: [tdwg-content] Implementing Darwin Core in RDF

Douglas Campbell Tue, 16 Aug 2016 23:02:57 -0700

Thanks for taking the time to bring me up to speed, Steve.

I'm familiar with the complexities of preparing specifications and realise I've 
come in mid-way.  I'll spend some time reading up on the containers and 
occurrences history.  But I'm unclear: is it better to use DSW terms in 
anticipation of their longevity, or just to mint our own in the meantime?


For the taxon convenience fields...

I thought I read in the DwC RDF schema that properties like taxonRank were in 
the Taxon class (so using them in Identification conflicts with their 
definition), but looking again I see that the spec uses 
"dwcattributes:organizedInClass" which specifically does not imply domain or 
range.  So now I'm at peace with that. :)

However, I am pointing to our own RDF versions of taxon classification terms 
that we use, and using DwC properties to define these taxon terms, plus I am 
combining these all together in a single JSON-LD API result.  So at this point I
don't think I need to repeat the convenience properties in Identification (as 
they are available directly in the Taxon object).  While this seems to fit our 
purpose I can see that this may be sub-optimal for others who download the data 
and use it separately - I'll need to contemplate that scenario some more.
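A minimal sketch of what such a combined JSON-LD result might look like, built with Python's standard library. The IRIs, the taxon values, and the choice to embed the Taxon node directly under dwciri:toTaxon are assumptions for illustration only, not Te Papa's actual API.

```python
import json

# Hypothetical IRIs: Te Papa's real identifiers are not given in this thread.
TAXON_IRI = "https://example.org/te-papa/taxon/1"
IDENTIFICATION_IRI = "https://example.org/te-papa/identification/1"

CONTEXT = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
}


def identification_with_embedded_taxon(taxon_fields):
    """Return a JSON-LD Identification whose taxon details are carried only by
    the linked Taxon node, not repeated as convenience terms."""
    taxon = {"@id": TAXON_IRI, "@type": "dwc:Taxon"}
    # Copy each supplied field onto the Taxon node as a dwc: property.
    taxon.update({f"dwc:{name}": value for name, value in taxon_fields.items()})
    return {
        "@context": CONTEXT,
        "@id": IDENTIFICATION_IRI,
        "@type": "dwc:Identification",
        # Object property to the taxon; the node is embedded in the same
        # result, so a client does not need a second request to read it.
        "dwciri:toTaxon": taxon,
    }


# Hypothetical taxon values.
doc = identification_with_embedded_taxon(
    {"genus": "Examplea", "specificEpithet": "exemplaris", "taxonRank": "species"}
)
print(json.dumps(doc, indent=2))
```

If a consumer later stores the Identification on its own, only the taxon IRI survives. That is the trade-off mentioned above; repeating the convenience terms on the Identification would guard against it.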

Cheers,
Douglas


From: Steve Baskauf [mailto:[email protected]]
Sent: Wednesday, 17 August 2016 6:32 a.m.
To: Douglas Campbell
Cc: [email protected]
Subject: Re: [tdwg-content] Implementing Darwin Core in RDF

Douglas,
I was the lead author on the DwC RDF Guide, so I can try to answer your 
questions about it.  The TDWG RDF Task Group is still in operation, although it 
hasn't been very active for the past several years.  The RDF TG has an online 
"home" at the TDWG Github site.[1]  However, the content didn't survive the 
migration from Google Code very well, so it takes some effort at this point to 
sort through it.  The TG also has an email list [2] but there has been little 
traffic on it recently.

*Dereferencing of the DwC IRI namespace* - Unfortunately, the dwciri: namespace 
terms don't dereference at the present time.  This needs to be corrected.  I've 
created a Turtle serialization [3] of how I think the RDF should be written for 
the dwciri: terms, but it isn't served when one attempts to dereference the 
terms and hasn't been incorporated into the official DwC repository.  Part of 
the problem here is that the guidelines for documenting terms in 
machine-readable form are still going through the adoption process.[4] I'm 
hopeful that when the Documentation Specification is ratified, we can make sure 
that all existing DwC terms dereference in a consistent manner.
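In practice, "dereferencing" means that an HTTP GET on a term IRI, with an Accept header asking for an RDF serialization, should return RDF describing that term. The following is a minimal sketch using Python's standard library. It assumes Turtle (text/turtle) as the requested format and uses dwciri:inCollection as an arbitrary test term. The request is built but not sent.

```python
import urllib.request

# Namespace of the Darwin Core IRI-valued terms (the dwciri: prefix).
DWCIRI_NS = "http://rs.tdwg.org/dwc/iri/"


def rdf_request(local_name, media_type="text/turtle"):
    """Build, without sending, a GET request asking for an RDF serialization
    of a dwciri: term through HTTP content negotiation."""
    iri = DWCIRI_NS + local_name
    return urllib.request.Request(iri, headers={"Accept": media_type})


req = rdf_request("inCollection")
print(req.full_url, req.get_header("Accept"))

# Sending it needs network access, and whether RDF comes back depends on the
# server configuration discussed above:
# with urllib.request.urlopen(req) as response:
#     print(response.headers.get("Content-Type"))
```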

*Best practice for connecting containers together* - By this, I'm assuming you 
mean linking instances of the various Darwin Core classes, or in RDF terms, 
linking nodes.  The RDF Guide is silent on how to do this.  That's not great 
from the standpoint of actually turning Darwin Core records into RDF, but it 
was a way to complete the guide in a finite amount of time.  What is missing is 
a consensus domain model that would lay out how instances of the Darwin Core 
classes would be linked.  Such a model should be developed, but that has not 
yet happened.  Again, there is a draft standard submitted for review [5], which, 
if adopted, will specify (in Section 4) a process for developing such a model.  
When we wrote the RDF Guide, we provided ancillary documents [6], which 
included examples that followed the RDF Guide and linked instances using 
various proposed models.  There are links to web pages containing examples 
using TaxonConcept, BiSciCol, and Darwin-SW object properties to link class 
instances.  I am not sure whether there is any RDF "in the wild" for the first 
two examples.  I'm more familiar with Darwin-SW, as I was involved in its 
development [7].  There is a Semantic Web Journal article about Darwin-SW [8], 
so I won't go into detail about it here, except to say that its data model was 
developed following an extensive discussion on the tdwg-content email list [9] 
about how members of the community understood the Darwin Core classes.  The 
relationship between the Darwin-SW model and the historical 1993 ACS Model can be 
viewed at [10].  There are a bit over a million triples of data "in the wild" 
modeled on Darwin-SW in accordance with the DwC RDF guide, accessible at a 
SPARQL endpoint.[11]  Some examples showing how to play around with SPARQL 
queries of these data are at [12].
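The following sketch shows one way to link class instances with Darwin-SW object properties in JSON-LD. It assumes the Darwin-SW namespace http://purl.org/dsw/ and the properties dsw:atEvent, dsw:occurrenceOf, dsw:hasEvidence and dsw:locatedAt as I understand them; check these against [7]. The base IRI, catalog number and date are invented.

```python
import json

# Hypothetical base IRI; the catalog number and date below are also invented.
BASE = "https://example.org/specimens/"

CONTEXT = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dcterms": "http://purl.org/dc/terms/",
    "dsw": "http://purl.org/dsw/",
}


def specimen_graph(catalog_number, event_date):
    """Link Organism, Occurrence, Event, Location and PreservedSpecimen nodes
    with Darwin-SW object properties (one of several proposed models)."""
    ids = {name: f"{BASE}{catalog_number}/{name}"
           for name in ("organism", "occurrence", "event", "location", "specimen")}

    def ref(name):
        # A JSON-LD node reference: an object containing only "@id".
        return {"@id": ids[name]}

    nodes = [
        {"@id": ids["organism"], "@type": "dwc:Organism"},
        {"@id": ids["location"], "@type": "dcterms:Location"},
        {"@id": ids["event"], "@type": "dwc:Event",
         "dwc:eventDate": event_date,
         "dsw:locatedAt": ref("location")},
        # The Occurrence node ties the Organism to the Event and its evidence.
        {"@id": ids["occurrence"], "@type": "dwc:Occurrence",
         "dsw:occurrenceOf": ref("organism"),
         "dsw:atEvent": ref("event"),
         "dsw:hasEvidence": ref("specimen")},
        {"@id": ids["specimen"], "@type": "dwc:PreservedSpecimen",
         "dwc:catalogNumber": catalog_number},
    ]
    return {"@context": CONTEXT, "@graph": nodes}


print(json.dumps(specimen_graph("SP000001", "2016-08-16"), indent=2))
```

Each class instance is a separate node with its own IRI, so any of them can be linked from other records later without restructuring this one.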

*The overlapping scope of Occurrence and Specimen types* - There is a long 
history behind the meaning of "Occurrence".  There is an out-of-date summary of 
some of the discussion around this topic in the Darwin-SW documentation [13].  
I think that at the time when Darwin Core was originally adopted, an Occurrence 
was considered a sort of superclass of the Specimen and Observation classes.  
However, after a lot of discussion, the meaning of dwc:Occurrence was clarified 
by changing it to its current definition: "An existence of an Organism (sensu 
http://rs.tdwg.org/dwc/terms/Organism) at a particular place at a particular 
time."  In this view, an Occurrence isn't a concrete thing like a Specimen - 
it's more like a database join between an Event instance (time and place) and 
an Organism, which allows for a one-to-many relationship between an Organism and 
its Occurrences, and a one-to-many relationship between an Event and its 
Occurrences.  It also allows a single occurrence of an organism at a time and 
place to be documented by one or more forms of evidence, which could include 
PreservedSpecimens, HumanObservation data, or images of various sorts.  In RDF 
terms, an Occurrence can be thought of as a node that is linked to Event, 
Organism, and evidence instance nodes.  You can see this represented 
graphically at [7], where "dsw:Token" refers to a generic class for evidence.  
In any case, separating Occurrence (as a node linking Events to Organisms) from 
Specimen allows an Occurrence to be documented by one or more instances of any 
kind of evidence, or even by multiple kinds of evidence.  For example, an 
Occurrence could be documented by a PhysicalSpecimen as well as several images.  
Here is an example of an organism with two Occurrences:
http://bioimages.vanderbilt.edu/org-jorgem/rec13_0004
The first occurrence on 2013-07-24 was documented by 42 camera trap images, and 
the second occurrence on 2013-07-25 was documented by 21 camera trap images.  
You can see how this was represented in RDF at [14].  In most cases, specimen 
records will be much simpler than this, with one organism, documented at one 
occurrence, with evidence of one PreservedSpecimen.  Such a simpler case could 
be represented with a simpler model.  But the more complex model allows 
specimen-derived occurrence records to be merged with other kinds of occurrence 
records, such as the camera trap example I gave, mark-recapture bird banding 
observations, iNaturalist occurrences documented by photos of the organism, etc.
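To make the "database join" reading concrete, here is a minimal sketch in plain Python dataclasses, not RDF, of the camera-trap example: one Organism and two Occurrences, each with its own evidence list. The class and field names are hypothetical, and the image identifiers are invented placeholders. Only the organism IRI and the image counts come from the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Organism:
    iri: str


@dataclass
class Event:
    event_date: str  # an Event is time and place; only the date is modeled here


@dataclass
class Occurrence:
    """One Organism at one Event, documented by any number of evidence items."""
    organism: Organism
    event: Event
    evidence: List[str] = field(default_factory=list)


# The organism cited above; the image identifiers are invented placeholders.
jorgem = Organism("http://bioimages.vanderbilt.edu/org-jorgem/rec13_0004")
occurrences = [
    Occurrence(jorgem, Event("2013-07-24"), [f"image-{n}" for n in range(42)]),
    Occurrence(jorgem, Event("2013-07-25"), [f"image-{n}" for n in range(21)]),
]

for occ in occurrences:
    print(occ.event.event_date, len(occ.evidence))
# 2013-07-24 42
# 2013-07-25 21
```

The simple specimen case fits the same structure: a single Occurrence whose evidence list holds one PreservedSpecimen identifier.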

*Conflicting usage of Taxon fields in the Identification object* - In order to 
explain the rationale behind why what seem to be taxon-related properties are 
assigned to Identification instances, I must refer to the idea of "convenience 
terms" as expressed in Section 2.7 of the RDF Guide.[15]   In a perfect world, 
we would have the following:

* a collection item linked by dwciri:inCollection to an IRI-identified 
  collection
* an identification instance linked by dwciri:toTaxon to an IRI-identified 
  taxon (a.k.a. taxon concept)
* a location instance linked by dwciri:inDescribedPlace to an IRI-identified 
  geographic place (a.k.a. "feature")

If the linked IRI-identified object resources were described by RDF, it would 
not be necessary to include any of the Darwin Core "convenience" properties 
included in Table 3.5 [16].  The information contained in the values of those 
properties could be discovered by dereferencing the object IRIs and traversing 
subsequent links from that RDF.  However, if those IRIs don't exist, then the 
convenience properties provide a string-based mechanism to relate the subject 
resource to other resources that should be linked to the same (unidentified) 
object resource.  So for example, if we say a specimen has the convenience 
properties and values

dwc:collectionCode="Mamm"
dwc:institutionCode="MVZ"

we are not saying that "Mamm" is the collection code of the specimen and that 
"MVZ" is the institution code of the specimen.  Rather, we mean that the 
specimen should be linked to a collection (with unknown IRI) whose code is 
"Mamm" and whose owning institution has the code "MVZ".  Similarly, if we say 
that an identification has the convenience properties and values

dwc:genus="Hersiliiadae"
dwc:specificEpithet="yaeyamaensis"

we are not saying that "yaeyamaensis" is the specific epithet of the 
identification and that "Hersiliiadae" is the genus of the identification.  
Rather, we mean that the identification should be linked to a taxon (with 
unknown IRI) for which the specificEpithet part of its name string is 
"yaeyamaensis" and which is included in the genus "Hersiliiadae".  This may seem 
odd, particularly if you are used to thinking of genus and specific epithet as 
properties of a taxon.  But the sets of DwC convenience properties are intended 
to be a temporary, string-based way to describe an unidentified resource to 
which the subject resource should be linked.  At some future time, if IRIs can 
be discovered, those sets of convenience properties might be dropped if 
dereferencing the IRIs provides the same information.  In these examples, one 
might replace the convenience properties with:

* a collection item linked by dwciri:inCollection to 
  http://grbio.org/cool/0rht-pj95
* an identification instance linked by dwciri:toTaxon to 
  http://zoobank.org/75C9EA16-72B1-44C9-AD40-3C3D41323AB9

although I don't think either of these IRIs currently dereferences to meaningful 
machine-readable RDF (they do have human-readable web pages).
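Here is a minimal sketch of that replacement step, using JSON-LD-style Python dicts. The two lookup entries are the IRIs suggested above. How IRIs would actually be discovered (for example, by querying a registry) is left open here and is an assumption. When no IRI is known, the record keeps its string-based convenience properties, as described above.

```python
# Lookup tables pairing convenience-term values with IRIs. The entries are the
# IRIs suggested above; a real system would need some way (not shown, and
# hypothetical here) of discovering IRIs, e.g. by querying a registry.
COLLECTION_IRIS = {("MVZ", "Mamm"): "http://grbio.org/cool/0rht-pj95"}
TAXON_IRIS = {("Hersiliiadae", "yaeyamaensis"):
              "http://zoobank.org/75C9EA16-72B1-44C9-AD40-3C3D41323AB9"}


def replace_convenience_terms(record, keys, link_property, lookup):
    """Swap a set of convenience properties for a link to an IRI-identified
    resource if an IRI is known; otherwise return the record unchanged."""
    values = tuple(record.get(key) for key in keys)
    iri = lookup.get(values)
    if iri is None:
        # No IRI known: the string-based convenience properties stay.
        return dict(record)
    linked = {k: v for k, v in record.items() if k not in keys}
    linked[link_property] = {"@id": iri}
    return linked


item = {"@type": "dwc:PreservedSpecimen",
        "dwc:institutionCode": "MVZ", "dwc:collectionCode": "Mamm"}
identification = {"@type": "dwc:Identification",
                  "dwc:genus": "Hersiliiadae", "dwc:specificEpithet": "yaeyamaensis"}

print(replace_convenience_terms(item, ("dwc:institutionCode", "dwc:collectionCode"),
                                "dwciri:inCollection", COLLECTION_IRIS))
# {'@type': 'dwc:PreservedSpecimen', 'dwciri:inCollection': {'@id': 'http://grbio.org/cool/0rht-pj95'}}
print(replace_convenience_terms(identification, ("dwc:genus", "dwc:specificEpithet"),
                                "dwciri:toTaxon", TAXON_IRIS))
```

Dropping the strings is only safe once the linked IRI dereferences to RDF carrying the same information. Until then, a publisher might reasonably keep both.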

I hope that this has provided you with some answers, or at least a starting 
point for additional exploration or questions.  Please feel free to reply if 
there were parts of what I wrote that weren't clear.

Steve Baskauf

[1] https://github.com/tdwg/rdf
[2] http://groups.google.com/group/tdwg-rdf
[3] https://github.com/tdwg/vocab/blob/master/code-examples/darwin-core/dwciri.ttl
[4] https://github.com/tdwg/vocab/blob/master/documentation-specification.md
[5] https://github.com/tdwg/vocab/blob/master/maintenance-specification.md
[6] https://github.com/tdwg/rdf/blob/master/DwCAncillary.md
[7] https://github.com/darwin-sw/dsw
[8] http://www.semantic-web-journal.net/content/darwin-sw-darwin-core-based-terms-expressing-biodiversity-data-rdf-1
[9] https://github.com/darwin-sw/dsw/wiki/TdwgContentEmailSummary
[10] https://github.com/darwin-sw/dsw/blob/master/img/acs-dsw-poster.pptx
[11] http://rdf.library.vanderbilt.edu/sparql?view
[12] https://github.com/HeardLibrary/semantic-web/blob/master/learning-sparql/learning-sparql-ch3-part2-answers.md
[13] https://github.com/darwin-sw/dsw/wiki/ClassOccurrence
[14] http://bioimages.vanderbilt.edu/org-jorgem/rec13_0004.rdf
[15] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#2.7_Darwin_Core_convenience_terms
[16] http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm#3.5_Darwin_Core_convenience_terms_that_are_expected_to_be_used_o



Douglas Campbell wrote:
Hi all,

I am implementing Darwin Core in RDF as part of our API at Te Papa (Museum of 
New Zealand).  My aim is to map our specimen metadata to rich Darwin Core RDF 
using JSON-LD, then 'dumb down' to Simple Darwin Core to contribute to virtual 
herbariums.  I have mocked-up some records, however there are some areas where 
I'm not quite sure how to interpret the Darwin Core RDF Guide.
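A minimal sketch of the "dumb down" step: flattening a nested JSON-LD-style record into a single Simple Darwin Core row and writing it as CSV with Python's standard library. The record, its values, and the Darwin-SW-style linking keys (dsw:derivedFrom, dsw:hasIdentification) are assumptions for illustration. Only four Simple DwC columns are filled.

```python
import csv
import io

# Simple Darwin Core columns this sketch fills; a real export would use more.
SIMPLE_DWC_FIELDS = ["catalogNumber", "genus", "specificEpithet", "taxonRank"]


def to_simple_dwc(node):
    """Flatten a nested JSON-LD-style record into one Simple Darwin Core row.

    Literal values found anywhere in the tree are copied up under their local
    name (prefix dropped); keys starting with "@" are skipped. If a name occurs
    more than once, the first value encountered wins."""
    row = {}

    def walk(current):
        for key, value in current.items():
            if key.startswith("@"):
                continue
            if isinstance(value, dict):
                walk(value)
            else:
                row.setdefault(key.split(":", 1)[-1], value)

    walk(node)
    return {name: row.get(name, "") for name in SIMPLE_DWC_FIELDS}


# Hypothetical record using Darwin-SW-style links; all values are invented.
record = {
    "@id": "https://example.org/te-papa/object/1",
    "@type": "dwc:PreservedSpecimen",
    "dwc:catalogNumber": "SP000001",
    "dsw:derivedFrom": {
        "@type": "dwc:Organism",
        "dsw:hasIdentification": {
            "@type": "dwc:Identification",
            "dwciri:toTaxon": {
                "@id": "https://example.org/te-papa/taxon/1",
                "dwc:genus": "Examplea",
                "dwc:specificEpithet": "exemplaris",
                "dwc:taxonRank": "species",
            },
        },
    },
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=SIMPLE_DWC_FIELDS)
writer.writeheader()
writer.writerow(to_simple_dwc(record))
print(buffer.getvalue())
```

Lists of nodes, such as several identifications of one specimen, are not handled. A real export would have to choose which identification becomes the flat row.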

The areas of confusion I have include:
* Best practice for connecting containers together
* Dereferencing of the DwC IRI namespace
* The overlapping scope of the Occurrence and Specimen types
* Conflicting usage of Taxon fields in the Identification object.

I'm hoping for suggestions:
1. Are there any implementations of DwC RDF data online that I could look at as 
examples to follow?
2. What is the best way to ask specific questions about DwC RDF, and to whom?

At this stage our API prototype is only available internally but there is some 
documentation available publicly at:
https://github.com/te-papa/collections-api/wiki

Thanks in advance,
Douglas

Douglas Campbell
Business Analyst
Collections Information Services
Museum of New Zealand Te Papa Tongarewa

________________________________

Visit the Te Papa website http://www.tepapa.govt.nz
The email message together with the accompanying attachments may be 
CONFIDENTIAL. If you have received this message in error, please notify 
https://www.tepapa.govt.nz/about/contact-us/general-enquiries immediately and 
delete the original message. The views expressed in this message are those of 
the individual sender, except where the sender specifically states them to be 
views of Te Papa.  Te Papa employs strict virus checking measures and accepts 
no liability for any loss caused either directly or indirectly by a virus 
arising from the use of this message or any attached file.

________________________________
This email has been filtered by SMX. For more information visit 
http://smxemail.com



--

Steven J. Baskauf, Ph.D., Senior Lecturer
Vanderbilt University Dept. of Biological Sciences

postal mail address:
PMB 351634
Nashville, TN  37235-1634,  U.S.A.

delivery address:
2125 Stevenson Center
1161 21st Ave., S.
Nashville, TN 37235

office: 2128 Stevenson Center
phone: (615) 343-4582,  fax: (615) 322-4942
If you fax, please phone or email so that I will know to look for it.

http://bioimages.vanderbilt.edu
http://vanderbilt.edu/trees




_______________________________________________
tdwg-content mailing list
[email protected]
http://lists.tdwg.org/mailman/listinfo/tdwg-content
