[back onlist]

On 12 March 2011 01:08, Joachim Baran <[email protected]> wrote:
>
> On 11-03-10 11:56 PM, "Peter Ansell" <[email protected]> wrote:
>>If neither the key nor the name is going to be a URI, then it would be
>>best to introduce another item to connect them using two RDF triples
>>with a shared URI subject.
>  Yes, that sounds like a reasonable approach. I could introduce a URI
> subject for each primary key in the mart tables (there is just one primary
> key from which everything else descends), then use predicates for each
> attribute in the mart and have the actual literal-values in the mart's
> tables as objects.
>
>  For example:
>  <biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/238947>
> <http://...#pathway_id> "hsa05200"
>  <biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/238947>
> <http://...#associated_gene_name> "TP53"

I didn't realise that the pathway_id (and other similar primary keys)
would be published identifiers. If there are permanent identifiers
provided by the dataset, such as "hsa05200" it is relatively easy to
construct a URI using that, without resorting to adding an identifier
like 238947 that may not be permanent.

<biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/hsa05200>
<http://...#associated_gene_name> "TP53"

The issue of how to construct the subject URI is what is keeping RDF
out of general acceptance in science, so any well-formed, identifiable
URI is useful while there is still no consensus otherwise.

On the other hand, BioMart may be in a good position to provide
a forum for discussing this subject. It is a great opportunity
to get a basic consensus and provide the software support for the
consensus given the large number of BioMart installations out there
that will soon be upgraded to 0.8 with this inbuilt RDF support
available. Ideally providers should be able to define a single
authoritative URL for each item and publish using it, but they haven't
been able to do this easily so far.

They may be able to easily define the URL format for each record in a
single place for each table, for example:

biomart-config:
    mart: kegg;
        table: kegg_pathway;
            Primary_URI_Structure:
                biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}

where ${pathway_id} would be replaced by the value of that field for
each record.
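
As a minimal sketch of that substitution (in Python; the function name
primary_uri and the percent-encoding step are my own assumptions for
illustration, not anything BioMart currently provides), the template
could be expanded for each record roughly like this:

```python
from string import Template
from urllib.parse import quote

# The URI structure from the example config above; ${pathway_id} is the
# placeholder that gets filled in from each record.
PRIMARY_URI_STRUCTURE = Template(
    "biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/${pathway_id}"
)


def primary_uri(record):
    """Build the primary URI for one record (hypothetical helper).

    `record` maps column names to values. Values are percent-encoded
    (an assumption on my part) so that odd characters cannot break the URI.
    """
    encoded = {name: quote(str(value), safe="") for name, value in record.items()}
    # substitute() raises KeyError if the record lacks a field the template needs.
    return PRIMARY_URI_STRUCTURE.substitute(encoded)


uri = primary_uri({"pathway_id": "hsa05200", "associated_gene_name": "TP53"})
```

Fields that the template does not mention (such as associated_gene_name
here) are simply ignored when the URI is built.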

>  I have chosen the biomart:// URI to denote that this item is accessible
> within the mart, but not visible as such via HTTP-URL.
>
>  Alternatively, I could also use the same biomart:// URI for the objects
> too. Whilst looking bloated, do you think that would leave you with more
> opportunities how you would like to query the mart?
>
>  For example:
>  <biomart://.../238947> <http://...#pathway_id>
> <biomart://.../238947/hsa05200>
>  <biomart://.../238947> <http://...#associated_gene_name>
> <biomart://.../238947/TP53>

In my opinion it isn't useful to create URIs based on strings that
aren't designed as permanent, unambiguous identifiers. For example, the
gene known as TP53 may change meaning, but hsa05200 is likely either
to keep the same meaning or to be deprecated outright rather than
have its meaning change gradually. This shouldn't need to be a
large part of the discussion, as in relational databases we always have
primary key sets to fall back on, and unique URIs can be created based
solely on them.

A mart could create URIs for other fields in order to link to other
marts, other tables, etc., but that shouldn't need to influence the
discussion of how to create the primary URI for each record.
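
To make the shape of the output concrete, here is a rough sketch (Python
again) of emitting one triple per attribute, with the primary URI as the
subject and the table values kept as plain literals. The function names
and the predicate namespace http://example.org/mart# are hypothetical
placeholders; the real predicate URIs would come from the generated
ontology.

```python
def escape_literal(value):
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(subject_uri, record, predicate_ns):
    """Return one N-Triples line per field of `record` (hypothetical helper).

    subject_uri  -- the primary URI for the record
    record       -- mapping of column name to value
    predicate_ns -- namespace prefix for predicates (placeholder, see above)
    """
    lines = []
    for field, value in record.items():
        literal = escape_literal(str(value))
        lines.append(f'<{subject_uri}> <{predicate_ns}{field}> "{literal}" .')
    return lines


triples = record_to_ntriples(
    "biomart://dcc-dev.res.oicr.on.ca/pathway_config_1/kegg/kegg_pathway/hsa05200",
    {"associated_gene_name": "TP53"},
    "http://example.org/mart#",  # hypothetical namespace, not the mart's real one
)
```

Only the subject is a URI built from the primary key; a value like TP53
stays a literal, so nothing depends on it being a stable identifier.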

>>Just to make sure I know what the situation is, could you give me some
>>sample data for these key name pairs?
>  The current situation is ambiguous, because I am just about to finish
> the automatic ontology generation for marts, whereas the SPARQL-results
> still return everything as literal. Let me know what you think of the two
> suggestions I made above, and please feel free to contribute your own
> suggestions, and by the end of the (Canadian) day I can send you a mart
> URL where you can test the SPARQL interface.

Cheers,

Peter