Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-22 Eric Jain
Phillip Lord wrote: Well, swissprot refers to isoforms I think. Push comes to shove, just use the sequence. Note that we do have stable identifiers for isoforms, for example in http://beta.uniprot.org/uniprot/P00750.rdf you can find URIs for the isoforms we describe, e.g. http://purl.unipro

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-21 Phillip Lord
> "Alan" == Alan Ruttenberg <[EMAIL PROTECTED]> writes: Alan> Well, if I am restricted to using such Uniprot classes I will have Alan> trouble representing important scientific findings. If Uniprot only Alan> has one name for the two molecules, one of which has a snp that leads Alan>

RE: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-20 Skinner, Karen (NIH/NIDA) [E]
les/NOT-GM-07-108.html Karen Skinner NIDA/NIH -Original Message- From: Eric Jain [mailto:[EMAIL PROTECTED] Sent: Friday, July 20, 2007 11:56 AM To: Alan Ruttenberg Cc: Phillip Lord; Matthias Samwald; public-semweb-lifesci@w3.org Subject: Re: Ambiguous names. was: Re: URL +1, LSID -1 Alan

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-20 Eric Jain
Alan Ruttenberg wrote: "Remember that one of the reasons this came up was the claim that the Uniprot URI should be used to identify a set of real things." OK, I think that describes my current point of view. I get confused when I read statements that sound like "x means the same thing in in

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-20 Alan Ruttenberg
On Jul 20, 2007, at 3:52 AM, Eric Jain wrote: Alan Ruttenberg wrote: Whose mission? Remember that one of the reasons this came up was the claim that the Uniprot URI identified the protein in the real world. Who claimed that? If we wanted to identify each protein in the real world we'd ha

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-20 Eric Jain
Alan Ruttenberg wrote: Whose mission? Remember that one of the reasons this came up was the claim that the Uniprot URI identified the protein in the real world. Who claimed that? If we wanted to identify each protein in the real world we'd have to assign zillions of URIs just for the protein

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Alan Ruttenberg
Summary: Continued discussion of whether we need to have identifiers for protein classes in addition to those for records. Example finding is given to support my view that we do need them, in response to Phil's suggestion I examine my scenarios. [yah, I know I'm not being consistent

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Alan Ruttenberg
On Jul 19, 2007, at 4:16 AM, Eric Jain wrote: Alan Ruttenberg wrote: In that case, I would recommend that it is unwise to use Uniprot ids as identifiers of protein classes on the semantic web. Doing so would encourage exactly the kind of ambiguity that we need to avoid in order to write

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Alan Ruttenberg
On Jul 18, 2007, at 11:26 AM, Phillip Lord wrote: I think that there are many clear reasons for keeping statements about the informatics entities -- the database entries for example. No question about that. I totally agree. To do otherwise, runs the risk of enormous mission creep (always a

RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Michel_Dumontier
> Many post-translational modifications like glycosylation > (http://www.functionalglycomics.org/static/index.shtml)in proteins > fundamentally change the (functional) 'nature' of the protein (as also the > molecular structure of the protein in case of glycosylation through > addition of sugar cha

RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Michel_Dumontier
> An interesting issue, one of identity. What determines the identity of > a molecule, a protein in this case? I strongly believe that the identity of a molecule is only dependent on its physical (chemical) composition. > If you have a protein that becomes > phosphorylated, is the phosphoryla

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Darren Natale
Quite a nice example! These are the sorts of issues that we must contend with while creating the PRO framework. In fact, this addresses another issue of scope; that is, whether or not (in the long or short term) to also account for homodimers, trimers, and so on (currently, GO handles heter

RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 SATYA SANKET SAHOO
Original message >Date: Thu, 19 Jul 2007 16:29:18 -0400 >From: Michel_Dumontier <[EMAIL PROTECTED]> >Subject: RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: >Re: URL +1, LSID -1) >To: Darren Natale <[EMAIL PROTECTED]>, Michel_Dumont

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 June Kinoshita
If I may put forward a key protein in Alzheimer disease as an example that we are grappling with, there is full-length APP (which itself has a number of forms as well as mutations); various peptides derived from cleavage of APP; and then multimeric forms of the peptides, particularly Abet

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Waclaw Kusnierczyk
Michel_Dumontier wrote: Darren, Also, while we recognize that there are different qualities that can be ascribed to a basically identical biochemical entity in different structural conformations or states of ligand binding, we are not attempting (at least in the beginning) to describe these st

RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Michel_Dumontier
Darren, > Also, while we recognize > that there are different qualities that can be ascribed to a basically > identical biochemical entity in different structural conformations or > states of ligand binding, we are not attempting (at least in the > beginning) to describe these structural conforma

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Waclaw Kusnierczyk
Michel_Dumontier wrote: Sequence form is again a placeholder term ... ... distinguish between a phosphorylated version of a protein and the non-phosphorylated version (as an example). The need for the latter derives from the fact that the two versions might have different functions. Inde

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Darren Natale
as Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1) We don't yet have formal definitions for many of the classes and relations (the effort only began in earnest a few months ago). But, basically, there is a distinction made between the full-length (in terms of amino acid sequence) p

RE: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Michel_Dumontier
9, 2007 11:24 AM > To: Eric Jain > Cc: Alan Ruttenberg; Chris Mungall; Bijan Parsia; public-semweb-lifesci > hcls > Subject: Re: protein entities (was Re: Rules (was Re: Ambiguous names. > was: Re: URL +1, LSID -1) > > > We don't yet have formal definitions for many of t

protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Darren Natale
Thank you Chris for including me on this thread. I can well see why you did so! We recently began a new Protein Ontology (PRO) effort geared precisely toward the formal definition of the "smaller entities" referred to by Alan. By "we" I mean the PRO Consortium, comprising the PIs Cathy Wu

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Darren Natale
We don't yet have formal definitions for many of the classes and relations (the effort only began in earnest a few months ago). But, basically, there is a distinction made between the full-length (in terms of amino acid sequence) protein and the sub-length parts of proteins (commonly called

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Darren Natale
Protein, in this scheme, is the amino acid polymer produced by a translation process using an mRNA as a template. I suppose this excludes peptides (also amino acid polymers) that are produced non-ribosomally, but perhaps that is okay for the time being. The precise definition will be constr

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Eric Jain
Darren Natale wrote: We don't yet have formal definitions for many of the classes and relations (the effort only began in earnest a few months ago). But, basically, there is a distinction made between the full-length (in terms of amino acid sequence) protein and the sub-length parts of protei

Re: protein entities (was Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Eric Jain
Darren Natale wrote: We recently began a new Protein Ontology (PRO) effort geared precisely toward the formal definition of the "smaller entities" referred to by Alan. By "we" I mean the PRO Consortium, comprising the PIs Cathy Wu of PIR (which is also a member organization of the UniProt Con

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Phillip Lord
> "Alan" == Alan Ruttenberg <[EMAIL PROTECTED]> writes: Alan> Summary: Answering Phil's questions, and clarifying one thing he Alan> asserts about what I said. >> What if they have a polymorphism? Alan> No. >> Are two isoforms from an alternate splice the same protein? Alan> No.

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-19 Phillip Lord
> "MS" == Matthias Samwald <[EMAIL PROTECTED]> writes: >> It would be more satisfying for us to know intentionally what we mean >> by "protein". It would be good to have a clear set of definitions. But, >> ultimately, I think it would be mistaken. If we have the ability to >> expr

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-19 Eric Jain
Alan Ruttenberg wrote: In that case, I would recommend that it is unwise to use Uniprot ids as identifiers of protein classes on the semantic web. Doing so would encourage exactly the kind of ambiguity that we need to avoid in order to write statements that will not confuse semantic web agent

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-18 Alan Ruttenberg
On Jul 18, 2007, at 6:02 AM, Xiaoshu Wang wrote: But please note, just because "http://purl.uniprot.org/core/Protein" contains the string "Protein" does not make it the identifier for *Protein*, unless everyone else agrees to it. I wouldn't have thought that http://purl.uniprot.org/core/Pro

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-18 Xiaoshu Wang
I agree with Alan but feel sympathy for Eric as well. In the absence of a universally accepted ontology for describing biological entities, Eric has to develop something to start working on SW. But please note, just because "http://purl.uniprot.org/core/Protein" contains the string "Protei

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-17 Chris Mungall
On Jul 17, 2007, at 1:44 AM, Eric Jain wrote: Chris Mungall wrote: We have also switched from talk of defining specific proteins to rules to automatically annotate protein records. You're right, small digression, hope it's of interest anyway :-) Definitely - although I don't think OWL/SW

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-17 Alan Ruttenberg
As EricJ's recent note confirmed, and as I suspected, the problem goes substantially deeper than the issue of simply punning the record and the protein class. The fundamental problem is that the record, having not been designed as the definition of something, isn't the *unambiguous* defin

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-17 Alan Ruttenberg
In that case, I would recommend that it is unwise to use Uniprot ids as identifiers of protein classes on the semantic web. Doing so would encourage exactly the kind of ambiguity that we need to avoid in order to write statements that will not confuse semantic web agents (including peopl

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-17 Eric Jain
Chris Mungall wrote: We have also switched from talk of defining specific proteins to rules to automatically annotate protein records. You're right, small digression, hope it's of interest anyway :-) I read "broad classes of proteins" as being more inclusive than the class denoted by OPSD_H

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-17 Eric Jain
Alan Ruttenberg wrote: To clarify, no, I didn't mean this. I meant that the definition of Uniprot records are already broad in the sense that sometimes multiple splice variants are included in a single record, as are population and disease-causing variants, according to Eric. Basically I don't

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-16 Alan Ruttenberg
Thanks for the elaboration Chris - as usual better expressed than when I tried :) One minor clarification: On Jul 16, 2007, at 11:24 PM, Chris Mungall wrote: I read "broad classes of proteins" as being more inclusive than the class denoted by OPSD_HUMAN in my interpretation, but also in

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-16 Chris Mungall
On Jul 16, 2007, at 10:29 AM, Eric Jain wrote: Bijan Parsia wrote: Eric, I would be very much interested in some more details about the sort of rules used and how they are used. I personally tend to distinguish between the use of rules in modeling and the use of rules for data munging t

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Waclaw Kusnierczyk
Matthias Samwald wrote: The evidence for what I point out is found everywhere: "P12345 is expressed in some tissues"... according to Alan's points, this would be a wrong statement. When the Semantic Web should really find widespread adoption, they would be saying something like "C12345 is

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Alan Ruttenberg
Summary: Answering Phil's questions, and clarifying one thing he asserts about what I said. On Jul 16, 2007, at 12:22 PM, Phillip Lord wrote: "Alan" == Alan Ruttenberg <[EMAIL PROTECTED]> writes: Take these rhetorical questions: I am interpreting these as questions of fact, that "same"

Re: Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-16 Eric Jain
Bijan Parsia wrote: Eric, I would be very much interested in some more details about the sort of rules used and how they are used. I personally tend to distinguish between the use of rules in modeling and the use of rules for data munging tasks. Obviously, where you draw this boundary can be a

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Matthias Samwald
> It would be more satisfying for us to know intentionally what we > mean by "protein". It would be good to have a clear set of > definitions. But, ultimately, I think it would be mistaken. If we > have the ability to express "the class of protein molecules defined > by the swissprot record OPSD_

Rules (was Re: Ambiguous names. was: Re: URL +1, LSID -1)

2007-07-16 Bijan Parsia
On Jul 16, 2007, at 5:53 PM, Eric Jain wrote: Alan Ruttenberg wrote: We've got a SW language for making definitions - it's called OWL. One thing I can say here is that there is the trend that curators create rules (and check the outcome) instead of adding data themselves directly. Unfort

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Alan Ruttenberg wrote: We've got a SW language for making definitions - it's called OWL. One thing I can say here is that there is the trend that curators create rules (and check the outcome) instead of adding data themselves directly. Unfortunately OWL is insufficient for the kind of ugly r

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Phillip Lord
> "Alan" == Alan Ruttenberg <[EMAIL PROTECTED]> writes: >> I agree. The argument is that it's very hard to describe what you mean by >> a "protein". We almost certainly don't mean a protein molecule. We might >> mean a type of protein. But then we don't know whether two protein >> mol

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Alan Ruttenberg
I'm not advocating that we build definitions around protein sequences, just that we build definitions, period. And that we don't confuse a page of html with a definition. The uniprot curators are great! They know what they are looking for and they are skilled at finding it. Let's put work i

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Alan Ruttenberg wrote: I'm confused. I think we all would agree that there are instances of proteins and we have a good idea of what they are. We also know that there are groups of proteins that are built off the same template and share certain properties. If we define classes using such prope

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Alan Ruttenberg
On Jul 16, 2007, at 10:19 AM, Phillip Lord wrote: "MK" == Marijke Keet <[EMAIL PROTECTED]> writes: MK> Lack of sufficient knowledge about a particular (biological) entity is MK> a sideshow, not an argument, to the issue of distinguishing real proteins from MK> their records. I

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Matthias Samwald
On Mon, 16 Jul 2007 16:09:03 +0200, Eric Jain wrote: > http://purl.uniprot.org/uniprot/P12345 does not represent any > physical object, but it is a useful generalization of certain > physical objects that you might find. That sounds like defining a class of physical objects to me. - Matthias

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Marijke Keet
Eric Neumann wrote: Alan, the life science community has for years applied an implicit transitivity to records of things, so that when many say: "http://purl.uniprot.org/uniprot/P12345 is expressed only in species homo sapien" they usually imply that "the protein referenced by data

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Phillip Lord
> "MK" == Marijke Keet <[EMAIL PROTECTED]> writes: MK> Lack of sufficient knowledge about a particular (biological) entity is MK> a sideshow, not an argument, to the issue of distinguishing real proteins from MK> their records. I agree. The argument is that it's very hard to describe

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Matthias Samwald
> Having worked directly with bench scientists for many years, they > view data and databases as "extensions" to what they are really > interested in.  Uhm, probably this differentiates molecular biology from classic, organismal biology (my background). I would never make such statements. > Yo

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Matthias Samwald wrote: Well, they might talk like database entries and physical objects would be the same, but this is not what they *think*. With the Semantic Web / ontologies we want to capture the semantics and the actual thinking, not the linguistic / textual surface representations. http

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Neumann
Not true Matthias. Having worked directly with bench scientists for many years, they view data and databases as "extensions" to what they are really interested in. Your example of "bank" and "bank" are disjoint and non-related; in the case of gene and gene-data-record there is a kind of c

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Matthias Samwald
> the life science community has for years applied an implicit > transitivity to records of things, so that when many say: > "http://purl.uniprot.org/uniprot/P12345 is expressed only in > species homo sapien"   > they usually imply that "the protein referenced by > datarecord:http://purl.uniprot.

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Neumann
Alan, the life science community has for years applied an implicit transitivity to records of things, so that when many say: "http://purl.uniprot.org/uniprot/P12345 is expressed only in species homo sapien" they usually imply that "the protein referenced by datarecord:http://purl.unipro

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Waclaw Kusnierczyk wrote: Oh, no. If there are two proteins out there, they are two, and you have nothing to *decide* about that. You may fuse them in that or another way, but this does not change the fact that at the previous time there were two. "Out there" you'll find all kinds of molec

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Waclaw Kusnierczyk
sorry! i got confused by the response to response pattern. vQ Marijke Keet wrote: p.s.: I did not write that, that was Eric. I know I ought to have taken up that point as well, but then, my time is limited. regards, marijke Waclaw Kusnierczyk wrote: Marijke Keet wrote: The problem

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Mark Montgomery
ic-semweb-lifesci" ; "Mark Wilkinson" <[EMAIL PROTECTED]>; "Benjamin Good" <[EMAIL PROTECTED]>; "Natalia Villanueva Rosales" <[EMAIL PROTECTED]> Sent: Monday, July 16, 2007 4:52 AM Subject: Re: Ambiguous names. was: Re: URL +1, LSID -1 Mari

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Waclaw Kusnierczyk wrote: sure. how can you determine that *two* entities are *one* entity? (they may become one, but that's a different story.) You mean how we *decide* that they should be a single "entity"? I'm afraid I can't tell, not because it's our trade-secret, but because such decisi

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Marijke Keet wrote: "...due to lack of knowledge...": and I presume it may be that biologists disagree also because of insufficient knowledge about the protein, and/or its (over-)simplification, that is, comparing apples and oranges at a too coarse level of granularity. Moreover, that we don't

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Waclaw Kusnierczyk
Marijke Keet wrote: The problem with proteins is that I haven't seen any biologists agree on a general way to determine whether two proteins are the same or not, sure. how can you determine that *two* entities are *one* entity? (they may become one, but that's a different story.) vQ

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Marijke Keet
Eric Jain wrote: Marijke Keet wrote: just because proteins are smaller than persons does not make them into mere abstractions--thingies of your imagination that only materialise by means of their representations in some information system. proteins were around for quite a while before

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Marijke Keet wrote: and by analogy, then there is no real Eric Jain, just a webpage with that name, a blog, an URL http://eric.jain.name/, some database records in the uniprot HR systems with a string "Eric Jain" and related data, email trails in the hcls archive and, well, any person that

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Marijke Keet
Eric Jain wrote: Alan Ruttenberg wrote: There are proteins, and there are records about proteins. Records come in different formats. If I make a statement using this url, is it about the record? or the protein? How should the agent come to know? The concept of "protein" is abstract eno

Re: Ambiguous names. was: Re: URL +1, LSID -1

2007-07-16 Eric Jain
Alan Ruttenberg wrote: There are proteins, and there are records about proteins. Records come in different formats. If I make a statement using this url, is it about the record? or the protein? How should the agent come to know? The concept of "protein" is abstract enough that anything you mi