Re: [spctools-discuss] Re: "base_name" constraint in pepXML

David Shteynberg Fri, 20 Nov 2009 10:51:20 -0800

I will try to reason this through from the original author's point of
view.  The idea is that different searches of the same data would
happen in separate directories, so the base_name (full path to the
data file) would identify one search of an mzXML file representing one
msms_run, and more than one search of the same file would never happen
in one directory.  It is also natural to keep all results from one
search engine run on an mzXML file in one place in the pepXML file.
However, you could have more than one search that references the same
data, and these don't necessarily have to be placed together in the
pepXML file.  The problem with this is that you could have different
paths to the same data, and these would all be listed.

In the iProphet tool (which combines results from multiple searches of
the same data), I don't look at either base_name but rather at the
spectrum names themselves, in combination with experiment_label, a
user-specified parameter that identifies data from the same
experiment.  The idea is that the combination of experiment_label and
spectrum name will uniquely identify a searched spectrum.

I hope this is helpful.  Let us know if you have other questions.
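
To illustrate the keying idea above, here is a minimal Python sketch
that groups top hits from several msms_run_summary blocks by
(experiment_label, spectrum name) and ignores base_name for matching.
This is not iProphet's actual code.  It assumes experiment_label is
available as an attribute on each spectrum_query element, and the
function names, sample paths, and peptides are made up for
illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def group_hits_by_spectrum(pepxml_text):
    """Map (experiment_label, spectrum) -> list of (base_name, top peptide).

    base_name is only carried along for reporting; it is not part of the key.
    """
    root = ET.fromstring(pepxml_text)
    groups = defaultdict(list)
    for run in root.iter():
        if local(run.tag) != "msms_run_summary":
            continue
        base_name = run.get("base_name", "")
        for query in run.iter():
            if local(query.tag) != "spectrum_query":
                continue
            # ASSUMPTION: experiment_label is an attribute of spectrum_query;
            # an empty string is used when it is absent.
            key = (query.get("experiment_label", ""), query.get("spectrum"))
            # Take the first search_hit listed under this spectrum_query.
            hit = next(
                (e for e in query.iter() if local(e.tag) == "search_hit"), None
            )
            peptide = hit.get("peptide") if hit is not None else None
            groups[key].append((base_name, peptide))
    return dict(groups)


# Hypothetical two-engine file: same spectrum, different base_name paths.
SAMPLE = """<msms_pipeline_analysis>
  <msms_run_summary base_name="/search/engineA/run1">
    <spectrum_query spectrum="run1.00042.00042.2" experiment_label="expA">
      <search_result><search_hit hit_rank="1" peptide="PEPTIDEK"/></search_result>
    </spectrum_query>
  </msms_run_summary>
  <msms_run_summary base_name="/search/engineB/run1">
    <spectrum_query spectrum="run1.00042.00042.2" experiment_label="expA">
      <search_result><search_hit hit_rank="1" peptide="PEPTIDER"/></search_result>
    </spectrum_query>
  </msms_run_summary>
</msms_pipeline_analysis>"""

GROUPS = group_hits_by_spectrum(SAMPLE)
```

In this made-up example both searches land under the single key
("expA", "run1.00042.00042.2") even though their base_name paths
differ, which is the behavior described above.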


-David



On Fri, Nov 20, 2009 at 10:30 AM, David Shteynberg
<dshteynb...@systemsbiology.org> wrote:
> OK I take that back.  I see where the unique constraint is listed.  I
> will have to consider your questions further.
>
> -David
>
> On Fri, Nov 20, 2009 at 10:27 AM, David Shteynberg
> <dshteynb...@systemsbiology.org> wrote:
>> Hi Hendrik,
>>
>> The element msms_pipeline_analysis/msms_run_summary has an attribute
>> base_name to specify the path to the datafile.  In case the searched
>> file specified is different from the original data file there is
>> another entry in the element
>> msms_pipeline_analysis/msms_run_summary/search_summary for base_name.
>> As far as I know, there is nothing in the schema that requires these
>> to be unique in the pepXML file. Can you point me to where this
>> constraint is specified in the schema?  I checked version 1.8.
>>
>> -David
>>
>> On Fri, Nov 13, 2009 at 12:31 AM, Eric Deutsch
>> <edeut...@systemsbiology.org> wrote:
>>>
>>>
>>> Hi Hendrik, I think we need to get an authoritative answer from David on
>>> this one. And he is currently traveling in the Land of the Finns. We will
>>> let/ask him to answer when he is next able.
>>>
>>> Regards,
>>> Eric
>>>
>>>
>>>> From: spctools-discuss@googlegroups.com [mailto:spctools-
>>>> disc...@googlegroups.com] On Behalf Of Hendrik Weisser
>>>>
>>>> Hi!
>>>>
>>>> I'm working on the pepXML parser in OpenMS. I've been confronted with
>>>> a type of pepXML file I hadn't seen before, where search results from
>>>> different search engines - but for the same experiment - were
>>>> collected in one file (with one "msms_run_summary" per search engine).
>>>> I've added (maybe prematurely) support for this to the OpenMS parser,
>>>> and then wanted to construct a simple pepXML file for testing
>>>> purposes.
>>>>
>>>> In doing so, I've now come across a constraint in the pepXML schema
>>>> (at least from v1.8 on) that says values of the "base_name" attribute
>>>> (supposed to contain the full path to the searched mzXML file) in the
>>>> "search_summary" element have to be unique within the document.
>>>> What is the rationale behind this constraint? Is it supposed to
>>>> prevent the above case, where different searches of the same
>>>> experiment end up in one file? Why would that be desirable/necessary?
>>>> (Also note that I can construct a valid and parseable pepXML file from
>>>> two different search runs of the same file if I change the path in
>>>> "base_name"...)
>>>>
>>>> In an earlier discussion (http://groups.google.com/group/spctools-
>>>> discuss/msg/7760dcda02877922?hl=en), it was mentioned that
>>>> "base_name"s in "msms_run_summary" elements had to be unique in the
>>>> document - however, as per the schema, that's not true. Also, the
>>>> "base_name" of an "msms_run_summary" is not tied to the "base_name" in
>>>> subordinate "search_summary"s. If there were such a constraint, it
>>>> would be impossible to have more than one "search_summary" under an
>>>> "msms_run_summary" - however, this is allowed in the schema.
>>>> When does it make sense to have different "base_name"s in an
>>>> "msms_run_summary" and its subordinate "search_summary"(s)? Judging
>>>> from the schema documentation and the files I've seen, it seems that
>>>> the values should be the same. On the other hand, why have the
>>>> attribute in both elements then?
>>>>
>>>> All this adds to my confusion about the appropriate use of
>>>> "base_name"...
>>>>
>>>> I would be happy if someone could clear things up for me.
>>>>
>>>>
>>>> Best regards
>>>>
>>>> Hendrik
>>>>
>>>>
>>>
>>>
>>> --~--~---------~--~----~------------~-------~--~----~
>>> You received this message because you are subscribed to the Google Groups 
>>> "spctools-discuss" group.
>>> To post to this group, send email to spctools-discuss@googlegroups.com
>>> To unsubscribe from this group, send email to 
>>> spctools-discuss+unsubscr...@googlegroups.com
>>> For more options, visit this group at 
>>> http://groups.google.com/group/spctools-discuss?hl=en
>>> -~----------~----~----~----~------~----~------~--~---
>>>
>>>
>>
>
