[spctools-discuss] Re: Issue with pepXML generation

2009-11-11 Thread Brian Pratt
No worries, a corrected Mascot2XML will be in the next TPP release.
Brian

On Wed, Nov 11, 2009 at 3:21 PM, Simon Michnowicz <simon.michnow...@gmail.com> wrote:
> Unfortunately we have no control over what goes in the FASTA databases! Matrix Science's pepXML generation code escapes the XML

2009-11-11 Thread Simon Michnowicz
Unfortunately we have no control over what goes in the FASTA databases! Matrix Science's pepXML generation code escapes the XML if ($thisScript->param($urlParams{'prot_desc'})) { $prot_desc = &noXmlTag(&mustGetProteinDescription ($protein_list[0], \%fastaTitles)); } Where no

2009-11-11 Thread Jimmy Eng
I'll add the substitutions to the getdb.* scripts in the TPP src/util directory.

> Should a substitution be added to the IPI retrieval utility scripts in the TPP distribution so that the problem doesn't show its face if they are being used?

2009-11-11 Thread Brian Pratt
Well, I'll go ahead and modify the mascot converter to emit proper XML for proteins with reserved XML characters, but it does sound like folks would do well to make that <> / [] substitution upstream from the search engines. The fact that the EBI IPI site does the substitution confirms my suspicion
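A minimal sketch of the <> to [] substitution suggested above, applied to FASTA lines before they reach a search engine. The function name sanitize_fasta_lines is hypothetical and this is not the code in the TPP getdb.* scripts; it assumes a header is any line starting with '>', keeps that leading record marker, and passes sequence lines through unchanged.

```python
from typing import Iterable, Iterator


def sanitize_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: swap '<' -> '[' and '>' -> ']' in FASTA header descriptions."""
    for line in lines:
        if line.startswith(">"):
            # Keep the leading '>' record marker; rewrite only the description after it.
            yield ">" + line[1:].replace("<", "[").replace(">", "]")
        else:
            # Sequence lines are passed through unchanged.
            yield line
```

Usage would be along the lines of opening the original and a new FASTA file (names of your choosing) and calling dst.writelines(sanitize_fasta_lines(src)). Doing this once, upstream, means every search engine and converter sees the same already-safe descriptions.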

2009-11-11 Thread dctrud
Unfortunately the offending entries are present in commonly used public DBs. We recently bumped into exactly this problem, as there are 4 entries containing in the IPI human v3.66 fasta file: IPI00465120 Gene_Symbol=- 3-HSD 1 protein IPI00816409 Gene_Symbol=- V1 protein (Fragment) IPI00816761 Ge
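To check whether a given FASTA file is affected, a quick scan of its header lines is enough. This is a sketch with a hypothetical function name; it treats the first whitespace-separated token after '>' as the accession, which is an assumption about the header layout rather than a rule for every database.

```python
from typing import Iterable, List, Tuple

# Characters that XML reserves; all of them are legal in a FASTA description line.
XML_RESERVED = set("<>&\"'")


def find_reserved_headers(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (accession, reserved characters found) for each header that needs escaping."""
    hits = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        description = line[1:]
        found = sorted(set(description) & XML_RESERVED)
        if found:
            # A reserved character is present, so split() yields at least one token.
            hits.append((description.split()[0], "".join(found)))
    return hits
```

Passing an open file object works directly, since iterating over it yields one line at a time.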

2009-11-11 Thread Brian Pratt
Yes, one would want to escape everything properly - happily there's a library call for that. And certainly it's only right to emit valid XML. But I do think that it might be wisest to sidestep the whole mess - it's valid FASTA but also unconventional (based on many years of TPP not bumping into t
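As one example of such a library call, Python's standard xml.sax.saxutils.escape() covers &, < and >, and accepts an optional mapping for extra characters such as the two quote marks. This only illustrates the idea and is not necessarily the call used in the TPP converters; the wrapper name escape_description is hypothetical.

```python
from xml.sax.saxutils import escape


def escape_description(text: str) -> str:
    # escape() always replaces &, < and > (handling & first);
    # the mapping adds the two quote characters for use inside attribute values.
    return escape(text, {'"': "&quot;", "'": "&apos;"})
```

Because every & is escaped, the text must be escaped exactly once: running it over already-escaped output turns "&amp;" into "&amp;amp;".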

2009-11-11 Thread Matthew Chambers
What about the other reserved characters in XML that are valid in FASTA: ", ' and &? Not escaping them could also break downstream software, especially &, which should always begin an escape sequence. :( -Matt

Brian Pratt wrote:
> Granted, this is a defect - but that's still an unfortunate choice o
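The point about & is easy to demonstrate with any conforming XML parser. The sketch below uses Python's xml.etree.ElementTree; the element and attribute names (descr, hit) are made up for illustration and are not taken from pepXML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A bare '&' does not start a valid entity reference, so the parser rejects it.
print(is_well_formed("<descr>A & B</descr>"))      # False
print(is_well_formed("<descr>A &amp; B</descr>"))  # True
# A quote only breaks an attribute value delimited by the same quote character.
print(is_well_formed('<hit descr="say "hi""/>'))   # False
print(is_well_formed('<hit descr="it\'s"/>'))      # True
```

So " and ' matter mainly inside attribute values, while an unescaped & or < breaks the document anywhere.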

2009-11-11 Thread Brian Pratt
Granted, this is a defect - but that's still an unfortunate choice of characters. Even with the correction I can imagine this tripping up other software downstream since the properly escaped XML would no longer match the FASTA on a literal basis. I don't suppose your users could be induced to use
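The literal-mismatch concern can be handled downstream by comparing decoded values: the escaped XML text differs from the FASTA description, but an XML parser turns the entities back into the original characters. A sketch, again with hypothetical element and attribute names (hit, descr) and hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_hit(description: str) -> str:
    # quoteattr() escapes &, < and >, then picks a quote style that does not clash with the value.
    return f"<hit descr={quoteattr(description)}/>"


def descriptions_match(fasta_descr: str, xml_text: str) -> bool:
    # Compare against the parser-decoded value, not the raw (escaped) XML text.
    return ET.fromstring(xml_text).get("descr") == fasta_descr


descr = 'Example <tagged> protein & "friends"'
doc = build_hit(descr)
print(descr in doc)                    # False: the literal text is escaped
print(descriptions_match(descr, doc))  # True: the parser restores the original
```

Software that matches descriptions against the FASTA by raw string search, rather than through a parser, would still see the mismatch, which is why the upstream substitution remains attractive.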