Ben,

Have you done anything special to handle the scan numbers (which
presumably are no longer consecutive starting from scan 1) and the
scan index? If not, address those and re-test, or find out whether
they matter to MSGF-db.
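
One guess worth checking, not verified against MSGF-db or JRAP: the <offset> values under <index> and the <indexOffset> value are byte positions in the file. Once scans are removed, they still point into the original, larger file. If the stale <indexOffset> now lies past the end of the filtered file, a reader that seeks there would find no XML at all. That would fit the "Premature end of file" at [1,1] from IndexParser.parseIndexes in your trace. Below is a minimal sketch of a hypothetical helper, rebuild_scan_index(). It assumes Python 3.5+ and the default (unprefixed) mzXML namespace. It takes the serialized filtered document, sets msRun/@scanCount to the number of remaining scans, and regenerates the index so every offset points at the "<scan" start tag it names. It drops the old <sha1> rather than recomputing it (assumption: the reader doesn't check it). It also leaves the scan numbers untouched, so whether MSGF-db cares about gaps in the numbering is still open.

```python
import re

# Start tag of every <scan> element, nested MS2 scans included; group 1 is
# the scan number. Assumes the default (unprefixed) mzXML namespace.
_SCAN_TAG = re.compile(rb'<scan\b[^>]*?\bnum="(\d+)"')
# The scanCount attribute on <msRun>.
_SCAN_COUNT = re.compile(rb'(<msRun\b[^>]*?\bscanCount=")\d+(")')


def rebuild_scan_index(data: bytes) -> bytes:
    """Hypothetical helper: return a serialized mzXML document with a fresh
    scan index whose byte offsets match the bytes actually returned.

    Everything after </msRun> (old <index>, <indexOffset>, <sha1>) is
    discarded. The <sha1> checksum is NOT regenerated (assumption: the
    reader does not verify it). Scan numbers are left unchanged.
    """
    # 1. Make msRun/@scanCount match the scans that are left. Done first,
    #    because changing its digit count shifts every later byte offset.
    n_scans = len(_SCAN_TAG.findall(data))
    data = _SCAN_COUNT.sub(
        lambda m: m.group(1) + str(n_scans).encode() + m.group(2),
        data, count=1)

    # 2. Keep everything up to and including </msRun>.
    end = data.rindex(b'</msRun>') + len(b'</msRun>')
    head = data[:end]

    # 3. Byte position of each <scan ...> start tag, paired with its number.
    offsets = [(int(m.group(1)), m.start()) for m in _SCAN_TAG.finditer(head)]

    # 4. Append the new index; <indexOffset> holds the byte position of "<index".
    out = bytearray(head)
    out += b'\n'
    index_offset = len(out)
    out += b'<index name="scan">\n'
    for num, pos in offsets:
        out += b'<offset id="%d">%d</offset>\n' % (num, pos)
    out += b'</index>\n'
    out += b'<indexOffset>%d</indexOffset>\n' % index_offset
    out += b'</mzXML>\n'
    return bytes(out)
```

To hook it into the quoted makeHQSpectraFile(), the final write could become handle.write(rebuild_scan_index(le.tostring(doc, xml_declaration=True, encoding=doc.docinfo.encoding))). The XML declaration is kept so the offsets are computed against exactly the bytes that get written.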

On Thu, Jun 14, 2012 at 10:02 AM, Ben Temperton <btemper...@gmail.com> wrote:
> Hi there,
>
> I am trying to pull out a subset of data from an mzXML file to run against a
> database using MSGF-db (for instance, to re-run any non-matching spectra
> against the database with a search for phosphorylation). To generate the
> subset I am currently using:
>
> import lxml.etree as le
>
> SASHIMI_NAMESPACE = 'http://sashimi.sourceforge.net/schema_revision/mzXML_3.1'
> NS = {'t': SASHIMI_NAMESPACE}
>
> def makeHQSpectraFile(spectraFile, spectraList, outputFile):
>     """Takes a spectra file, a list of scan ids to include and an output
>     file as parameters."""
>     keep = set(spectraList)  # scan numbers as strings, e.g. '1234'
>     with open(spectraFile, 'rb') as f:
>         doc = le.parse(f)
>     # Drop every top-level <scan> whose num is not in the keep list
>     # (xpath() returns a list, so removing while iterating is safe).
>     for elem in doc.xpath('/t:mzXML/t:msRun/t:scan', namespaces=NS):
>         if elem.attrib['num'] not in keep:
>             elem.getparent().remove(elem)
>     # Drop the matching <offset> entries from the scan index.
>     for elem in doc.xpath('/t:mzXML/t:index/t:offset', namespaces=NS):
>         if elem.attrib['id'] not in keep:
>             elem.getparent().remove(elem)
>     with open(outputFile, 'wb') as handle:
>         handle.write(le.tostring(doc) + b'\n')
>
> However, when I run MSGF-db on the new file it throws the following exception:
>
> Reading spectra...
> javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
> Message: Premature end of file.
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
>     at org.systemsbiology.jrap.stax.IndexParser.parseIndexes(IndexParser.java:176)
>     at org.systemsbiology.jrap.stax.MSXMLParser.randomInits(MSXMLParser.java:117)
>     at org.systemsbiology.jrap.stax.MSXMLParser.<init>(MSXMLParser.java:134)
>     at parser.MzXMLSpectraMap.<init>(MzXMLSpectraMap.java:39)
>     at parser.MzXMLSpectraIterator.<init>(MzXMLSpectraIterator.java:36)
>     at parser.MzXMLSpectraIterator.<init>(MzXMLSpectraIterator.java:26)
>     at ui.MSGFDB.runMSGFDB(MSGFDB.java:269)
>     at ui.MSGFDB.runMSGFDB(MSGFDB.java:106)
>     at ui.MSGFDB.main(MSGFDB.java:82)
>
> The original (unfiltered) file works fine. I can't get the
> mzXMLValidator to work on our systems (see my post at
> https://groups.google.com/d/msg/spctools-discuss/bAxu-In-ju4/z9_g3mdWSFcJ),
> so I was wondering whether anyone else has encountered a similar issue and
> has any tips.
>
> Many thanks,
>
> Ben
>
> --
> You received this message because you are subscribed to the Google Groups
> "spctools-discuss" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/spctools-discuss/-/psC3ABG8sNcJ.
> To post to this group, send email to spctools-discuss@googlegroups.com.
> To unsubscribe from this group, send email to
> spctools-discuss+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/spctools-discuss?hl=en.
