Ben,

Have you done anything special to handle the scan numbers (which presumably are no longer consecutive starting from scan 1) and the scan index? If not, address both and re-test, or find out whether they matter to MSGF-db.
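On the index specifically: as far as I know, each `<offset>` in an mzXML index holds the byte position of the matching `<scan` start tag, and `<indexOffset>` holds the byte position of the `<index>` element itself. Deleting scans and re-serializing the tree moves all of those positions, so dropping the unwanted `<offset>` entries is not enough. Below is a minimal sketch, in Python 3, of rebuilding the index on the already-filtered file. The helper name `rebuild_mzxml_index` is hypothetical, the regex approach assumes a conventionally laid out mzXML file, and the old `<sha1>` checksum is dropped rather than recomputed. I have not tested the result against MSGF-db, and the sketch does not renumber scans.

```python
import re

# Patterns for the pieces of an mzXML file that depend on byte positions.
SCAN_TAG = re.compile(rb'<scan\s[^>]*?\bnum="(\d+)"')
INDEX_BLOCK = re.compile(rb'<index\b.*?</index>\s*', re.DOTALL)
INDEX_OFFSET = re.compile(rb'<indexOffset>.*?</indexOffset>\s*', re.DOTALL)
SHA1 = re.compile(rb'<sha1>.*?</sha1>\s*', re.DOTALL)


def rebuild_mzxml_index(data: bytes) -> bytes:
    """Return mzXML bytes with a freshly computed scan index.

    Hypothetical helper: assumes `data` is a complete mzXML document whose
    scans have already been filtered, and that the sha1 element may be omitted.
    """
    # Drop the stale index, indexOffset and checksum.
    for pattern in (INDEX_BLOCK, INDEX_OFFSET, SHA1):
        data = pattern.sub(b'', data)

    close = data.rfind(b'</mzXML>')
    if close < 0:
        raise ValueError('no closing </mzXML> tag found')
    body = data[:close]

    # Record the byte offset of every <scan ...> start tag, nested ones included.
    entries = [(m.group(1), m.start()) for m in SCAN_TAG.finditer(body)]

    # The new index starts right where the body ends.
    index_offset = len(body)
    parts = [b'<index name="scan">\n']
    for num, offset in entries:
        parts.append(b'<offset id="%s">%d</offset>\n' % (num, offset))
    parts.append(b'</index>\n')
    parts.append(b'<indexOffset>%d</indexOffset>\n' % index_offset)

    return body + b''.join(parts) + b'</mzXML>\n'
```

To use it, read the filtered file in binary mode, pass the bytes through `rebuild_mzxml_index`, and write the result back in binary mode so the offsets stay valid on disk.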
On Thu, Jun 14, 2012 at 10:02 AM, Ben Temperton <btemper...@gmail.com> wrote:
> Hi there,
>
> I am trying to pull out a subset of data from an mzXML file to run against a
> database using MSGF-db (for instance, to re-run any non matching spectra
> against the database searching for phosphorylation). To generate the subset
> I am currently using:
>
> import lxml.etree as le
> SASHIMI_NAMESPACE = 'http://sashimi.sourceforge.net/schema_revision/mzXML_3.1'
>
> def makeHQSpectraFile(spectraFile, spectraList, outputFile):
>     """Takes a spectra file, a list of scan ids to include and an output file as parameters"""
>     with open(spectraFile,'r') as f:
>         doc=le.parse(f)
>     root = doc.getroot()
>     for elem in doc.xpath('/t:mzXML/t:msRun/t:scan', namespaces={'t' : SASHIMI_NAMESPACE}):
>         if not elem.attrib['num'] in spectraList:
>             parent=elem.getparent()
>             parent.remove(elem)
>     for elem in doc.xpath('/t:mzXML/t:index/t:offset', namespaces={'t' : SASHIMI_NAMESPACE}):
>         if not elem.attrib['id'] in spectraList:
>             parent=elem.getparent()
>             parent.remove(elem)
>     handle = open(outputFile, 'wb')
>     handle.write(le.tostring(doc) + '\n')
>     handle.close()
>
> However, when I run MSGF-db on the new file it throws a:
>
> Reading spectra...
> javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1]
> Message: Premature end of file.
>     at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(Unknown Source)
>     at org.systemsbiology.jrap.stax.IndexParser.parseIndexes(IndexParser.java:176)
>     at org.systemsbiology.jrap.stax.MSXMLParser.randomInits(MSXMLParser.java:117)
>     at org.systemsbiology.jrap.stax.MSXMLParser.<init>(MSXMLParser.java:134)
>     at parser.MzXMLSpectraMap.<init>(MzXMLSpectraMap.java:39)
>     at parser.MzXMLSpectraIterator.<init>(MzXMLSpectraIterator.java:36)
>     at parser.MzXMLSpectraIterator.<init>(MzXMLSpectraIterator.java:26)
>     at ui.MSGFDB.runMSGFDB(MSGFDB.java:269)
>     at ui.MSGFDB.runMSGFDB(MSGFDB.java:106)
>     at ui.MSGFDB.main(MSGFDB.java:82)
>
> Whilst the original (non-parsed version) works fine. I can't get the
> mzXMLValidator to work on our systems (see post here
> https://groups.google.com/d/msg/spctools-discuss/bAxu-In-ju4/z9_g3mdWSFcJ),
> so I was wondering if anyone else had ever encountered a similar issue and
> had any tips.
>
> Many thanks,
>
> Ben
>
> --
> You received this message because you are subscribed to the Google Groups
> "spctools-discuss" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/spctools-discuss/-/psC3ABG8sNcJ.
> To post to this group, send email to spctools-discuss@googlegroups.com.
> To unsubscribe from this group, send email to
> spctools-discuss+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/spctools-discuss?hl=en.