Thanks, Jimmy!

On Mon, Oct 12, 2015 at 2:42 PM, Jimmy Eng <jke...@gmail.com> wrote:
> Actually not only was the indexOffset off but the scan offsets were off
> too. For this particular file, the offsets go bad at scan 38348, likely
> where the file hits the 2GB size.
>
> <offset id="38345">2147400855</offset>
> <offset id="38346">2147431719</offset>
> <offset id="38347">2147457995</offset>
> <offset id="38348">3</offset>
> <offset id="38349">3</offset>
> <offset id="38350">3</offset>
> <offset id="38351">3</offset>
> <offset id="38352">3</offset>
>
>
> On Mon, Oct 12, 2015 at 2:28 PM, Jimmy Eng <jke...@gmail.com> wrote:
>
>> I'll reply on-list to give some closure to this thread for anyone
>> interested in the problem.
>>
>> I asked Daniel to send me one of his >2GB mzXML files to take a look at
>> myself. If it wasn't a Windows 32-bit binary issue, I suspected that the
>> problem wasn't with Tandem, Comet or the TPP tools (as I recall we dealt
>> with large files years ago) but rather it was with the conversion program
>> itself.
>>
>> Running "tail t1.mzXML" returned:
>>
>> <offset id="41225">3</offset>
>> <offset id="41226">3</offset>
>> <offset id="41227">3</offset>
>> <offset id="41228">3</offset>
>> <offset id="41229">3</offset>
>> <offset id="41230">3</offset>
>> </index>
>> <indexOffset>1</indexOffset>
>> <sha1>23609787a67e3997d93fc3e1bcde4015474eeae6</sha1>
>> </mzXML>
>>
>> And the first thing that jumps out is the indexOffset value is completely
>> wrong and this would cause all tools to not be able to read this file. A
>> quick fix is to run the TPP's "indexmzXML" tool on this file to re-index
>> the file which will also generate a correct index offset value:
>>
>> indexmzXML t1.mzXML
>>
>> After running this command and re-naming the generated "t1.mzXML.new"
>> file, I was able to read the mzXML file using both readmzXML and Comet.
>> Anyways, something needs to be fixed with qtofpeakpicker to write
>> correct >2GB mzXML files. Minimally it needs to be a 64-bit binary.
>> A feasible but poor workaround is to simply run indexmzXML on each file.
>>
>> - Jimmy
>>
>>
>> On Mon, Oct 12, 2015 at 10:55 AM, Daniel Hyduke <danielhyd...@gmail.com>
>> wrote:
>>
>>> Thanks for the response Jimmy. I had compiled comet and the whole tpp
>>> pipeline in a 64-bit environment (CentOS GNU/Linux 7 both with gcc 4.8.3
>>> and gcc 6.0.0) and the problem persisted. Just to verify, I downloaded
>>> the recent binaries and source for comet (2015021) and tried them, but
>>> the problem persisted.
>>>
>>> Basically, what it feels like is that there is something in the mzXML
>>> parsing portion that checks the file size (or an index or something) and
>>> uses int (instead of int64). GNU/Linux is LP64 and MS Windows is LLP64,
>>> both of which use a 32-bit representation for int (
>>> http://www.unix.org/whitepapers/64bit.html
>>> https://en.wikipedia.org/wiki/64-bit_computing), so even if you use a
>>> 64-bit compiler on a 64-bit system your int will still be 32-bit (unless
>>> I misunderstand something) and as far as I know, there's no way to tell
>>> gcc to substitute int64 for int.
>>>
>>> If there aren't any ideas on where to start in the codebase, I'll start
>>> digging in with gdb.
>>>
>>>
>>> On Fri, Oct 9, 2015 at 1:08 PM, Jimmy Eng <jke...@gmail.com> wrote:
>>>
>>>> Daniel,
>>>>
>>>> The issue you're seeing might be due to TPP Windows programs compiled
>>>> as 32-bit binaries. The first thing I'd try is grabbing a 64-bit binary
>>>> of one of the tools to see if that fixes things. You can grab a 64-bit
>>>> Comet binary from its SourceForge download site if you want to test
>>>> this. I wish I could tell you definitively that the 64-bit Comet binary
>>>> will work for you but I just don't have access to files >2GB to test
>>>> with.
>>>>
>>>>
>>>> On Fri, Oct 9, 2015 at 11:09 AM, Daniel Hyduke <danielhyd...@gmail.com>
>>>> wrote:
>>>>
>>>>> I've recently noticed some failures when using comet and X!Tandem from
>>>>> TPP to search some centroided DDA files generated by qtofpeakpicker.
>>>>> These failures were all associated with files > 2GB. I was able to
>>>>> reduce the file size by increasing the threshold which then led to
>>>>> readmzXML, tandem, and comet actually reading the files and searching
>>>>> them. I'm guessing that there's a place in the TPP mzXML code that
>>>>> uses an int (which is 32-bit) that's causing this problem.
>>>>>
>>>>> I've used TPP 4.8.0 on Windows 7 (installed via the exe provided on
>>>>> sourceforge) and built the tpp from the svn on GNU/Linux (centos 7)
>>>>> and encountered the same problem.
>>>>>
>>>>> I was wondering if anybody had any thoughts on where I should start
>>>>> sifting through the code?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "spctools-discuss" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to spctools-discuss+unsubscr...@googlegroups.com.
>>>>> To post to this group, send email to spctools-discuss@googlegroups.com.
>>>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "spctools-discuss" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/spctools-discuss/-3-ppv8-gVE/unsubscribe
>>>> .
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> spctools-discuss+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to spctools-discuss@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
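For anyone who wants to check their own files for the symptom described in this thread, here is a rough sketch in Python. It is not part of the TPP; the function name first_bad_offset and the sample text are made up for illustration, and the offsets are the ones quoted from the index above. It looks for the first index entry whose offset stops increasing, and then shows how a value just past 2GB behaves when forced into a signed 32-bit integer. The thread does not establish the exact failure mechanism inside qtofpeakpicker, so the wrap-around shown is only the general 32-bit behaviour, not a claim about how the value "3" was produced.

```python
import ctypes
import re

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

# Matches index entries such as <offset id="38345">2147400855</offset>
OFFSET_RE = re.compile(r'<offset id="(\d+)">(\d+)</offset>')


def first_bad_offset(index_text):
    """Return (scan_id, offset) for the first entry whose offset does not
    increase over the previous entry, or None if every offset increases.

    Hypothetical helper for illustration; it only inspects text that has
    already been read (for example the tail of an mzXML file)."""
    previous = -1
    for match in OFFSET_RE.finditer(index_text):
        scan_id = int(match.group(1))
        offset = int(match.group(2))
        if offset <= previous:
            return scan_id, offset
        previous = offset
    return None


# Index entries copied from the message above.
sample = """
<offset id="38345">2147400855</offset>
<offset id="38346">2147431719</offset>
<offset id="38347">2147457995</offset>
<offset id="38348">3</offset>
<offset id="38349">3</offset>
"""

print(first_bad_offset(sample))    # (38348, 3)

# The last good offset sits just below the signed 32-bit limit.
print(INT32_MAX - 2147457995)      # 25652

# ctypes does no overflow checking, so a value one past the limit wraps
# around when stored in a 32-bit signed integer.
print(ctypes.c_int32(INT32_MAX + 1).value)   # -2147483648
```

If the function returns a scan whose predecessor's offset is close to 2147483647, the file has most likely hit the same 2GB limit, and re-indexing with indexmzXML as Jimmy describes is the workaround discussed in the thread.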