I'll reply on-list to give some closure to this thread for anyone interested in the problem.
I asked Daniel to send me one of his >2GB mzXML files to take a look at myself. If it wasn't a Windows 32-bit binary issue, I suspected that the problem wasn't with Tandem, Comet or the TPP tools (as I recall we dealt with large files years ago) but rather it was with the conversion program itself. Running "tail t1.mzXML" returned:

    <offset id="41225">3</offset>
    <offset id="41226">3</offset>
    <offset id="41227">3</offset>
    <offset id="41228">3</offset>
    <offset id="41229">3</offset>
    <offset id="41230">3</offset>
    </index>
    <indexOffset>1</indexOffset>
    <sha1>23609787a67e3997d93fc3e1bcde4015474eeae6</sha1>
    </mzXML>

And the first thing that jumps out is the indexOffset value is completely wrong and this would cause all tools to not be able to read this file. A quick fix is to run the TPP's "indexmzXML" tool on this file to re-index the file which will also generate a correct index offset value:

    indexmzXML t1.mzXML

After running this command and re-naming the generated "t1.mzXML.new" file, I was able to read the mzXML file using both readmzXML and Comet.

Anyways, something needs to be fixed with qtofpeakpicker to write correct >2GB mzXML files. Minimally it needs to be a 64-bit binary. A feasible but poor workaround is to simply run indexmzXML on each file.

- Jimmy

On Mon, Oct 12, 2015 at 10:55 AM, Daniel Hyduke <danielhyd...@gmail.com> wrote:

> Thanks for the response Jimmy. I had compiled comet and the whole tpp
> pipeline in a 64-bit environment (CENTOS GNU/Linux 7 both with gcc 4.8.3
> and gcc 6.0.0) and the problem persisted. Just to verify, I downloaded the
> recent binaries and source for comet (2015021) and tried them, but the
> problem persisted.
>
> Basically, what it feels like is that there is something in the mzXML
> parsing portion that checks the file size (or an index or something) and
> uses int (instead of int64).
> GNU/Linux is LP64 and MS Windows is LLP64
> both of which use a 32-bit representation for int (
> http://www.unix.org/whitepapers/64bit.html
> https://en.wikipedia.org/wiki/64-bit_computing), so even if you use a
> 64-bit compiler on a 64-bit system your int will still be 32-bit (unless I
> misunderstand something) and as far as I know, there's no way to tell gcc
> to substitute int64 for int.
>
> If there aren't any ideas on where to start in the codebase, I'll start
> digging in with gdb
>
> On Fri, Oct 9, 2015 at 1:08 PM, Jimmy Eng <jke...@gmail.com> wrote:
>
>> Daniel,
>>
>> The issue you're seeing might be due to TPP windows programs compiled as
>> 32-bit binaries. The first thing I'd try is grabbing a 64-bit binary of
>> one of the tools to see if that fixes things. You can grab a 64-bit Comet
>> binary from its SourceForge download site if you want to test this. I wish
>> I could tell you definitively that the 64-bit Comet binary will work for
>> you but I just don't have access to files >2GB to test with.
>>
>> On Fri, Oct 9, 2015 at 11:09 AM, Daniel Hyduke <danielhyd...@gmail.com>
>> wrote:
>>
>>> I've recently noticed some failures when using comet and xtandem! from
>>> TPP to search some centroided DDA files generated by qtofpeakpicker. These
>>> failures were all associated with files > 2GB. I was able to reduce the
>>> file size by increasing the threshold which then lead to readmzXML, tandem,
>>> and comet actually reading the files and searching them. I'm guessing that
>>> there's a place in the TPP mzXML code that uses an int (which is 32-bit)
>>> that's causing this problem.
>>>
>>> I've used TPP 4.8.0 on Windows 7 (installed via the exe provided on
>>> sourceforge) and built the tpp from the svn on GNU/Linux (centos 7) and
>>> encountered the same problem.
>>>
>>> I was wondering if anybody had any thoughts on where I should start
>>> sifting through the code?
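For anyone who wants to test a file for the symptom described at the top of this thread before searching it, here is a minimal Python sketch. It is not part of the TPP; the function name `check_index_offset` and the `tail_bytes` parameter are made up for illustration. It assumes the `<indexOffset>` element sits in the last few KB of the file (as in the "tail" output above) and that a correct value is the byte position of the opening `<index` tag. Python integers are not limited to 32 bits, so the check itself is safe on files >2GB.

```python
import re


def check_index_offset(path, tail_bytes=4096):
    """Return (declared_offset, is_valid) for an mzXML file.

    declared_offset is None when no <indexOffset> element is found in
    the last tail_bytes bytes of the file.
    """
    with open(path, "rb") as fh:
        fh.seek(0, 2)                      # move to end of file
        size = fh.tell()                   # total size in bytes
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()

        match = re.search(rb"<indexOffset>(\d+)</indexOffset>", tail)
        if match is None:
            return None, False

        declared = int(match.group(1))
        if declared >= size:
            return declared, False

        # A valid offset should land exactly on the "<index" tag.
        # Read one extra byte so "<indexOffset" is not accepted.
        fh.seek(declared)
        head = fh.read(7)
        is_valid = head[:6] == b"<index" and head[6:7] in (b" ", b">")
        return declared, is_valid
```

On a file like the t1.mzXML described above (indexOffset of 1), this should report the offset as invalid, which is the cue to re-index the file with indexmzXML.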
--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discuss+unsubscr...@googlegroups.com.
To post to this group, send email to spctools-discuss@googlegroups.com.
Visit this group at http://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/d/optout.