I'll reply on-list to give some closure to this thread for anyone
interested in the problem.

I asked Daniel to send me one of his >2GB mzXML files so I could take a look
myself.  Since it didn't appear to be a Windows 32-bit binary issue, I
suspected the problem wasn't with Tandem, Comet, or the TPP tools (as I
recall, we dealt with large files years ago) but rather with the conversion
program itself.

Running "tail t1.mzXML" returned:

       <offset id="41225">3</offset>
       <offset id="41226">3</offset>
       <offset id="41227">3</offset>
       <offset id="41228">3</offset>
       <offset id="41229">3</offset>
       <offset id="41230">3</offset>
     </index>
     <indexOffset>1</indexOffset>
     <sha1>23609787a67e3997d93fc3e1bcde4015474eeae6</sha1>
   </mzXML>

The first thing that jumps out is that the indexOffset value is completely
wrong (the per-scan offsets, all "3", are clearly wrong too), and that alone
would prevent every tool from reading this file.  A quick fix is to run the
TPP's "indexmzXML" tool on the file to re-index it, which also writes a
correct indexOffset value:

   indexmzXML t1.mzXML

After running this command and renaming the generated "t1.mzXML.new" file,
I was able to read the mzXML file with both readmzXML and Comet.  Anyway,
qtofpeakpicker needs to be fixed so that it writes correct mzXML files larger
than 2GB; at a minimum, it needs to be built as a 64-bit binary.  A feasible
but poor workaround is to run indexmzXML on each file.
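
To spot affected files before searching them, here is a minimal Python
sketch.  It assumes, as I understand the mzXML format, that a correct
indexOffset is the byte offset of the opening <index> element, and that the
indexOffset element sits in the short trailer at the end of the file (as in
the tail output above).  The function name check_index_offset and the
4096-byte tail window are illustrative choices, not part of the TPP.

```python
import os
import re

# Matches the trailer element, e.g. <indexOffset>123456</indexOffset>.
INDEX_OFFSET_RE = re.compile(rb"<indexOffset>\s*(\d+)\s*</indexOffset>")


def check_index_offset(path, tail_bytes=4096):
    """Hypothetical helper (not a TPP tool).

    Returns (offset, ok): `offset` is the indexOffset value found near the
    end of the file (None if absent), and `ok` is True only when the bytes
    at that offset start an <index> element.  Python file offsets are not
    limited to 32 bits, so files over 2GB are handled.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # indexOffset lives in the short trailer after </index>, so reading
        # the last few KB is enough to find it.
        fh.seek(max(0, size - tail_bytes))
        matches = INDEX_OFFSET_RE.findall(fh.read())
        if not matches:
            return None, False
        offset = int(matches[-1])
        if offset >= size:
            return offset, False
        fh.seek(offset)
        # Accept "<index " or "<index>", but not "<indexOffset>".
        return offset, fh.read(7) in (b"<index ", b"<index>")
```

For a file like the one above (indexOffset of 1), check_index_offset should
return (1, False); any file that fails the check is a candidate for
indexmzXML.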

- Jimmy


On Mon, Oct 12, 2015 at 10:55 AM, Daniel Hyduke <danielhyd...@gmail.com>
wrote:

> Thanks for the response, Jimmy.  I had compiled Comet and the whole TPP
> pipeline in a 64-bit environment (CentOS GNU/Linux 7 with both gcc 4.8.3
> and gcc 6.0.0), and the problem persisted.  Just to verify, I downloaded the
> recent binaries and source for Comet (2015021) and tried them, but the
> problem remained.
>
> Basically, it feels like something in the mzXML parsing code checks the
> file size (or an index or something) and stores it in an int instead of
> an int64.  GNU/Linux is LP64 and MS Windows is LLP64, and both use a
> 32-bit int (http://www.unix.org/whitepapers/64bit.html,
> https://en.wikipedia.org/wiki/64-bit_computing), so even with a 64-bit
> compiler on a 64-bit system, int is still 32 bits (unless I misunderstand
> something).  As far as I know, there's no way to tell gcc to substitute
> int64 for int.
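
Both points can be illustrated with a short Python sketch using the standard
ctypes module.  It assumes a mainstream LP64 (64-bit Linux) or LLP64 (64-bit
Windows) platform; the 2.5-billion-byte offset is an arbitrary example past
2GB, not a value taken from the failing files, and as_int32/as_int64 are
hypothetical helper names.

```python
import ctypes

# Sizes of the C integer types on this platform, in bytes.
# On both LP64 and LLP64, int is 4 bytes; long is 8 bytes on LP64 but
# only 4 on LLP64; long long is 8 bytes on both.
INT_SIZE = ctypes.sizeof(ctypes.c_int)
LONG_SIZE = ctypes.sizeof(ctypes.c_long)
LONGLONG_SIZE = ctypes.sizeof(ctypes.c_longlong)

# Largest value a signed 32-bit int can hold: 2**31 - 1, just under 2 GiB.
INT32_MAX = 2**31 - 1


def as_int32(offset):
    """Mimic storing a byte offset in a 32-bit signed int.

    ctypes does no overflow checking, so the value wraps modulo 2**32,
    which is also what gcc does when converting to a 32-bit int.
    """
    return ctypes.c_int32(offset).value


def as_int64(offset):
    """Store the same offset in a 64-bit signed int, which has ample room."""
    return ctypes.c_int64(offset).value


offset = 2_500_000_000  # example: an index 2.5 billion bytes into a file
print(f"int={INT_SIZE} long={LONG_SIZE} long long={LONGLONG_SIZE} bytes")
print("as int32:", as_int32(offset))  # wraps to -1794967296
print("as int64:", as_int64(offset))  # stays 2500000000
```

Any file offset past INT32_MAX comes out negative when squeezed into an int,
which is why a 64-bit build alone does not help if the code itself uses int.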
>
> If nobody has any ideas on where to start in the codebase, I'll start
> digging in with gdb.
>
> On Fri, Oct 9, 2015 at 1:08 PM, Jimmy Eng <jke...@gmail.com> wrote:
>
>> Daniel,
>>
>> The issue you're seeing might be due to the TPP Windows programs being
>> compiled as 32-bit binaries.  The first thing I'd try is grabbing a 64-bit
>> binary of one of the tools to see if that fixes things.  You can grab a
>> 64-bit Comet binary from its SourceForge download site if you want to test
>> this.  I wish I could tell you definitively that the 64-bit Comet binary
>> will work for you, but I just don't have access to files >2GB to test with.
>>
>>
>> On Fri, Oct 9, 2015 at 11:09 AM, Daniel Hyduke <danielhyd...@gmail.com>
>> wrote:
>>
>>> I've recently noticed some failures when using Comet and X!Tandem from
>>> the TPP to search some centroided DDA files generated by qtofpeakpicker.
>>> These failures were all associated with files >2GB.  I was able to reduce
>>> the file size by increasing the threshold, which then led to readmzXML,
>>> Tandem, and Comet actually reading the files and searching them.  I'm
>>> guessing that there's a place in the TPP mzXML code that uses an int
>>> (which is 32-bit) and that this is causing the problem.
>>>
>>> I've used TPP 4.8.0 on Windows 7 (installed via the exe provided on
>>> SourceForge) and built the TPP from SVN on GNU/Linux (CentOS 7), and
>>> encountered the same problem with both.
>>>
>>> Does anybody have any thoughts on where I should start sifting through
>>> the code?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "spctools-discuss" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to spctools-discuss+unsubscr...@googlegroups.com.
>>> To post to this group, send email to spctools-discuss@googlegroups.com.
>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
