Actually, not only was the indexOffset wrong, but the scan offsets were off
too. For this particular file, the offsets go bad at scan 38348, likely
where the file crosses the 2GB mark.

    <offset id="38345">2147400855</offset>
    <offset id="38346">2147431719</offset>
    <offset id="38347">2147457995</offset>
    <offset id="38348">3</offset>
    <offset id="38349">3</offset>
    <offset id="38350">3</offset>
    <offset id="38351">3</offset>
    <offset id="38352">3</offset>
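One way to find the point where an index like this goes bad is to check that each scan offset is larger than the one before it. Below is a minimal sketch in Python. `first_bad_offset` is a hypothetical helper, not a TPP tool. It assumes the `<offset id="…">N</offset>` layout shown above and uses a regular expression rather than a full XML parser.

```python
import re

# Matches index entries such as <offset id="38348">3</offset>
# (assumption: attribute order and quoting as in the excerpt above).
OFFSET_RE = re.compile(r'<offset id="(\d+)">(\d+)</offset>')


def first_bad_offset(index_text):
    """Return (scan_id, offset) for the first entry whose byte offset is not
    greater than the previous entry's offset, or None if all increase."""
    previous = -1
    for match in OFFSET_RE.finditer(index_text):
        scan_id, offset = int(match.group(1)), int(match.group(2))
        if offset <= previous:
            return scan_id, offset
        previous = offset
    return None


if __name__ == "__main__":
    sample = """
    <offset id="38346">2147431719</offset>
    <offset id="38347">2147457995</offset>
    <offset id="38348">3</offset>
    <offset id="38349">3</offset>
    """
    print(first_bad_offset(sample))  # (38348, 3)
```

For a multi-gigabyte file, it makes sense to pass only the `<index>` section near the end of the file rather than the whole file contents.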


On Mon, Oct 12, 2015 at 2:28 PM, Jimmy Eng <jke...@gmail.com> wrote:

> I'll reply on-list to give some closure to this thread for anyone
> interested in the problem.
>
> I asked Daniel to send me one of his >2GB mzXML files to take a look at
> myself.  If it wasn't a Windows 32-bit binary issue, I suspected that the
> problem wasn't with Tandem, Comet or the TPP tools (as I recall we dealt
> with large files years ago) but rather it was with the conversion program
> itself.
>
> Running "tail t1.mzXML" returned:
>
>        <offset id="41225">3</offset>
>        <offset id="41226">3</offset>
>        <offset id="41227">3</offset>
>        <offset id="41228">3</offset>
>        <offset id="41229">3</offset>
>        <offset id="41230">3</offset>
>      </index>
>      <indexOffset>1</indexOffset>
>      <sha1>23609787a67e3997d93fc3e1bcde4015474eeae6</sha1>
>    </mzXML>
>
> The first thing that jumps out is that the indexOffset value is completely
> wrong, which would prevent every tool from reading this file. A quick fix
> is to run the TPP's "indexmzXML" tool on this file to re-index it, which
> also generates a correct indexOffset value:
>
>    indexmzXML t1.mzXML
>
> After running this command and renaming the generated "t1.mzXML.new"
> file, I was able to read the mzXML file using both readmzXML and Comet.
> Anyway, something needs to be fixed in qtofpeakpicker so that it writes
> correct mzXML files larger than 2GB. At a minimum it needs to be a 64-bit
> binary. A feasible but poor workaround is to simply run indexmzXML on each
> file.
>
> - Jimmy
>
>
> On Mon, Oct 12, 2015 at 10:55 AM, Daniel Hyduke <danielhyd...@gmail.com>
> wrote:
>
>> Thanks for the response, Jimmy. I had compiled Comet and the whole TPP
>> pipeline in a 64-bit environment (CentOS 7 GNU/Linux, with both gcc 4.8.3
>> and gcc 6.0.0), and the problem persisted. Just to verify, I downloaded the
>> recent binaries and source for Comet (2015021) and tried them, but the
>> problem remained.
>>
>> Basically, it feels like something in the mzXML parsing code checks the
>> file size (or an index or something similar) using int instead of int64.
>> GNU/Linux is LP64 and MS Windows is LLP64, and both use a 32-bit int (see
>> http://www.unix.org/whitepapers/64bit.html and
>> https://en.wikipedia.org/wiki/64-bit_computing), so even with a 64-bit
>> compiler on a 64-bit system, int is still 32 bits (unless I misunderstand
>> something). As far as I know, there's no way to tell gcc to substitute
>> int64 for int.
>>
>> If there aren't any ideas on where to start in the codebase, I'll start
>> digging in with gdb.
>>
>> On Fri, Oct 9, 2015 at 1:08 PM, Jimmy Eng <jke...@gmail.com> wrote:
>>
>>> Daniel,
>>>
>>> The issue you're seeing might be due to the TPP Windows programs being
>>> compiled as 32-bit binaries. The first thing I'd try is grabbing a 64-bit
>>> binary of one of the tools to see if that fixes things. You can get a
>>> 64-bit Comet binary from its SourceForge download site if you want to test
>>> this. I wish I could tell you definitively that the 64-bit Comet binary
>>> will work for you, but I just don't have access to files >2GB to test with.
>>>
>>>
>>> On Fri, Oct 9, 2015 at 11:09 AM, Daniel Hyduke <danielhyd...@gmail.com>
>>> wrote:
>>>
>>>> I've recently noticed some failures when using Comet and X!Tandem from
>>>> the TPP to search some centroided DDA files generated by qtofpeakpicker.
>>>> These failures were all associated with files >2GB. I was able to reduce
>>>> the file size by increasing the threshold, which then led to readmzXML,
>>>> Tandem, and Comet actually reading the files and searching them. I'm
>>>> guessing that there's a place in the TPP mzXML code that uses an int
>>>> (which is 32-bit) that's causing this problem.
>>>>
>>>> I've used TPP 4.8.0 on Windows 7 (installed via the exe provided on
>>>> SourceForge) and built the TPP from SVN on GNU/Linux (CentOS 7), and
>>>> encountered the same problem in both cases.
>>>>
>>>> Does anybody have any thoughts on where I should start sifting through
>>>> the code?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "spctools-discuss" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to spctools-discuss+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to spctools-discuss@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>>
>
>
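The diagnosis in the quoted thread (an `<indexOffset>` of 1 cannot be right) can be checked automatically: in an indexed mzXML file, `<indexOffset>` is expected to give the byte position of the `<index` element. The sketch below assumes `<indexOffset>` appears within the last 4 KiB of the file, as in the `tail` output quoted above. `index_offset_points_at_index` is a hypothetical helper; it only detects the problem and does not replace re-indexing with indexmzXML.

```python
import io
import os
import re

# Assumption: the <indexOffset> tag sits near the end of the file.
INDEX_OFFSET_RE = re.compile(rb"<indexOffset>(\d+)</indexOffset>")


def index_offset_points_at_index(f, tail_size=4096):
    """Return True if <indexOffset> in a seekable binary stream points at
    an '<index' element, False otherwise."""
    f.seek(0, os.SEEK_END)
    size = f.tell()  # Python file positions are plain ints (no 32-bit cap)
    f.seek(max(0, size - tail_size))
    match = INDEX_OFFSET_RE.search(f.read())
    if match is None:
        return False  # no <indexOffset> found near the end of the file
    f.seek(int(match.group(1)))
    # Require '<index' followed by whitespace or '>' so that an offset
    # pointing at '<indexOffset>' itself is not accepted.
    return re.match(rb"<index[\s>]", f.read(7)) is not None


if __name__ == "__main__":
    body = b'<mzXML>\n<scan num="1"/>\n'
    index = b'<index name="scan"><offset id="1">8</offset></index>\n'
    good = body + index + b"<indexOffset>%d</indexOffset>\n</mzXML>\n" % len(body)
    bad = body + index + b"<indexOffset>1</indexOffset>\n</mzXML>\n"
    print(index_offset_points_at_index(io.BytesIO(good)))  # True
    print(index_offset_points_at_index(io.BytesIO(bad)))   # False
```

On a real file this would be called on a stream opened in binary mode, for example `open("t1.mzXML", "rb")` using the file name from the thread.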

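Daniel's point about LP64 and LLP64 can be illustrated directly: both models keep `int` at 32 bits, so a byte offset past 2^31 - 1 = 2,147,483,647 cannot be stored in one. The last good offset in the index above, 2147457995, sits just under that limit. The sketch below uses Python's `ctypes` fixed-width integer types to mimic storing a larger offset in a 32-bit versus a 64-bit signed integer. It illustrates the overflow only; it is not the actual qtofpeakpicker or TPP code path, which (judging by the `3` values) may fail differently.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647, the largest value a 32-bit signed int holds


def store_offset(offset, ctype):
    """Store a byte offset in the given ctypes integer type and read it back.

    ctypes does not check for overflow, so a value that does not fit wraps
    around, much like a narrowing conversion to int does with gcc on common
    two's-complement platforms.
    """
    return ctype(offset).value


if __name__ == "__main__":
    print(ctypes.sizeof(ctypes.c_int))  # 4 on both LP64 Linux and LLP64 Windows
    past_2gb = INT32_MAX + 1  # first byte offset beyond the 32-bit limit
    print(store_offset(past_2gb, ctypes.c_int32))  # -2147483648
    print(store_offset(past_2gb, ctypes.c_int64))  # 2147483648
```

On the C/C++ side, one common approach is to keep file positions in a 64-bit type (`int64_t`, or `off_t` with large-file support via `fseeko`/`ftello` on POSIX and `_fseeki64`/`_ftelli64` on Windows) rather than `int` or `long`, since `long` is also 32 bits under LLP64.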