Thanks, Jimmy!

On Mon, Oct 12, 2015 at 2:42 PM, Jimmy Eng <jke...@gmail.com> wrote:

> Actually, not only was the indexOffset off, but the scan offsets were off
> too.  For this particular file, the offsets go bad at scan 38348, likely
> where the file size crosses 2GB.
>
>     <offset id="38345">2147400855</offset>
>     <offset id="38346">2147431719</offset>
>     <offset id="38347">2147457995</offset>
>     <offset id="38348">3</offset>
>     <offset id="38349">3</offset>
>     <offset id="38350">3</offset>
>     <offset id="38351">3</offset>
>     <offset id="38352">3</offset>
>
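The last good offset sits just under the signed 32-bit limit. INT32_MAX is 2,147,483,647, only 25,652 bytes past offset 2147457995, while the preceding scans are spaced roughly 26,000 to 31,000 bytes apart, so the next scan would start beyond what an `int32_t` can hold. The C sketch below checks that boundary; the helper names are hypothetical and not taken from TPP or qtofpeakpicker code. It illustrates only the 2^31 - 1 limit and does not explain why the writer emitted the specific value 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest byte offset a signed 32-bit int can hold: 2^31 - 1. */
static const int64_t INT32_OFFSET_LIMIT = INT32_MAX;

/* Hypothetical helper: true if a byte offset can be stored in an
   int32_t without overflow. */
bool offset_fits_int32(int64_t offset) {
    return offset >= 0 && offset <= INT32_OFFSET_LIMIT;
}

/* Hypothetical helper: bytes left before the 2^31 - 1 boundary
   (negative once the offset is past it). */
int64_t headroom_to_int32_limit(int64_t offset) {
    return INT32_OFFSET_LIMIT - offset;
}
```

For example, `headroom_to_int32_limit(2147457995)` gives 25652, and adding the 26,276-byte spacing seen between scans 38346 and 38347 makes `offset_fits_int32` return false.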
>
> On Mon, Oct 12, 2015 at 2:28 PM, Jimmy Eng <jke...@gmail.com> wrote:
>
>> I'll reply on-list to give some closure to this thread for anyone
>> interested in the problem.
>>
>> I asked Daniel to send me one of his >2GB mzXML files to take a look at
>> myself.  If it wasn't a Windows 32-bit binary issue, I suspected that the
>> problem wasn't with Tandem, Comet or the TPP tools (as I recall we dealt
>> with large files years ago) but rather it was with the conversion program
>> itself.
>>
>> Running "tail t1.mzXML" returned:
>>
>>        <offset id="41225">3</offset>
>>        <offset id="41226">3</offset>
>>        <offset id="41227">3</offset>
>>        <offset id="41228">3</offset>
>>        <offset id="41229">3</offset>
>>        <offset id="41230">3</offset>
>>      </index>
>>      <indexOffset>1</indexOffset>
>>      <sha1>23609787a67e3997d93fc3e1bcde4015474eeae6</sha1>
>>    </mzXML>
>>
>> The first thing that jumps out is that the indexOffset value is completely
>> wrong, which would prevent every tool from reading this file.  A quick fix
>> is to run the TPP's "indexmzXML" tool to re-index the file, which also
>> generates a correct indexOffset value:
>>
>>    indexmzXML t1.mzXML
>>
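Re-indexing boils down to recording where each scan begins, using an offset type wide enough for files over 2GB. A minimal C sketch of that idea follows, assuming every scan element starts with the literal text `<scan ` (as in `<scan num="...">`). It records only byte offsets, not scan numbers, and `find_scan_offsets` is a hypothetical name; this is not code from indexmzXML.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Record the byte offset of every "<scan " start tag in an mzXML stream.
   The position counter is int64_t, so offsets past 2^31 - 1 stay correct.
   Returns the total number of matches; at most max_out are stored in out. */
size_t find_scan_offsets(FILE *fp, int64_t *out, size_t max_out) {
    static const char pat[] = "<scan ";
    const size_t plen = sizeof pat - 1;
    size_t matched = 0, count = 0;
    int64_t pos = 0;  /* offset of the character just read */
    int c;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (c == (unsigned char)pat[matched]) {
            matched++;
        } else {
            /* '<' occurs only at the start of the pattern, so after a
               mismatch a new match can only begin at this character. */
            matched = (c == '<') ? 1 : 0;
        }
        if (matched == plen) {
            /* pos is the last pattern byte; the tag starts plen - 1 earlier */
            if (count < max_out) out[count] = pos - (int64_t)(plen - 1);
            count++;
            matched = 0;
        }
        pos++;
    }
    return count;
}
```

Reading one byte at a time with `fgetc` keeps the matching logic simple; a faster version would read large buffers, but the key design choice is the same: keep the running offset in a 64-bit type.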
>> After running this command and re-naming the generated "t1.mzXML.new"
>> file, I was able to read the mzXML file using both readmzXML and Comet.
>> Anyways, something needs to be fixed with qtofpeakpicker so that it writes
>> correct mzXML files larger than 2GB.  Minimally it needs to be a 64-bit
>> binary.  A feasible but poor workaround is to simply run indexmzXML on each
>> file.
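One way to decide whether a file needs re-indexing is to read its `<indexOffset>` value and check that the bytes at that position begin the `<index ...>` element; in the broken file above, offset 1 points into the XML declaration instead. This is a minimal C sketch under stated assumptions: it uses POSIX `fseeko`/`ftello` with a 64-bit `off_t` so it can seek past 2GB, it assumes `<indexOffset>` appears within the last 4KB of the file, and the function names are hypothetical rather than part of the TPP.

```c
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t even on 32-bit POSIX builds */
#define _POSIX_C_SOURCE 200809L   /* declares fseeko/ftello */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: parse the <indexOffset> value from the last
   tail_bytes of the file (capped at 4095). Returns -1 on failure. */
long long read_index_offset(FILE *fp, long tail_bytes) {
    char buf[4096];
    if (tail_bytes > (long)sizeof buf - 1) tail_bytes = (long)sizeof buf - 1;
    if (fseeko(fp, 0, SEEK_END) != 0) return -1;
    off_t size = ftello(fp);
    if (size < 0) return -1;
    off_t start = size > tail_bytes ? size - tail_bytes : 0;
    if (fseeko(fp, start, SEEK_SET) != 0) return -1;
    size_t n = fread(buf, 1, (size_t)(size - start), fp);
    buf[n] = '\0';
    const char *tag = strstr(buf, "<indexOffset>");
    if (!tag) return -1;
    return strtoll(tag + strlen("<indexOffset>"), NULL, 10);
}

/* Hypothetical helper: the offset is plausible only if the bytes there
   start the <index> element ("<index " with a space, so "<indexOffset>"
   itself does not match). */
int index_offset_is_valid(FILE *fp, long long offset) {
    char probe[8] = {0};
    if (offset < 0 || fseeko(fp, (off_t)offset, SEEK_SET) != 0) return 0;
    if (fread(probe, 1, 7, fp) != 7) return 0;
    return strcmp(probe, "<index ") == 0;
}
```

A caller would open the mzXML file with `fopen(path, "rb")`, pass the result of `read_index_offset(fp, 4096)` to `index_offset_is_valid`, and run indexmzXML only on files that fail the check.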
>>
>> - Jimmy
>>
>>
>> On Mon, Oct 12, 2015 at 10:55 AM, Daniel Hyduke <danielhyd...@gmail.com>
>> wrote:
>>
>>> Thanks for the response, Jimmy.  I had compiled Comet and the whole TPP
>>> pipeline in a 64-bit environment (CentOS GNU/Linux 7, with both gcc 4.8.3
>>> and gcc 6.0.0) and the problem persisted.  Just to verify, I downloaded the
>>> recent binaries and source for comet (2015021) and tried them, but the
>>> problem persisted.
>>>
>>> Basically, it feels like something in the mzXML parsing code that checks
>>> the file size (or an index or something) uses int instead of int64.
>>> 64-bit GNU/Linux is LP64 and 64-bit MS Windows is LLP64, and both use a
>>> 32-bit int (see http://www.unix.org/whitepapers/64bit.html and
>>> https://en.wikipedia.org/wiki/64-bit_computing), so even with a 64-bit
>>> compiler on a 64-bit system, int is still 32 bits (unless I misunderstand
>>> something).  As far as I know, there's no way to tell gcc to substitute
>>> int64 for int.
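The data-model point above is easy to check on any compiler. This small C function prints the widths of the types involved; `file_offset_t` is a hypothetical typedef, not a TPP type. On both LP64 and LLP64 targets `int` is 32 bits, so a byte offset needs a fixed 64-bit type such as `int64_t`, or `off_t` on POSIX systems (glibc makes `off_t` 64-bit on 32-bit builds when compiled with `-D_FILE_OFFSET_BITS=64`). Whether this is the exact failure in the TPP readers or qtofpeakpicker is not established by this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical portable type for byte offsets: always 64 bits wide. */
typedef int64_t file_offset_t;

/* Print the widths that matter for file offsets under the current data
   model. On LP64 (64-bit Linux) long is 64 bits; on LLP64 (64-bit
   Windows) long is 32 bits. int is 32 bits on both. */
void print_offset_type_widths(void) {
    printf("int:           %zu bits\n", sizeof(int) * 8);
    printf("long:          %zu bits\n", sizeof(long) * 8);
    printf("long long:     %zu bits\n", sizeof(long long) * 8);
    printf("file_offset_t: %zu bits\n", sizeof(file_offset_t) * 8);
}
```

Calling `print_offset_type_widths()` from a `main` on 64-bit Linux and on 64-bit Windows shows the only difference is `long`, which is why code that stores offsets in `int` (or in `long` on Windows) breaks at 2GB regardless of the compiler's bitness.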
>>>
>>> If there aren't any ideas on where to start in the codebase, I'll start
>>> digging in with gdb.
>>>
>>> On Fri, Oct 9, 2015 at 1:08 PM, Jimmy Eng <jke...@gmail.com> wrote:
>>>
>>>> Daniel,
>>>>
>>>> The issue you're seeing might be due to TPP Windows programs compiled
>>>> as 32-bit binaries.  The first thing I'd try is grabbing a 64-bit binary of
>>>> one of the tools to see if that fixes things.  You can grab a 64-bit Comet
>>>> binary from its SourceForge download site if you want to test this.  I wish
>>>> I could tell you definitively that the 64-bit Comet binary will work for
>>>> you but I just don't have access to files >2GB to test with.
>>>>
>>>>
>>>> On Fri, Oct 9, 2015 at 11:09 AM, Daniel Hyduke <danielhyd...@gmail.com>
>>>> wrote:
>>>>
>>>>> I've recently noticed some failures when using Comet and X! Tandem from
>>>>> the TPP to search some centroided DDA files generated by qtofpeakpicker.
>>>>> These failures were all associated with files > 2GB.  I was able to
>>>>> reduce the file size by increasing the threshold, which then led to
>>>>> readmzXML, tandem, and comet actually reading the files and searching
>>>>> them.  I'm guessing that there's a place in the TPP mzXML code that uses
>>>>> an int (which is 32-bit) that's causing this problem.
>>>>>
>>>>> I've used TPP 4.8.0 on Windows 7 (installed via the exe provided on
>>>>> SourceForge) and built the TPP from SVN on GNU/Linux (CentOS 7) and
>>>>> encountered the same problem.
>>>>>
>>>>> I was wondering if anybody had any thoughts on where I should start
>>>>> sifting through the code?
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "spctools-discuss" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to spctools-discuss+unsubscr...@googlegroups.com.
>>>>> To post to this group, send email to spctools-discuss@googlegroups.com
>>>>> .
>>>>> Visit this group at http://groups.google.com/group/spctools-discuss.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>