Not a problem! In the end I decided just to remove all of the titins from 
my database - it shouldn't have a huge effect on my results - and I was 
indeed able to run all of my datasets to completion. Thanks for all of your 
help!
Emily
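
For anyone landing on this thread with the same hang, here is a minimal sketch of the kind of filtering described above. It assumes the titin entries (and their decoys) can be recognised from the FASTA header text; the pattern and the function names are illustrative assumptions, not part of Philosopher or the TPP.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: titin entries mention "titin" or the gene symbol TTN in the
# header. Adjust the pattern to however your own database labels them.
TITIN_PATTERN = re.compile(r"titin|\bTTN\b", re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_matching(records, pattern=TITIN_PATTERN):
    """Split records into (kept, dropped) based on a header match."""
    kept, dropped = [], []
    for header, seq in records:
        (dropped if pattern.search(header) else kept).append((header, seq))
    return kept, dropped
```

To use it on a real database, pass an open file handle to read_fasta and write the kept records to a new FASTA. Checking the dropped list before rerunning is a cheap way to confirm that only the intended entries were removed.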

On Friday, November 6, 2020 at 8:25:56 PM UTC-5 David Shteynberg wrote:

> Hello again Emily,
>
> Apologies for the delay, but I needed a bit more time to look into this.
> You are absolutely right about the titins causing this issue. The problem
> is the significant overlap in peptides within this very large titin group.
> Your database contains 343 variations of titin with different SAAPs,
> which share large subsets of the same peptides. Calculating this
> enormous protein group is certainly stressing the ProteinProphet
> algorithm, pushing its running time into higher-order polynomial growth.
> I was looking into the code to see if there was a simple way to
> speed it up, but unfortunately this doesn't seem to be the case. Is there
> any way you can reduce the number of titin entries in your database? Have
> you considered using PEFF?
>
> Thanks,
> -David
>
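
To get a feel for the overlap described above, one could count how many in-silico tryptic peptides are shared between database entries. This is only a rough sketch under simplifying assumptions (cleavage after K or R except before P, no missed cleavages, a minimum peptide length of 7). It is not how ProteinProphet itself groups proteins, and all names here are made up for illustration.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=7):
    """Fully tryptic peptides with no missed cleavages.

    Cuts after K or R unless the next residue is P.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_len}


def peptide_to_proteins(proteins, min_len=7):
    """Map each peptide to the set of protein names that contain it."""
    index = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq, min_len):
            index[pep].add(name)
    return index


def shared_fraction(proteins, min_len=7):
    """Fraction of distinct peptides that occur in more than one protein."""
    index = peptide_to_proteins(proteins, min_len)
    if not index:
        return 0.0
    shared = sum(1 for owners in index.values() if len(owners) > 1)
    return shared / len(index)
```

A value close to 1 for a set of variant entries means nearly every peptide is degenerate across the group, which is the situation described for the titins.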
> On Sat, Oct 24, 2020 at 10:48 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>
>> Another update: I've pinpointed a much smaller database that reproduces 
>> the error when run with just 10OV - uploaded to the same folder as above, 
>> named "titins_revs.fasta" (it contains a bunch of titins and some reverse 
>> decoy sequences). Something in the titins is causing this error, I think 
>> (I've run this set of titins with a few different sets of reverse decoys so 
>> I don't think it's caused by the decoys). I also think there are a couple 
>> of other sequences in the database that may have the same effect, but if we 
>> can figure out what's doing it in this set, it should be easier to know 
>> what to look for. Any thoughts?
>>
>> On Friday, October 23, 2020 at 3:45:08 PM UTC-4 Emily Kawaler wrote:
>>
>>> Okay - When I ran the working set of spectra with the database that 
>>> failed, it seems to have failed; when I ran the set of spectra that failed 
>>> with a database that worked, it ran to completion. I think we can probably 
>>> narrow the problem down to something in the database. 
>>>
>>> On Friday, October 23, 2020 at 1:56:18 AM UTC-4 Emily Kawaler wrote:
>>>
>>>> While those tests are still running, I pulled out all 185 of the 
>>>> proteins that are in the 10OV pepXMLs but not in 01-09OV, figuring that 
>>>> maybe one of those is causing the error. I've uploaded that to the same 
>>>> folder everything else is in (it's called 10OV_uniq.fasta) - I don't see 
>>>> anything that jumps out immediately. (There are no individual characters 
>>>> unique to either the headers or the sequences in 10OV, so I don't think 
>>>> there's an individual character messing things up.)
>>>>
>>>> On Thursday, October 22, 2020 at 3:49:18 PM UTC-4 David Shteynberg 
>>>> wrote:
>>>>
>>>>> I just re-extracted that file and I don't see the issue anymore.
>>>>> Perhaps this was a decompression issue.
>>>>>
>>>>> Thanks for checking.
>>>>>
>>>>> -David
>>>>>
>>>>> On Thu, Oct 22, 2020 at 12:19 PM Emily Kawaler <e.ka...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Thanks so much for taking a look! I think the selenocysteines ("U")
>>>>>> are likely not the problem, since I've got those in all of my
>>>>>> databases, including the ones that run correctly. I'm looking at
>>>>>> 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML and I don't see
>>>>>> anything odd in line 171821 ("</modification_info>"), so I think our
>>>>>> line numberings might not match up - what does your problematic line
>>>>>> contain?
>>>>>>
>>>>>> When I try to run it on my end, it always sticks somewhere in the
>>>>>> 10CPTAC_OV files. Right now I'm running a working set of spectra with a
>>>>>> database that didn't work and vice versa, so hopefully that'll help me
>>>>>> pin down whether it's a problem with my spectra or my database - will
>>>>>> let you know how that turns out!
>>>>>>
>>>>>> Emily
>>>>>>
>>>>>> On Thursday, October 22, 2020 at 3:09:29 PM UTC-4 David Shteynberg 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Emily,
>>>>>>>
>>>>>>> I analyzed the search results that you sent and I am seeing some
>>>>>>> strange things in at least one of the files you gave me. This may be
>>>>>>> causing some of the problems you saw.
>>>>>>> In file 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML on
>>>>>>> line 171821 there are some strange characters (possibly binary) that
>>>>>>> are tripping up the TPP. I think these might be caused by a bug in an
>>>>>>> analysis tool upstream of the TPP. Not sure if there are other mistakes
>>>>>>> of this sort. Also, I found some 'U' amino acids in the database, which
>>>>>>> the TPP complains about having a mass of 0.
>>>>>>>
>>>>>>> I hope this helps you somewhat.  Let me know what you find on 
>>>>>>> your end.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -David
>>>>>>>
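
A quick way to look for the kind of stray bytes mentioned above is to scan the file in binary mode and report lines that contain control characters XML 1.0 does not allow, or bytes that do not decode. A minimal sketch, assuming the pepXML is meant to be UTF-8; the function name is hypothetical and this is not a TPP tool.

```python
# Tab, line feed and carriage return are the only C0 control characters
# that XML 1.0 permits in a document.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_suspicious_lines(path, max_hits=20):
    """Return (line_number, control_bytes, valid_utf8) for suspect lines.

    Lines are counted by "\n" terminators, starting at 1, so the numbers
    may differ from what an editor shows for files with unusual line ends.
    """
    hits = []
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            control = sorted(
                {b for b in raw if b < 0x20 and b not in ALLOWED_CONTROL}
            )
            try:
                raw.decode("utf-8")
                valid_utf8 = True
            except UnicodeDecodeError:
                valid_utf8 = False
            if control or not valid_utf8:
                hits.append((number, control, valid_utf8))
                if len(hits) >= max_hits:
                    break
    return hits
```

Running this on both the original download and a freshly extracted copy can help tell a genuinely corrupted file apart from a decompression problem.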
>>>>>>> On Tue, Oct 20, 2020 at 1:42 PM Emily Kawaler <e.ka...@gmail.com> 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Sure! The spectra are from the CPTAC2 ovarian prospective dataset,
>>>>>>>> though I removed all scans that matched to a standard reference
>>>>>>>> database (I don't think the scan removal is the issue, since I'm also
>>>>>>>> having this problem on a different dataset without removing any scans;
>>>>>>>> I also checked with xmllint and it looks like the mzML and pepXML
>>>>>>>> files are valid). I've been running it with the philosopher pipeline,
>>>>>>>> so the pepXML files were generated with MSFragger as part of that
>>>>>>>> pipeline. The database is a customized variant database with
>>>>>>>> contaminants and decoys added by philosopher's database tool. Are
>>>>>>>> there any other specifics you'd like? I can upload my full
>>>>>>>> philosopher.yml file if that would be helpful.
>>>>>>>>
>>>>>>>> On Tuesday, October 20, 2020 at 1:30:44 AM UTC-4 David Shteynberg 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Emily,
>>>>>>>>>
>>>>>>>>> I got the data and now I am trying to understand how you are 
>>>>>>>>> running the analysis.  Can you please describe those steps?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> -David
>>>>>>>>>
>>>>>>>>> On Sat, Oct 17, 2020 at 12:54 PM Emily Kawaler <e.ka...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I've uploaded the pepXML files, the parameters I used, and the 
>>>>>>>>>> database here. 
>>>>>>>>>> <https://drive.google.com/drive/folders/1gJoi9fqsmIYg_0tl_2Ur-n04MJyuotyc?usp=sharing>
>>>>>>>>>> Please let me know if I should be uploading anything else! Thank 
>>>>>>>>>> you!
>>>>>>>>>>
>>>>>>>>>> On Saturday, October 17, 2020 at 12:04:21 AM UTC-4 Emily Kawaler 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you! I'm working on getting it transferred to Drive, so it 
>>>>>>>>>>> might take a little while, but I'll be in touch!
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, October 13, 2020 at 3:08:44 PM UTC-4 David 
>>>>>>>>>>> Shteynberg wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Emily,
>>>>>>>>>>>>
>>>>>>>>>>>> If you are able to share the dataset including the pepXML file 
>>>>>>>>>>>> and the database I can try to replicate the issue here and try to 
>>>>>>>>>>>> troubleshoot the sticking point.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -David
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 13, 2020 at 11:15 AM Emily Kawaler <
>>>>>>>>>>>> e.ka...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, and thank you for your response! It doesn't look like
>>>>>>>>>>>>> the process is using too much memory (I've allocated 300 GB and
>>>>>>>>>>>>> it's maxing out around 10), and I've kicked up the minprob
>>>>>>>>>>>>> parameter - it's still getting stuck, unfortunately.
>>>>>>>>>>>>> Emily
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Friday, October 9, 2020 at 2:24:37 PM UTC-4 Luis wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Emily,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is not a problem that we have seen much of. Do you know
>>>>>>>>>>>>>> which version of ProteinProphet / TPP you are using?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One potential issue is the large number of proteins (and
>>>>>>>>>>>>>> peptides) that it is trying to process -- can you either monitor
>>>>>>>>>>>>>> the memory usage of the machine when you run this dataset, and/or
>>>>>>>>>>>>>> try on one with more memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>> --Luis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2020 at 6:32 PM Emily Kawaler <
>>>>>>>>>>>>>> e.ka...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello! I've been running ProteinProphet as part of the
>>>>>>>>>>>>>>> Philosopher pipeline for a while now with no problems. However,
>>>>>>>>>>>>>>> one of my datasets seems to be getting stuck in the middle of this
>>>>>>>>>>>>>>> function. It doesn't throw an error or anything - just stops
>>>>>>>>>>>>>>> advancing (the last line of the output is "Computing degenerate
>>>>>>>>>>>>>>> peptides for 69919 proteins: 0%...10%...20%...30%...40%...50%").
>>>>>>>>>>>>>>> Has anyone run into this problem before?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>>>>>> Google Groups "spctools-discuss" group.
>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails 
>>>>>>>>>>>>>>> from it, send an email to spctools-discu...@googlegroups.com
>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/spctools-discuss/be33a8fb-a6ec-41b6-a988-981161f194fcn%40googlegroups.com
>>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/spctools-discuss/be33a8fb-a6ec-41b6-a988-981161f194fcn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to spctools-discuss+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/spctools-discuss/88c3f21b-9ed1-4d78-8cbc-fc97a93af167n%40googlegroups.com.
