Not a problem! In the end I decided just to remove all of the titins from my database - it shouldn't have a huge effect on my results - and I was indeed able to run all of my datasets to completion. Thanks for all of your help!

Emily
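For anyone who lands on this thread with the same hang, here is a minimal sketch of the kind of filtering described above: dropping every FASTA entry whose header looks like a titin before rebuilding the database. It is not the script that was actually used. The header pattern (the word "titin" or the gene symbol "TTN") and the function names are assumptions made for illustration, so check them against your own headers first. Decoy entries whose headers match the pattern are dropped as well.

```python
import re
from typing import Iterable, Iterator, List, Pattern, Tuple

Record = Tuple[str, str]  # (header without the leading ">", sequence)

# ASSUMPTION: titin entries can be recognized by "titin" or the gene symbol
# "TTN" appearing as a separate word in the header. Adjust for your database.
TITIN_PATTERN = re.compile(r"\b(?:titin|TTN)\b", re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_header(
    records: Iterable[Record], pattern: Pattern[str] = TITIN_PATTERN
) -> Tuple[List[Record], List[Record]]:
    """Return (kept, dropped): records whose header matches go to dropped."""
    kept: List[Record] = []
    dropped: List[Record] = []
    for header, seq in records:
        (dropped if pattern.search(header) else kept).append((header, seq))
    return kept, dropped


def filter_fasta_file(
    in_path: str, out_path: str, pattern: Pattern[str] = TITIN_PATTERN
) -> int:
    """Write the non-matching entries to out_path; return how many were dropped."""
    with open(in_path) as src:
        kept, dropped = split_by_header(read_fasta(src), pattern)
    with open(out_path, "w") as dst:
        for header, seq in kept:
            dst.write(f">{header}\n")
            # Wrap sequences at 60 residues per line.
            for i in range(0, len(seq), 60):
                dst.write(seq[i:i + 60] + "\n")
    return len(dropped)
```

Calling `filter_fasta_file("in.fasta", "out.fasta")` (both paths are placeholders) returns the number of entries removed, which is worth comparing against the number of titin entries you expect. Filtering the target sequences before generating decoys keeps the target and decoy sets consistent.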
On Friday, November 6, 2020 at 8:25:56 PM UTC-5 David Shteynberg wrote:

> Hello again Emily,
>
> Apologies for the delay but I needed a bit more time to look into this. You are absolutely right about the titins causing this issue. The problem is the significant overlap in peptides in this very large titin group. Your database contains 343 variations of titin with different SAAPs, which share large subsets of the same peptides. Calculating this enormous protein group is certainly stressing the ProteinProphet algorithm, forcing it into a higher-order polynomial time complexity problem. I was looking into the code to see if there was a simple way to speed it up, but unfortunately this doesn't seem to be the case. Is there any way you can reduce the number of titin entries in your database? Have you considered using PEFF?
>
> Thanks,
> -David
>
> On Sat, Oct 24, 2020 at 10:48 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>
>> Another update: I've pinpointed a much smaller database that reproduces the error when run with just 10OV - uploaded to the same folder as above, named "titins_revs.fasta" (it contains a bunch of titins and some reverse decoy sequences). Something in the titins is causing this error, I think (I've run this set of titins with a few different sets of reverse decoys so I don't think it's caused by the decoys). I also think there are a couple of other sequences in the database that may have the same effect, but if we can figure out what's doing it in this set, it should be easier to know what to look for. Any thoughts?
>>
>> On Friday, October 23, 2020 at 3:45:08 PM UTC-4 Emily Kawaler wrote:
>>
>>> Okay - When I ran the working set of spectra with the database that failed, it seems to have failed; when I ran the set of spectra that failed with a database that worked, it ran to completion. I think we can probably narrow the problem down to something in the database.
>>>
>>> On Friday, October 23, 2020 at 1:56:18 AM UTC-4 Emily Kawaler wrote:
>>>
>>>> While those tests are still running, I pulled out all 185 of the proteins that are in the 10OV pepXMLs but not in 01-09OV, figuring that maybe one of those is causing the error. I've uploaded that to the same folder everything else is in (it's called 10OV_uniq.fasta) - I don't see anything that jumps out immediately. (There are no individual characters unique to either the headers or the sequences in 10OV, so I don't think there's an individual character messing things up.)
>>>>
>>>> On Thursday, October 22, 2020 at 3:49:18 PM UTC-4 David Shteynberg wrote:
>>>>
>>>>> I just re-extracted that file and I don't see the issue anymore. Perhaps this was a decompression issue.
>>>>>
>>>>> Thanks for checking.
>>>>>
>>>>> -David
>>>>>
>>>>> On Thu, Oct 22, 2020 at 12:19 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Thanks so much for taking a look! I think the selenocysteines ("U") are likely not the problem, since I've got those in all of my databases, including the ones that run correctly. I'm looking at 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML and I don't see anything odd in line 171821 ("</modification_info>"), so I think our line numberings might not match up - what does your problematic line contain?
>>>>>>
>>>>>> When I try to run it on my end, it always sticks somewhere in the 10CPTAC_OV files. Right now I'm running a working set of spectra with a database that didn't work and vice versa, so hopefully that'll help me pin down whether it's a problem with my spectra or my database - will let you know how that turns out!
>>>>>>
>>>>>> Emily
>>>>>>
>>>>>> On Thursday, October 22, 2020 at 3:09:29 PM UTC-4 David Shteynberg wrote:
>>>>>>
>>>>>>> Hi Emily,
>>>>>>>
>>>>>>> I analyzed the search results that you sent and I am seeing some strange things in at least one of the files you gave me. This may be causing some of the problems you saw. In file 03CPTAC_OVprospective_W_PNNL_20161212_B1S3_f13.pepXML on line 171821 there are some strange characters (possibly binary) that are tripping up the TPP. I think these might be caused by a bug in an analysis tool upstream of the TPP. Not sure if there are other mistakes of this sort. Also I found some 'U' amino acids in the database which the TPP complains about having a mass of 0.
>>>>>>>
>>>>>>> I hope this helps you somewhat. Let me know what you find on your end.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -David
>>>>>>>
>>>>>>> On Tue, Oct 20, 2020 at 1:42 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Sure! The spectra are from the CPTAC2 ovarian prospective dataset, though I removed all scans that matched to a standard reference database (I don't think the scan removal is the issue, since I'm also having this problem on a different dataset without removing any scans; I also checked with xmllint and it looks like the mzML and pepXML files are valid). I've been running it with the philosopher pipeline, so the pepXML files were generated with MSFragger as part of that pipeline. The database is a customized variant database with contaminants and decoys added by philosopher's database tool. Are there any other specifics you'd like? I can upload my full philosopher.yml file if that would be helpful.
>>>>>>>>
>>>>>>>> On Tuesday, October 20, 2020 at 1:30:44 AM UTC-4 David Shteynberg wrote:
>>>>>>>>
>>>>>>>>> Hi Emily,
>>>>>>>>>
>>>>>>>>> I got the data and now I am trying to understand how you are running the analysis. Can you please describe those steps?
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> -David
>>>>>>>>>
>>>>>>>>> On Sat, Oct 17, 2020 at 12:54 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I've uploaded the pepXML files, the parameters I used, and the database here. <https://drive.google.com/drive/folders/1gJoi9fqsmIYg_0tl_2Ur-n04MJyuotyc?usp=sharing> Please let me know if I should be uploading anything else! Thank you!
>>>>>>>>>>
>>>>>>>>>> On Saturday, October 17, 2020 at 12:04:21 AM UTC-4 Emily Kawaler wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you! I'm working on getting it transferred to Drive, so it might take a little while, but I'll be in touch!
>>>>>>>>>>>
>>>>>>>>>>> On Tuesday, October 13, 2020 at 3:08:44 PM UTC-4 David Shteynberg wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Emily,
>>>>>>>>>>>>
>>>>>>>>>>>> If you are able to share the dataset including the pepXML file and the database I can try to replicate the issue here and try to troubleshoot the sticking point.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> -David
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 13, 2020 at 11:15 AM Emily Kawaler <e.ka...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello, and thank you for your response! It doesn't look like the process is using too much memory (I've allocated 300 GB and it's maxing out around 10), and I've kicked up the minprob parameter - it's still getting stuck, unfortunately.
>>>>>>>>>>>>> Emily
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Friday, October 9, 2020 at 2:24:37 PM UTC-4 Luis wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Emily,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is not a problem that we have seen much of. Do you know which version of ProteinProphet / TPP you are using?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One potential issue is the large number of proteins (and peptides) that it is trying to process -- can you either monitor the memory usage of the machine when you run this dataset, and/or try on one with more memory?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>>>> --Luis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 6, 2020 at 6:32 PM Emily Kawaler <e.ka...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello! I've been running ProteinProphet as part of the Philosopher pipeline for a while now with no problems. However, one of my datasets seems to be getting stuck in the middle of this function. It doesn't throw an error or anything - just stops advancing (the last line of the output is "Computing degenerate peptides for 69919 proteins: 0%...10%...20%...30%...40%...50%"). Has anyone run into this problem before?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to spctools-discu...@googlegroups.com.
>>>>>>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/spctools-discuss/be33a8fb-a6ec-41b6-a988-981161f194fcn%40googlegroups.com