Dear Henry,

I really appreciate all your comments. The IDs for Cases 3 and 4 were indeed the 2nd hits. Now it all makes sense to me. I agree with your points and hope the work goes well.

Minyoung

On Mar 23, 2:01 pm, Henry Lam <heining...@gmail.com> wrote:
> Dear Minyoung,
>
> Thanks for your message. Did you manage to check the actual IDs for
> Cases 3 and 4? I suppose if our X filter is working properly, then
> your IDs should be the second hits, not the top hits any more.
>
> In Case 4, if the top hit is thrown out, then the second hit is the
> only hit. In such a special case, deltaCn is set to 1.
>
> Still, I think the way we are neglecting X-containing peptides needs
> some improvement. For instance, even if we're reporting the 2nd hit as
> the ID in Case 3 (pretending the sequence of the top hit never existed
> in the database), the deltaCn should be (xcorr(2) - xcorr(3)) / xcorr(2).
> The currently reported 0.0417 is (xcorr(1) - xcorr(3)) / xcorr(1),
> which makes no sense if we are throwing out the top hit.
>
> I will refer this problem to the developer who worked on the X filter,
> and together we'll come up with a solution somehow. In any case, I
> suppose the X-containing peptides are quite rare, so this should
> hopefully not have a big impact on your analysis.
>
> Henry
>
> On Mar 23, 10:43 am, Minyoung <minyoung....@gmail.com> wrote:
> > Hi Henry,
> > Sorry for the late reply.
> >
> > I used Petunia, and made the pepXML with the default settings.
> > I checked the amino acid X problem, and confirmed your comment.
> > In addition, when the best hit has the X, the shtml deltaCn was
> > calculated from the 3rd hit or below even though the 2nd hit does not
> > contain X. The 3rd and 4th cases are examples.
> >
> > case 3
> > #1 e.qgxtdymgads...@ikr.k deltCn=0.0000
> > #2 L.LC*ELLYESEFDSQLW.I deltCn=0.0296
> > #3 a.ekic*eytytdie...@g.k deltCn=0.0417
> > then, deltaCn in shtml is 0.0417.
> >
> > case 4
> > #1 I.LAXXXYEGLKEFZBCB.Z deltCn=0.0000
> > #2 B.ZAQLSLM#QLYLTNKSD.N deltCn=0.3882
> > The out file has only these two hits.
> > then, deltaCn in shtml is 1.
> >
> > If the default setting ignores X-containing peptides, why do the 3rd and
> > 4th cases output deltaCn and skip the 2nd hit?
> >
> > On Mar 21, 11:49 pm, Henry Lam <heining...@gmail.com> wrote:
> > > Dear Minyoung,
> > >
> > > I think I found the issue. It does not have anything to do with the
> > > calculation of deltaCn values, but with the fact that in Minyoung's
> > > examples, some of the lower hits have the "amino acid" X in them. The
> > > default behavior of the .out to .pep.xml converter is that all
> > > X-containing peptides are ignored, as if they were never searched. So
> > > your examples all make sense if you pretend that all the X-containing
> > > peptides disappear.
> > >
> > > For example, in your Case 2, the 2 homologs at the 2nd and 3rd
> > > positions both contain an X. They are treated as invisible. So your
> > > deltaCn becomes that of 1st - 4th. In your Case 3, your 2nd hit
> > > contains an X, so your deltaCn becomes 1st - 3rd. etc.
> > >
> > > This behavior is designed to keep those X-containing peptides out of
> > > the final result set, since they are often confusing. Whether or not
> > > we should also apply this to lower hits (and hence alter the
> > > deltaCn behavior) is, I think, not an easy call. Any comments?
> > >
> > > If you want to get the expected behavior back, you need to run Out2XML
> > > separately, and specify the -all option at the end. Then you can
> > > xinteract normally, starting from the pep.xml files.
> > >
> > > Henry
> > >
> > > On Mar 21, 12:53 pm, Henry Lam <heining...@gmail.com> wrote:
> > > > Hi Minyoung,
> > > >
> > > > I have trouble reproducing your results from here. Would you help me
> > > > by telling me how exactly you came up with the .out files and .pep.xml
> > > > files that show the anomaly, i.e. the exact command you ran? Perhaps
> > > > even copy and paste the out files and the relevant portion of the
> > > > .pep.xml file for me? Thanks a lot in advance.
> > > >
> > > > If you are familiar with how to build TPP from the code base directly,
> > > > I can tell you where to get the reverted change right away and see if
> > > > it fixes your problem. Or you'll have to wait for the official
> > > > release.
> > > >
> > > > Henry
> > > >
> > > > On Mar 21, 8:16 am, Jimmy Eng <j...@systemsbiology.org> wrote:
> > > > > I apologize to all who should not care about these esoteric Sequest
> > > > > details ...
> > > > >
> > > > > I guess the important point of note is that the previous code did not
> > > > > calculate a deltaCn value for the 2nd hit, nor is there any placeholder
> > > > > to report that value anywhere in the pepXML. (We can calculate it and
> > > > > add in another attribute if it's important.) There's only 1 deltaCn
> > > > > value that is between top hit and first non-homologous hit, whether
> > > > > that's x2 or xN. There's also a deltacnstar attribute with valid
> > > > > values of 0/1 (true/false) to indicate if the deltacn value is between
> > > > > top 2 hits or between top hit and something lower down in the list.
> > > > > Hope that clarifies things.
> > > > >
> > > > > Henry Lam wrote:
> > > > > > Hi Jimmy,
> > > > > >
> > > > > > Oh no no. I know what deltaCn means for the top hit. It is the
> > > > > > deltaCn of the lower hits I'm changing. The deltaCn of the top hit
> > > > > > is what you described, x1-x2 in most cases, and x1-x(the highest
> > > > > > non-homologous hit) when there is homology. The code change I made
> > > > > > should not change that (or at least so I thought).
> > > > > >
> > > > > > But what is the deltaCn of the second hit? (I know we don't use the
> > > > > > second hit at all in our pipeline, but it doesn't mean people
> > > > > > won't.)
> > > > > > In the old code, it is x1-x3. In the new code, it is x2-x3. I don't
> > > > > > see why that doesn't make more sense. Similarly, the deltaCn of the
> > > > > > 3rd hit is x3-x4 in the new code, x1-x4 in the old code.
> > > > > >
> > > > > > That said, I am afraid that my code change had some unintended
> > > > > > consequence that maybe I failed to see. Let me spend some time
> > > > > > figuring this out.
> > > > > >
> > > > > > Henry
> > > > > >
> > > > > > On Mar 21, 12:21 am, Jimmy Eng <j...@systemsbiology.org> wrote:
> > > > > >> Henry,
> > > > > >>
> > > > > >> I'm not going to have any time in the next week or so to look into
> > > > > >> the problem. But your interpretation of what deltaCn means is wrong,
> > > > > >> or rather different from what it is meant to represent.
> > > > > >>
> > > > > >> The premise for the ad-hoc deltaCn value is to generate some
> > > > > >> number to quantify how different the top hit is from the next best
> > > > > >> hit. So deltaCn is always just the normalized xcorr for hit 2 (or
> > > > > >> hit 3 or hit N). For the typical case, it is just the difference
> > > > > >> between top hit and 2nd best hit (i.e. xcorr(2)). When there's
> > > > > >> homology in the top hits, deltaCn was calculated to be the
> > > > > >> difference between the top hit and first dis-similar hit. If that
> > > > > >> is the 3rd peptide then the output value should be normalized
> > > > > >> xcorr(3) and not xcorr(3)-xcorr(2). Hope that makes sense. If you
> > > > > >> would like a different interpretation of what number should go in
> > > > > >> that field, I guess we should discuss it offline, including how it
> > > > > >> impacts PeptideProphet. But until then, I think you want to revert
> > > > > >> the correction you made for the next update release.
> > > > > >>
> > > > > >> - Jimmy
> > > > > >>
> > > > > >> Henry Lam wrote:
> > > > > >>> Hi Jimmy,
> > > > > >>>
> > > > > >>> I made a change recently on SequestOut.cpp to retain the first
> > > > > >>> deltaCn (regardless of homology) in the deltacnstar field. I also
> > > > > >>> corrected the deltaCn of the lower hits (e.g.
> > > > > >>> deltaCn of 2nd hit is now xcorr(3) - xcorr(2) rather than
> > > > > >>> xcorr(3)). I looked at it again today but couldn't see why my
> > > > > >>> changes would cause the behavior seen by Minyoung.
> > > > > >>>
> > > > > >>> Maybe it's unrelated, but perhaps this will point you to
> > > > > >>> something:
> > > > > >>> http://sashimi.svn.sourceforge.net/viewvc/sashimi/trunk/trans_proteom...
> > > > > >>>
> > > > > >>> Henry
> > > > > >>>
> > > > > >>> On Mar 19, 5:24 am, Jimmy Eng <j...@systemsbiology.org> wrote:
> > > > > >>>> The deltaCn is calculated from the first non-similar peptide
> > > > > >>>> compared to the top hit. Similarity is based on sequence homology
> > > > > >>>> and the cutoff is 75%. The homology determination is definitely
> > > > > >>>> not optimally calculated though but that doesn't explain your
> > > > > >>>> problems.
> > > > > >>>>
> > > > > >>>> Anyways, some of the deltaCn values in your examples below are
> > > > > >>>> definitely wrong; the only exception is example 1.
> > > > > >>>> Unfortunately I haven't seen that behavior in any of my results.
> > > > > >>>> Someone would need to see your files (out and pep.xml) to try to
> > > > > >>>> figure out the problem.
> > > > > >>>>
> > > > > >>>> - Jimmy
> > > > > >>>>
> > > > > >>>> Minyoung wrote:
> > > > > >>>>> Hi.
> > > > > >>>>>
> > > > > >>>>> I wonder why deltaCn values from out file and from
> > > > > >>>>> peptideprophet shtml are different.
> > > > > >>>>> I observed the following:
> > > > > >>>>>
> > > > > >>>>> 1.
> > > > > >>>>> when the best hit and the second best hit in a out file are very
> > > > > >>>>> similar (identical sequence except PTM),
> > > > > >>>>> shtml DeltaCn is calculated with reference to the third best
> > > > > >>>>> hit.
> > > > > >>>>> example> in some out file
> > > > > >>>>> #1 P.C*HCCA.P deltCn=0.0000
> > > > > >>>>> #2 P.CHCC*A.P deltCn=0.0046
> > > > > >>>>> #3 R.HC*CCA.E deltCn=0.0558
> > > > > >>>>> then, deltaCn in shtml is 0.0558.
> > > > > >>>>>
> > > > > >>>>> 2.
> > > > > >>>>> when the second best and the third best hit are very similar,
> > > > > >>>>> shtml DeltaCn is calculated with the next best hit.
> > > > > >>>>> example>
> > > > > >>>>> #1 r.fqspagtealfe...@isvadsan@YSC*VYVDLKPPFGGSAPSER.L deltCn=0.0000
> > > > > >>>>> #2 c.eecgkafnqstnltrhkrihtaekpykceecgkafnh...@.l deltCn=0.0028
> > > > > >>>>> #3 c.eecgkafnqstnltrhkrihtaekpykceecgk...@hpxn.l deltCn=0.0220
> > > > > >>>>> #4 Q.KFPKPLPQEYQYFDELSGIPAEDLPYYGGSVEIADYC*PFS.Q deltCn=0.1644
> > > > > >>>>> then, deltaCn in shtml is 0.1644.
> > > > > >>>>>
> > > > > >>>>> 3.
> > > > > >>>>> there is no sequence homology from the best hit to a reference
> > > > > >>>>> hit,
> > > > > >>>>> but shtml DeltaCn is calculated with the reference hit.
> > > > > >>>>> example>
> > > > > >>>>> #1 e.qgxtdymgads...@ikr.k deltCn=0.0000
> > ...
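
For anyone reading this thread later, the two deltaCn conventions discussed for Case 3 and Case 4 can be sketched in Python. This is only a rough illustration under stated assumptions, not the code in SequestOut.cpp or Out2XML. The function names are made up. The xcorr values are hypothetical, chosen only so that the out-file deltCn column matches the numbers quoted in the thread. Homology filtering is left out entirely, so the X filter is the only reason a hit gets skipped.

```python
# Illustrative sketch only; not taken from the TPP source.
# All function names and xcorr values below are hypothetical.

def delta_cn(xcorr_ref, xcorr_other):
    """Normalized xcorr difference: (xcorr_ref - xcorr_other) / xcorr_ref."""
    return (xcorr_ref - xcorr_other) / xcorr_ref


def kept_xcorrs(xcorrs, contains_x):
    """Drop the hits whose peptide contains X (the default X filter)."""
    return [x for x, has_x in zip(xcorrs, contains_x) if not has_x]


def reported_delta_cn(xcorrs, contains_x):
    """Behaviour as described in the thread (an inference, not the real code):
    normalize against the original top hit, even if that hit was filtered out."""
    kept = kept_xcorrs(xcorrs, contains_x)
    if len(kept) < 2:
        return 1.0  # special case from the thread: only one hit survives
    return delta_cn(xcorrs[0], kept[1])


def proposed_delta_cn(xcorrs, contains_x):
    """Henry's suggestion: behave as if X-containing hits were never searched."""
    kept = kept_xcorrs(xcorrs, contains_x)
    if len(kept) < 2:
        return 1.0
    return delta_cn(kept[0], kept[1])


# Case 3: the top hit contains X; hypothetical xcorrs give deltCn 0.0296 and 0.0417.
case3_xcorrs = [2.0000, 1.9408, 1.9166]
case3_has_x = [True, False, False]
print(f"{reported_delta_cn(case3_xcorrs, case3_has_x):.4f}")  # 0.0417
print(f"{proposed_delta_cn(case3_xcorrs, case3_has_x):.4f}")  # 0.0125

# Case 4: two hits, the top one contains X, so a single hit survives.
case4_xcorrs = [2.0000, 1.2236]
case4_has_x = [True, False]
print(reported_delta_cn(case4_xcorrs, case4_has_x))  # 1.0
```

With these made-up xcorr values, the first function reproduces the 0.0417 that shows up in the shtml for Case 3, and the second gives the (xcorr(2) - xcorr(3)) / xcorr(2) value Henry argues would be more sensible once the top hit is discarded.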