Hi Farshad,
Thanks for the detailed analysis, but I want to clarify what the function 
you are referring to does and assure everyone it is behaving as intended. 
This is independent of how one might prefer to quantify their proteins, but 
you bring up great discussion points. So let's dig in!

First, the full quote for the nanogram protein mass estimate 
calculation you are referring to is, "For statistical analysis, we 
converted SIN and dSIN values for each protein to nanogram estimations 
using the RPQ method [ref: https://pubmed.ncbi.nlm.nih.gov/20010810/]. In 
brief, each protein SIN is divided by the sum of all proteins’ SIN and 
multiplied by the protein load in nanograms." There was some key 
information there that was lost without seeing the full quote (like having 
your words taken out of context...). Namely, the nanogram estimation 
method, known as RPQ, is defined in a previous publication. StPeter's job 
is to replicate that function as originally described, which we believe it 
does, and that includes the units of the result. So I'd like to reiterate 
that the StPeter nanogram estimates are not computed incorrectly. They are 
computed as defined in the StPeter publication and in the prior publication 
from which the method was derived.
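
To make the quoted definition concrete, here is a minimal sketch of the RPQ rescaling. It is not StPeter's actual source code; the function name and the example values are made up for illustration, and it assumes the SIN values are on the linear scale (not log2).

```python
def rpq_nanograms(sin_values, protein_load_ng):
    """Rescale linear-scale SIN values so that they sum to the protein load.

    sin_values: dict mapping protein name -> SIN (or dSIN), linear scale.
    protein_load_ng: total protein load in nanograms, chosen by the user.
    """
    total_sin = sum(sin_values.values())
    if total_sin <= 0:
        raise ValueError("sum of SIN values must be positive")
    # Each protein SIN is divided by the sum of all proteins' SIN
    # and multiplied by the protein load in nanograms.
    return {
        protein: sin / total_sin * protein_load_ng
        for protein, sin in sin_values.items()
    }


# Hypothetical example: two identified proteins and a 100 ng load.
example = rpq_nanograms({"protA": 1.0, "protB": 3.0}, 100.0)
```

Whatever load is passed in, the outputs always sum to that load, which is why the result is a rescaling rather than a measurement.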

Second, it is good to understand what the RPQ means in terms of actual 
value or accuracy. In the RPQ equation, the sum of all the nanogram values 
should equal the total protein load onto the mass spectrometer. But StPeter 
(or any quantification method) can only quantify what was identified from 
the sample. There are perhaps thousands or more molecules in a sample that 
are never identified during a run, and thus were not included in the sum 
total of proteins quantified. The RPQ results are best described as a 
rescaling of the SIN to a range that resembles nanogram amounts, and are 
undoubtedly overestimates of the actual quantities, perhaps even if you've 
managed to quantify every protein in your sample.
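
A toy illustration of the overestimation from unidentified proteins, under a deliberately idealized assumption that SIN is exactly proportional to the true amount of each protein. All names and numbers are hypothetical; this only shows the arithmetic, not real data.

```python
def rpq_nanograms(sin_values, protein_load_ng):
    """Rescale linear-scale SIN values so that they sum to the protein load."""
    total_sin = sum(sin_values.values())
    return {
        protein: sin / total_sin * protein_load_ng
        for protein, sin in sin_values.items()
    }


# Hypothetical sample: true amounts loaded, in nanograms.
true_ng = {"P1": 40.0, "P2": 10.0, "P3": 30.0, "P4": 20.0}
load_ng = sum(true_ng.values())

# Idealized assumption: SIN is proportional to the true amount,
# but only P1 and P2 are identified in the run.
identified_sin = {name: true_ng[name] for name in ("P1", "P2")}

# RPQ spreads the whole load over the identified proteins only.
estimates = rpq_nanograms(identified_sin, load_ng)

# Ratio of estimate to true amount for each identified protein.
inflation = {name: estimates[name] / true_ng[name] for name in estimates}
```

In this toy case the identified proteins account for half of the load, so each estimate comes out at about twice the true amount.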

Third, nothing in StPeter performs absolute protein quantification. All 
values are relative to the sample, which is not necessarily a whole 
proteome. Changing sample preparation in any way can influence the 
quantities, regardless of the sample load or whatever value you choose to 
use in RPQ.

Hopefully that was clear, but the take-home point is that the nanogram 
estimates are computed using the published RPQ method, which StPeter has 
correctly replicated from its publication. The results are not necessarily 
precise nanogram estimates, but relative abundances scaled to fall within 
the total (and arbitrary) number of nanograms you wish to see. For most 
people who are uncomfortable with the log2(SIN) scale, this is the 
alternative they use, maybe even unwisely.

Whew, sorry that was so long. On to the discussion points you raise:

1. Yes, I agree completely that SIN/dSIN tries to quantify based on 
molecules, not mass. This is an important distinction, and I'm happy you 
pointed it out to everyone who is reading.

2. Regarding my thoughts, I prefer to quantify using log(SIN) or log(dSIN) 
and not RPQ, as illustrated in several examples in the StPeter publication, 
and believe use of RPQ should be done with caution and calibrated 
appropriately (e.g., with known quantities spiked into the sample, for 
starters).

3. StPeter isn't one quantification algorithm. It is one program with a 
collection of quantification algorithms. It is possible to use spectral 
indexes, or spectral counts, or distributed spectral counts, etc. So even 
if RPQ is offered, there is no obligation to use it. Instead, use what is 
appropriate for your research.

4. I agree, especially after this lengthy response ;) , that we could 
perhaps update StPeter to better clarify what RPQ is and is not. Maybe you 
have additional suggestions? I am thinking at the very least to describe it 
as "nanogram-scale" quantity estimates, but finding a concise way to also 
express that converting molecule counts to mass estimates might undermine 
the analysis. I'm not sure if labeling RPQ as copy numbers is accurate 
either, as actually estimating total copy numbers of any given complex 
mixture is bound to be exceptionally inaccurate.
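
For anyone who wants to compare the two scalings discussed in this thread, here is a small sketch. It contrasts the plain fraction (each value divided by the sum, as in RPQ) with the length-weighted fraction, dSIN * L / Sum(dSIN * L), proposed in the original message. The function names and inputs are hypothetical, the dSIN values are assumed to be on the linear scale, and using sequence length as a stand-in for protein mass is itself an approximation.

```python
def plain_fractions(dsin):
    """Fraction of the summed dSIN per protein (the scaling RPQ uses)."""
    total = sum(dsin.values())
    return {name: value / total for name, value in dsin.items()}


def length_weighted_fractions(dsin, lengths):
    """dSIN * L / Sum(dSIN * L), with length L as a rough proxy for mass."""
    weighted = {name: dsin[name] * lengths[name] for name in dsin}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


# Hypothetical example: equal dSIN, but one protein is three times longer.
dsin = {"short": 2.0, "long": 2.0}
lengths = {"short": 100, "long": 300}

plain = plain_fractions(dsin)
weighted = length_weighted_fractions(dsin, lengths)
```

With equal dSIN the plain fractions are equal, while the length-weighted fractions shift toward the longer protein. Both are still relative to what was identified in the sample.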

Cheers,
Mike



On Wednesday, March 20, 2024 at 3:50:48 PM UTC-7 Farshad AbdollahNia wrote:

> Dear TPP developers and community,
>
> I wanted to point out a potential error in how StPeter estimates protein 
> mass (nanograms) in the proteome sample. As described in the paper 
> <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5891225/>, the program 
> normalizes the spectral index, dSI, by the protein length, L, and the total 
> spectral index from the sample, Sum(dSI), as in the formula below: 
> [image: image.png]
> This is correct for estimating the relative copy number, or *mole 
> fraction* (the fraction of the total number of protein molecules), of 
> each protein. However, for nanograms, or *mass fraction* (the fraction of 
> the total proteome mass), the normalization by L should be omitted.
>
> I hope this makes sense. The mass abundance of each protein is 
> proportional to *both* its length and its copy number, therefore, 
> normalization by length should not be performed for mass abundance 
> estimation.
>
> Unfortunately, as the StPeter paper says (and as I have verified in the 
> output), for calculating the nanograms "each protein SIN is divided by 
> the sum of all proteins’ SIN and multiplied by the protein load in 
> nanograms". This is effectively using mole fraction in place of mass 
> fraction, which is incorrect. 
>
> The authors (and other users) may not have noticed this error because it 
> is inconsequential for tracking changes between different 
> samples/conditions. However, it would be significant for consistency with 
> other mass quantitation methods.
>
> To check the consistency, when StPeter's SIN output is correctly used to 
> estimate mass fractions, i.e.  dSIN * L / Sum(dSIN * L) is calculated 
> instead of the above formula, the result is highly correlated with that of 
> spectral counting, as expected, and as you can see in an example below:
>
> [image: StPeter_PSMs_aer.png]
>
>
> The method of mass fraction estimation using spectral counting is already 
> established in the literature, for example in this paper 
> <https://www.embopress.org/doi/full/10.15252/msb.20145697> see the 
> "Absolute protein quantitation" section: "The absolute abundance of a 
> protein was calculated by dividing the total number of spectra of all 
> peptides for that protein by the total number of 14N spectra in the 
> sample." No normalization by protein length is done, because length has to 
> be included in the *mass* abundance of a protein. 
>
> The paper also verifies the consistency of this method with 15N-labeled 
> relative quantitation (see their supplementary figure S9). I have also 
> verified the agreement in my own relative quantitation experiments.
>
> I would be interested in learning your thoughts on this. For obtaining 
> protein mass abundances (or mass fractions), StPeter's "SIn" output (which 
> is log2[dSIN]) is currently usable in the way described above, but the 
> "ng" output needs to be corrected in the source code. Optionally, the 
> current "ng" calculation can also be re-labeled as "copy numbers" given a 
> total load of copy numbers (instead of total nanograms) provided by the 
> user, but that would probably be of less interest than nanograms.
>
> Please let me know what you think.
>
> Thank you,
> Farshad
>
