[Xplor-nih] marvin

John Kuszewski Fri, 15 Jun 2007 14:40:15 -0400

On Jun 15, 2007, at 5:26 AM, Gary S. Thompson wrote:

> John Kuszewski wrote:
>
> Hi John
>
> Firstly, many thanks for all the useful input! The good news was  
> that I had most of it right already ;-) but it was really nice to  
> have your comments to read and assure me that I had everything  
> right, especially the deleting of the reference pdb. It would be  
> really nice to have some of the things in this note in the eginput  
> directory and possibly a skeleton calculation that was easy to  
> pick up and use
>

Hi Gary,

The next release will see some significant simplification to the  
eginput.  I've put all of the invariant stuff into a TCL procedure, so
all that remains in the initMatch scripts is stuff that's relevant to  
the user.  Should be easier to understand and less error-prone to
change.

> Unfortunately, it turned out that analysis can't output NMR-STAR with  
> shifts and peaks (yet)
>

Exactly what are you using?

BTW, very few people seem to use NMR-STAR to hold their peak lists.   
And there's a lot of variability in the ones
that exist in the BioMagResBank.

> I did have to write a short script to get the chemical shift  
> ranges from the nmrDraw peak table, as they have a different format  
> to PIPP:
>

Thanks!  I'll test it out (and double-check everything with Frank  
Delaglio) and, if you don't mind, put it into a future release.

> #note use of list command to protect string from commands such as  
> $noe peaks
> eval match3d \
>    -peakList [list [$noe peaks]] \
>

This isn't necessary.  [$noe peaks] returns a TCL list structure.   
Wrapping it in another list will almost certainly break something.
Same thing with wrapping the readSpectralRange calls--those procs  
also return TCL lists.
If you're getting errors, there's probably something else wrong.

> I think I have this correct! But as explained in the comments to my  
> code above, I don't know if I need to allow for a 'picket fence'  
> error of 1/2 a digital resolution (I hope I don't)
>

You do, depending on how your spectra were processed, and depending  
on exactly what Frank's headers are giving you.
The values in the spectral range flags need to be the *exact* shifts
about which the folding/aliasing takes place.  Remember that in the
end, we're going to try to match peak positions to folded shift values
to within a rather tight tolerance (0.02 ppm in 1H, 0.2 ppm in heavy
atoms), so even small errors in the spectral range will screw things
up.
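
To make the sensitivity concrete, here is a minimal Python sketch. It is not Marvin's code: the function names and all numbers are hypothetical, and it assumes the error takes the form of each window edge being pulled in by half a point, so the sweep width comes out one point too small. It folds a shift into a spectral window and checks it against a peak position.

```python
def fold_shift(shift, low, high):
    """Alias a shift into [low, high) by whole multiples of the sweep width."""
    width = high - low
    return low + (shift - low) % width


def matches(peak_pos, shift, low, high, tol):
    """True if the folded shift lies within tol of the peak position."""
    width = high - low
    d = abs(peak_pos - fold_shift(shift, low, high))
    d = min(d, width - d)  # positions just either side of the fold are neighbours
    return d <= tol


# Hypothetical 13C dimension: 10-40 ppm over 64 points.
low, high, npts = 10.0, 40.0, 64
half_point = (high - low) / npts / 2.0

true_shift = 55.0   # lies outside the window, so it is folded once
peak_pos = 25.0     # where the folded peak is actually observed

exact = matches(peak_pos, true_shift, low, high, tol=0.2)
off_by_half = matches(peak_pos, true_shift,
                      low + half_point, high - half_point, tol=0.2)
```

With these made-up numbers the exact window puts the folded shift on the peak, while the half-point error at each edge moves it by a full point (about 0.47 ppm), which is outside the 0.2 ppm heavy-atom tolerance. The discrepancy grows with the number of times the shift is folded.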

> I am afraid there are more! I am quite interested in what the  
> graphics at the top of the cvn_3dc_pass1.peaks and  
> cvn_3dc_pass1.shiftAssignments files are, what they tell me, and  
> what I should be looking out for
>

Good for you for looking at those headers to the peaks and  
shiftAssignments files.  I find them quite useful.

The graphics are mostly just histograms of various things.  Stuff  
that I keep track of includes:

1.  Fraction of unassigned peaks.  If your peak list was generated by  
software peak pickers (nmrDraw, CAPP, etc), it's quite common to see
large fractions (50% +) of unassignable peaks, depending upon the  
settings that were used.  Human-picked peak lists tend to be much
cleaner, and it's unusual to see more than 15-20% of human peaks  
unassignable.

2.  Number of peak assignments per assigned peak.  This is a  
distillation of the degeneracy histogram that I also print in  
the .peaks header.

3.  Number of long range peaks per residue.  If you have a reference  
structure, the number of good long range peaks per residue is more  
informative, of course.
If you have 2-3 per residue, you should be in good shape.

4.  If you have a reference structure, then the fraction of long-range
information that's bad is also tremendously informative.  Using the
network filtering, I can often bring it down to 30-40%.

> so here are some questions
>

All of which, I notice, are related to stuff that's in the manuscript  
I'm working on now.  :-)

>
> 1. what are exceptions and explicitinverseexceptions
>

In the sa_pass2 script that's used in the CVN example, I include  
information not only about the peaks that exist, but also about the
absence of peaks (which I've been calling "inverse NOEs").

Essentially, if two shiftAssignments are close (< 4 A), they will be
pushed apart unless one of two conditions applies: they have an active
peakAssignment linking them together (i.e., there's a NOESY peak that
could be arising from their interaction), or the two shiftAssignments
have an explicit inverse exception.  Explicit inverse exceptions are
listed in the .exceptions file for each spectrum.
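
The rule can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the code in sa_pass2; the function name, the pair representation, and the atom labels are all hypothetical.

```python
def repelled_pairs(distances, active_pairs, exception_pairs, cutoff=4.0):
    """Return the shiftAssignment pairs that would be pushed apart.

    distances       -- dict mapping frozenset({sa1, sa2}) to current distance (A)
    active_pairs    -- pairs linked by at least one active peakAssignment
    exception_pairs -- pairs with an explicit inverse exception
    """
    repelled = []
    for pair, dist in distances.items():
        if dist >= cutoff:
            continue  # not close, so no inverse-NOE term at all
        if pair in active_pairs or pair in exception_pairs:
            continue  # a peak or an exception accounts for the proximity
        repelled.append(pair)
    return repelled


# Made-up example with three close pairs and one distant pair.
ab = frozenset({"A12.HA", "L45.HB"})
cd = frozenset({"G7.HN", "V30.HG1"})
ef = frozenset({"K3.HA", "K3.HB"})
gh = frozenset({"T9.HN", "F60.HZ"})

distances = {ab: 3.1, cd: 3.5, ef: 2.4, gh: 9.0}
result = repelled_pairs(distances, active_pairs={cd}, exception_pairs={ef})
```

Here only the first pair ends up repelled: it is close, no active peakAssignment links it, and it has no exception.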

Explicit inverse exceptions are created for one of two reasons:

1.  The peak that would be created by a particular pair of  
shiftAssignments is expected to be invisible, because it would appear  
close to
the diagonal or the solvent line.

2.  The network filter suspects that those shiftAssignments are close  
together, even if it can't find an actual peak linking them.

I realize that using the absence of NOE peaks as structural
restraints is a bit dangerous, but I've done it in a fairly
error-tolerant way:

1.  The repulsive forces use the same linear potential shape as the  
attractive forces (arising from the peak assignments).  So they can't
easily overwhelm a bunch of peakAssignments pulling things together.

2.  They're only used during pass 2.  The final pass of structure  
calculations doesn't use this information at all.

3.  The exceptions generated by the network filter work quite well.

> 2. for the shiftAssignments, 'Number of peak assignments per  
> shiftAssignment': I take it this is the number of peaks assigned to  
> each shift
>

Close.  Just to bring everyone up to speed with the terminology:

A Peak is a data structure that represents a peak in a NOESY
spectrum.  It knows the peak's location, intensity, and so forth.

A ShiftAssignment is a data structure that represents an entry in a
chemical shift table.  It has an atom selection (for the protons, and
optionally for the attached heavy atoms), and knows its chemical
shift value.

A PeakAssignment is a data structure that represents a (potential)
match between a Peak and two ShiftAssignments (one for the 'from'
dimension, and one for the 'to' dimension).  A Peak can have any
number of PeakAssignments.  PeakAssignments are what are actually
used to calculate energy and forces.  They are given estimates of
their likelihood (as explained in the JACS paper on Marvin), and can
be activated/inactivated during annealing.

So the number of peakAssignments per shiftAssignment is different  
from (and always >=) the number of peaks assigned using each shift
assignment, because there are generally several peakAssignments for  
each peak at the beginning of the structure calculation.
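
The distinction between the two counts can be shown with a toy model in Python. The classes below only mirror the description above; the field names are assumptions and do not reflect Marvin's actual data structures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ShiftAssignment:
    name: str            # stands in for the atom selection
    proton_shift: float  # ppm


@dataclass
class PeakAssignment:
    from_sa: ShiftAssignment
    to_sa: ShiftAssignment
    active: bool = True


@dataclass
class Peak:
    name: str
    intensity: float
    assignments: list = field(default_factory=list)


def counts_per_shift_assignment(peaks):
    """Return (peakAssignments per shiftAssignment, peaks per shiftAssignment)."""
    pa_counts, peak_counts = Counter(), Counter()
    for peak in peaks:
        touched = set()
        for pa in peak.assignments:
            names = {pa.from_sa.name, pa.to_sa.name}
            for name in names:
                pa_counts[name] += 1
            touched |= names
        for name in touched:
            peak_counts[name] += 1  # each peak counted once per shiftAssignment
    return pa_counts, peak_counts


a = ShiftAssignment("a", 4.31)
b = ShiftAssignment("b", 1.22)
c = ShiftAssignment("c", 1.24)

# Peak p1 is ambiguous in the 'to' dimension (b or c); p2 is unambiguous.
p1 = Peak("p1", 1.0, [PeakAssignment(a, b), PeakAssignment(a, c)])
p2 = Peak("p2", 0.5, [PeakAssignment(a, b)])

pa_counts, peak_counts = counts_per_shift_assignment([p1, p2])
```

In this example shiftAssignment `a` takes part in three peakAssignments but only two peaks, which is the ">=" relationship described above.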

> 3. for the peak assignments what are 'completeness of targets'
> 4. for peak are 'Differences between target from proton shifts and  
> shiftAssignment values' the difference between the shift found in  
> the peaks for assignments and the shift in the shiftLists
>

These are both related to the stripe correction mechanism I added to  
Marvin.  If I recall correctly, ATNOS does something similar.
The idea is to correct the values of the chemical shifts to reflect
the actual positions of the peaks in the NOESY (to account for
differences that crop up from changes in pH, etc.).  Briefly, it works
by first performing a match between peaks and shifts using a very
broad chemical shift tolerance.  Then intraresidue peak assignments
are pulled out, and the locations of the peaks corresponding to those
intraresidue peak assignments are treated as possible target values
for an updated chemical shift.  Since I begin with a broad-tolerance
match, a bunch of bad target values can crop up.  I filter them out
in a few ways.  One of them is by calculating the fraction of that
shiftAssignment's residue's intraresidue peaks that would still be
assigned if I used that particular target shift with a tight tolerance.
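
As a rough one-dimensional illustration of that last filter, the Python sketch below scores a candidate target shift by the fraction of intraresidue peak positions it would still match at a tight tolerance. This is presumably what the header calls the completeness of a target, but that reading, the function name, and the numbers are assumptions; the real calculation works on full multidimensional peak assignments.

```python
def target_completeness(target_shift, intraresidue_positions, tight_tol):
    """Fraction of intraresidue peak positions within tight_tol of the target."""
    if not intraresidue_positions:
        return 0.0
    kept = sum(1 for pos in intraresidue_positions
               if abs(pos - target_shift) <= tight_tol)
    return kept / len(intraresidue_positions)


# Hypothetical 1H positions (ppm) of intraresidue peaks picked up by the
# broad-tolerance match for one shiftAssignment. The last is a likely mismatch.
positions = [4.31, 4.315, 4.305, 4.45]

good_target = target_completeness(4.31, positions, tight_tol=0.02)
bad_target = target_completeness(4.45, positions, tight_tol=0.02)
```

With these values the target at 4.31 ppm keeps three of the four peaks, while the target at 4.45 ppm keeps only itself, so it would be the one filtered out.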

> 5. what are the good peak assignments at this stage?
>

PeakAssignments are flagged as good (look down in the peaks table for  
-good flags) if they agree with a given reference structure
to within a given tolerance (generally 0.5 A).

> 6. what are 'Passing SA pair scores'
> 7. what is the network filter? Is it network anchoring a la Peter  
> Güntert?
>

Yes, the network filter is similar to CANDID's network anchoring.  As  
I've hinted above, I also use it to generate inverse exceptions
in cases where I suspect a particular pair of shiftAssignments is  
close, even if I can't find an actual peak linking them together.
The histograms of SA pair scores I show summarize the internal  
network filtering scoring system.

> many thanks
> gary
>

You're welcome.

--JK
