On Jun 15, 2007, at 5:26 AM, Gary S. Thompson wrote:
> John Kuszewski wrote:
>
> Hi John
>
> firstly many thanks for all the useful input! The good news was
> that I had most of it right already ;-) but it was really nice to
> have your comments to read and assure me that I had everything
> right, especially the deleting of the reference pdb. It would be
> really nice to have some of the things in this note in the eginput
> directory and possibly a skeleton calculation that was easy to
> pick up and use

Hi Gary,

The next release will see some significant simplification to the
eginput. I've put all of the invariant stuff into a TCL procedure,
so all that remains in the initMatch scripts is stuff that's relevant
to the user. Should be easier to understand and less error-prone to
change.

> unfortunately it turned out that analysis can't output nmrstar with
> shifts and peaks (yet)

Exactly what are you using?

BTW, very few people seem to use NMR-STAR to hold their peak lists.
And there's a lot of variability in the ones that exist in the
BioMagResBank.

> I did have to write a short script to get the chemical shift
> ranges from the nmrdraw peak table as they have a different format
> to pipp:

Thanks! I'll test it out (and double-check everything with Frank
Delaglio) and, if you don't mind, put it into a future release.

> #note use of list command to protect string from commands such as
> $noe peaks
> eval match3d \
> -peakList [list [$noe peaks]] \

This isn't necessary. [$noe peaks] returns a TCL list structure.
Wrapping it in another list will almost certainly break something.
Same thing with wrapping the readSpectralRange calls--those procs
also return TCL lists. If you're getting errors, there's probably
something else wrong.

> I think I have this correct!
> but as explained in the comments to my code above I don't know if
> I need to allow for a 'picket fence' error of 1/2 a digital
> resolution (I hope I don't)

You do, depending on how your spectra were processed, and depending
on exactly what Frank's headers are giving you. The values in the
spectral range flags need to be the *exact* shifts about which the
folding/aliasing takes place. Remember that in the end, we're going
to try to match peak positions to folded shift values to within a
rather tight tolerance (0.02 ppm in 1H, 0.2 ppm in heavy atoms), so
even small errors in the spectral range will screw things up.

> I am afraid there are more! I am quite interested in what the
> graphics that are on the top of the cvn_3dc_pass1.peaks and
> cvn_3dc_pass1.shiftAssignments are, what they tell me and what I
> should be looking out for

Good for you for looking at those headers to the peaks and
shiftAssignments files. I find them quite useful. The graphics are
mostly just histograms of various things. Stuff that I keep track of
includes:

1. Fraction of unassigned peaks. If your peak list was generated by
   software peak pickers (nmrDraw, CAPP, etc), it's quite common to
   see large fractions (50%+) of unassignable peaks, depending upon
   the settings that were used. Human-picked peak lists tend to be
   much cleaner, and it's unusual to see more than 15-20% of human
   peaks unassignable.

2. Number of peak assignments per assigned peak. This is a
   distillation of the degeneracy histogram that I also print in the
   .peaks header.

3. Number of long range peaks per residue. If you have a reference
   structure, the number of good long range peaks per residue is more
   informative, of course. If you have 2-3 per residue, you should be
   in good shape.

4. If you have a reference structure, then the fraction of long-range
   information that's bad is also tremendously informative. Using the
   network filtering, I can often bring it down to 30-40%.
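The spectral-range point above lends itself to a small numeric
illustration. The following is a minimal sketch in plain Python, not
Marvin/PASD code: the function names, the 10-70 ppm 13C window, and
the 128-point digitization are all hypothetical, and simple
wrap-around aliasing is assumed (true folding reflects rather than
wraps). It shows that trimming half a point off each edge of the
window moves a once-aliased shift by one full point, which here
exceeds the 0.2 ppm heavy-atom tolerance.

```python
def alias_shift(shift_ppm, low_ppm, high_ppm):
    """Wrap a chemical shift into [low_ppm, high_ppm) by whole sweep widths."""
    width = high_ppm - low_ppm
    if width <= 0:
        raise ValueError("high_ppm must be greater than low_ppm")
    return low_ppm + (shift_ppm - low_ppm) % width


def matches(peak_ppm, shift_ppm, low_ppm, high_ppm, tol_ppm):
    """True if the aliased shift lies within tol_ppm of the peak position.

    Peaks sitting right at the window edge (where the match would wrap
    around) are not handled in this sketch.
    """
    aliased = alias_shift(shift_ppm, low_ppm, high_ppm)
    return abs(aliased - peak_ppm) <= tol_ppm


# Hypothetical 13C dimension: exact aliasing window and digitization.
EXACT_RANGE = (10.0, 70.0)
N_POINTS = 128
POINT_PPM = (EXACT_RANGE[1] - EXACT_RANGE[0]) / N_POINTS  # ppm per point

# The same window with half a point trimmed from each edge, as happens
# if the limits are read from the centres of the first and last points.
TRIMMED_RANGE = (EXACT_RANGE[0] + POINT_PPM / 2,
                 EXACT_RANGE[1] - POINT_PPM / 2)

TRUE_SHIFT = 128.3   # hypothetical aromatic carbon, outside the window
PEAK_POSITION = alias_shift(TRUE_SHIFT, *EXACT_RANGE)  # where the peak appears
HEAVY_ATOM_TOL = 0.2  # ppm, the heavy-atom tolerance quoted above

if __name__ == "__main__":
    for label, (low, high) in (("exact", EXACT_RANGE),
                               ("trimmed", TRIMMED_RANGE)):
        aliased = alias_shift(TRUE_SHIFT, low, high)
        ok = matches(PEAK_POSITION, TRUE_SHIFT, low, high, HEAVY_ATOM_TOL)
        print(f"{label:8s} aliased={aliased:.3f} ppm  match={ok}")
```

Note that shifting both edges by the same amount would leave the
aliased position unchanged; it is the error in the window *width*
that matters, and it is multiplied by the number of times the shift
has been wrapped.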
> so here are some questions

All of which, I notice, are related to stuff that's in the manuscript
I'm working on now. :-)

> 1. what are exceptions and explicitinverseexceptions

In the sa_pass2 script that's used in the CVN example, I include
information not only about the peaks that exist, but also about the
absence of peaks (which I've been calling "inverse NOEs").
Essentially, if two shiftAssignments are close (< 4 A), they will be
pushed apart unless one of two factors applies: if they have an
active peakAssignment linking them together (i.e., there's a NOESY
peak that could be arising from their interaction), or if the two
shift assignments have an explicit inverse exception. Explicit
inverse exceptions are listed out in the .exceptions file for each
spectrum.

Explicit inverse exceptions are created for one of two reasons:

1. The peak that would be created by a particular pair of
   shiftAssignments is expected to be invisible, because it would
   appear close to the diagonal or the solvent line.

2. The network filter suspects that those shiftAssignments are close
   together, even if it can't find an actual peak linking them.

I realize that using the absence of NOE peaks as structural
restraints is a bit dangerous, but I've done it in a fairly
error-tolerant way:

1. The repulsive forces use the same linear potential shape as the
   attractive forces (arising from the peak assignments). So they
   can't easily overwhelm a bunch of peakAssignments pulling things
   together.

2. They're only used during pass 2. The final pass of structure
   calculations doesn't use this information at all.

3. The exceptions generated by the network filter work quite well.

> 2. for the shiftAssignments Number of peak assignments per
> shiftAssignment I take it this is the number of peaks assigned to
> each shift

Close. Just to bring everyone up to speed with the terminology, a
Peak is a data structure that represents a peak in a NOESY spectrum.
It knows the peak's location, intensity, and so forth.

A ShiftAssignment is a data structure that represents an entry in a
chemical shift table. It has an atom selection (for the protons, and
optionally for the attached heavy atoms), and knows its chemical
shift value.

A PeakAssignment is a data structure that represents a (potential)
match between a Peak and two ShiftAssignments (one for the 'from'
dimension, and one for the 'to' dimension). A Peak can have any
number of PeakAssignments. PeakAssignments are what are actually used
to calculate energy & forces. They are given estimates of their
likelihood (as explained in the JACS paper on Marvin), and can be
activated/inactivated during annealing.

So the number of peakAssignments per shiftAssignment is different
from (and always >=) the number of peaks assigned using each shift
assignment, because there are generally several peakAssignments for
each peak at the beginning of the structure calculation.

> 3. for the peak assignments what are 'completeness of targets'
> 4. for peak are 'Differences between target from proton shifts and
> shiftAssignment values' the difference between the shift found in
> the peaks for assignments and the shift in the shiftLists

These are both related to the stripe correction mechanism I added to
Marvin. If I recall correctly, ATNOS does something similar. The idea
is to correct the values of the chemical shifts to reflect the actual
positions of the peaks in the NOESY (to account for differences that
crop up from changes in pH, etc).

Briefly, the way it works is by first performing a match between
peaks and shifts using a very broad chemical shift tolerance. Then
intraresidue peak assignments are pulled out, and the locations of
the peaks corresponding to those intraresidue peak assignments are
treated as possible target values for an updated chemical shift.

Since I begin with a broad-tolerance match, a bunch of bad target
values can crop up. I filter them out in a few ways.
One of them is by calculating the fraction of that shiftAssignment's
residue's intraresidue peaks that would still be assigned if I used
that particular target shift with a tight tolerance.

> 5. what are the good peak assignments at this stage?

PeakAssignments are flagged as good (look down in the peaks table for
-good flags) if they agree with a given reference structure to within
a given tolerance (generally 0.5 A).

> 6. what are 'Passing SA pair scores'
> 7. what is the network filter is it network anchoring a la Peter
> Guntert?

Yes, the network filter is similar to CANDID's network anchoring. As
I've hinted above, I also use it to generate inverse exceptions in
cases where I suspect a particular pair of shiftAssignments is close,
even if I can't find an actual peak linking them together. The
histograms of SA pair scores I show summarize the internal network
filtering scoring system.

> many thanks
> gary

You're welcome.

--JK
