John Kuszewski wrote:
Hi John
firstly many thanks for all the useful input! The good news was that I
had most of it right already ;-) but it was really nice to have your
comments to read and assure me that i had everything right especially
the deleting of the reference pdb. It would be rerally nice to have some
of the things in thia note in the eginput directory and possibley a
skeleton calculation that was easy to pickup and use
> Hi,
>
> The best way to get started with marvin is to build off the eginput
> example.
>
> Stuff you'll need to assemble:
>
> 1. a PSF file for your protein, of course.
> 2. a shift table for your protein.
> 3. a NOESY peak location table for your protein.
>
> Regarding the shift table, it doesn't have to be in PIPP format. I
> can also read
> NMR-STAR, although you have to be careful to get the degeneracy codes
> correct.
unfortunatley it turned out that analysis can't output nmrstra with
shifts and peaks (yet) so our friends at the ebi (wim vranken) whipped
up a very basic pipp shift output filter and this was enough along with
the nmrdraw peakoutput filter they have. I did have to write a short
script to get the chemical shift tranges from the nmrdraw peak table as
they have a different format to pipp:
#
# reads the parameters in a nmrDraw peak table and
# returns the center of the selected axis in PPM
# and its width in PPM [clearly this is _NOT_ true!]
#
proc readSpectralRangeFromnmrDrawPeakTable args {
set fname [requiredFlagVal $args -fileName]
set axis [requiredFlagVal $args -axis]
global errorInfo
# open file
if {[catch {set inUnit [open $fname r]}]} {
error "Error opening input file $fname"
}
# eat text until we hit a line that starts with DATA <column name>
set found 0
while {! $found} {
if {[catch {set l [nextLine $inUnit]}]} {
set errorInfo ""
break
}
if {([lindex $l 0] == "DATA") && ([lindex $l 1] == $axis)} {
set found 1
}
}
close $inUnit
if {! $found} {
error "DATA line for axis named $axis not found in file $fname"
}
#
# nmrDraw header format is
# DATA <axisDirection> <axisType> <unknown> <unknown> <maxShift ppm>
<minimum ppm>
#
# obvious the two fields <unknown> <unknown> contain something, but
thats one for frank delaglio!
#
# also i don't seem to have the number of data points in the spectrum...
set specStartPPM [lindex $l 5]
set specEndPPM [lindex $l 6]
if {[string first "ppm" $specStartPPM] == -1} {
error "start shift ($specStartPPM) for axis $axis doesn't appear
to have ppm units... (shift should finish with ppm eg 121.3ppm)"
}
if {[string first "ppm" $specEndPPM] == -1} {
error "end shift ($specEndPPM) for axis $axis doesn't appear to
have ppm units... (shift should finish with ppm eg 15.2ppm)"
}
set specStartPPM [string trimright $specStartPPM "pm"]
set specEndPPM [string trimright $specEndPPM "pm"]
#
# i am not quite clear if these are shifts for the edge of the
spectrum or the first
# and last point so i am not sure if we have a picket frnce error
another one for frank delaglio!
# so maybe: the spectrum actually begins 1/2 of a point before the
first point's frequency.
#
return [list $specStartPPM $specEndPPM]
}
n.b. there seems to be an error in the comments on the function
readSpectralRangeFromPippPeakTable
#
# reads the parameters in a PIPP .PCK table and
# returns the center of the selected axis in PPM
# and its width in PPM
#
it seems to be returning the start and finish of the spectrum instead:
return [list $specStartPPM $specEndPPM]
I have also started to do a couple of other things
1. annotate the tcl files in xplor/tcl with autodoc comments as an aid
to understanding them
2. annotating copys of the initMatch3dC.tcl etc to get my head round
what they do
3. writing a more generic version of initMatch3dC.tcl to try and make it
simpler to plugin most of a new project:
I hope these are of use. Should I post them back for examination and
possible distribution (I am currently just doing these things to help
myself)
package require aeneas
package require pipp
package require marvin
# modified from eginput/marvin/cvn gst Thu May 31 11:18:18 BST 2007
set quickMode [flagExists $argv -quick]
requiredFlagVal $argv -shifts
set shiftList [flagVal $argv -shifts]
requiredFlagVal $argv -peaks
set peakList [flagVal $argv -peaks]
requiredFlagVal $argv -psf
set psfFile [flagVal $argv -psf]
requiredFlagVal $argv -root
set fileRoot [flagVal $argv -root]
set inD2O 1
set peakFromProtonColumn "X"
set peakFromCarbonColumn "Z"
set peakToProtonColumn "Y"
set specFromProtonAxis "X_AXIS"
set specFromHeavyAxis "Z_AXIS"
set specToProtonAxis "Y_AXIS"
set heavyToleranceCoarse 0.75
set protonToleranceCoarse 0.075
set heavyToleranceFine 0.2
set protonToleranceFine 0.02
set solventTolerance 0.05
set diagonalTolerance 0.001
set stripeToleranceProton $protonToleranceFine
set stripeToleranceHeavy $heavyToleranceFine
set foldingFlags {\
-signChangesUponFoldingFromHeavyatomDimension \
-shiftsAliasedAlongFromHeavyatomDimension \
-shiftsAliasedAlongFromProtonDimension \
-shiftsAliasedAlongToProtonDimension \
-nonFoldedPeaksAreNegative
}
puts "parameters"
puts "----------"
puts ""
puts "shifts: $shiftList"
puts "peaks: $peakList"
puts "psf file: $psfFile"
puts "output file root: $fileRoot"
puts "in d2o: $inD2O"
puts ""
puts "peak table column:"
puts "from proton axis: $peakFromProtonColumn"
puts "from carbon axis: $peakFromCarbonColumn"
puts "to proton axis: $peakToProtonColumn"
puts ""
puts "peak table spectrum axes:"
puts "from proton axis: $specFromProtonAxis"
puts "from carbon axis: $specFromHeavyAxis"
puts "to proton axis: $specToProtonAxis"
puts ""
puts "peak width tolerances:"
puts "proton coarse: $protonToleranceCoarse"
puts "heavy coarse: $heavyToleranceCoarse"
puts "proton fine: $protonToleranceFine"
puts "heavy fine: $heavyToleranceFine"
puts ""
puts "stripe tolerances"
puts "proton: $stripeToleranceProton"
puts "heavy: $stripeToleranceHeavy"
puts ""
puts "artifact tolerances"
puts "solvent: $solventTolerance"
puts "diagonal: $diagonalTolerance"
puts ""
puts "folding flags"
puts $foldingFlags
initParamPsfPdb \
-psfFileName $psfFile \
-randomCoords
#
# quick mode will only calculate the structure of the first 36 residues
of cyanovirin
#
if {$quickMode} {
XplorCommand "delete sele (not (resid 1:36)) end"
}
set noe [create_MarvinNOEPotential marv]
set peakRemarks [list]
set saRemarks [list]
#
# read the chemical shift table
#
set shifts [readPippShiftTable -fileName $shiftList -remarksVariableName
saRemarks] ; noOutput
puts "using flexible!!!"
reportUnassignedAtoms \
-shiftList $shifts \
-flexibleRegionSelection [AtomSel -args "(resid 1:2)"] \
-remarksVariableName saRemarks
#removed -flexibleRegionSelection [AtomSel -args "(resid 1:2)"]
#
# this spectrum was taken in D2O. Therefore, don't create to
shiftAssignments
# for exchangeable protons
#
if $inD2O {
puts "in d2o!!!"
set tempString "(name h* and not (name hn or name ht* or (resn thr
and name hg1) or
(resn ser and name hg) or (resn lys and
name hz*) or
(resn tyr and name hh) or (resn arg and
name hh*) or
(resn arg and name he) or (resn asn and
name hd*) or
(resn gln and name he*)))"
}
createShiftAssignments \
-shiftList $shifts \
-pot $noe \
-fromProtonSelectionString "(name h* and bondedto (name c*))" \
-fromHeavyatomSelectionString "(name c*)" \
-toProtonSelectionString $tempString \
-fromProtonDimensionSolventRange [list 4.6 4.8] \
-namePrefix "3dc" \
-remarksVariableName saRemarks
expandStereospecificShiftAssignments \
-shiftAssignments [$noe shiftAssignments] \
-remarksVariableName saRemarks
recordToFromPartnersForShiftAssignments \
-pot $noe \
-remarksVariableName saRemarks
markMethylShiftAssignments \
-shiftAssignments [$noe shiftAssignments] \
-remarksVariableName saRemarks
#
# read the NOE peak table. Make sure that the
# columns are identified correctly
#
process3dCPippPeakTable -fileName $peakList \
-peakIDcolumnName PkID \
-fromProtonColumnName $peakFromProtonColumn \
-fromCarbonColumnName $peakFromCarbonColumn \
-toProtonColumnName $peakToProtonColumn \
-intensityColumnName Intensity \
-pot $noe \
-remarksVariableName peakRemarks
#
# First matching stage, to detect any shift referencing differences
# between the from and to proton dimensions.
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
#
puts $errorInfo
puts "shift range [readSpectralRangeFromPippPeakTable -fileName
$peakList -axis $specFromProtonAxis]"
#note use of list command to protect string from commands such as $noe peaks
eval match3d \
-peakList [list [$noe peaks ]] \
-fromProtonTolerancePPM $protonToleranceCoarse \
-fromProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromProtonAxis]] \
-fromHeavyatomTolerancePPM $heavyToleranceCoarse \
-fromHeavyatomSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromHeavyAxis]] \
-toProtonTolerancePPM $protonToleranceCoarse \
-toProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specToProtonAxis]] \
-pot $noe \
-remarksVariableName peakRemarks \
$foldingFlags
# correct the shift referencing between the proton dimensions,
# correct the shift referencing between the proton dimensions,
# yielding updated peak locations
#
correctShiftReferencing \
-pot $noe \
-correctToProtonDimension \
-remarksVariableName peakRemarks
puts $peakRemarks
puts $errorInfo
#
# clean up the peak list based on global spectral features
#
removeDiagonalPeaks \
-peakList [$noe peaks] \
-pot $noe \
-tolerance $diagonalTolerance \
-remarksVariableName peakRemarks
removeSolventPeaks \
-peakList [$noe peaks] \
-pot $noe \
-autoDetect \
-fromProtonDimension \
-tolerance $solventTolerance \
-remarksVariableName peakRemarks
#
# Second matching stage, to detect differences between the NOE peak
locations
# and the shift table entries, based on peak stripes.
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
#
#note use of list command to protect string from commands such as $noe peaks
eval match3d \
-peakList [list [$noe peaks]] \
-fromProtonTolerancePPM $protonToleranceCoarse \
-fromProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromProtonAxis]] \
-fromHeavyatomTolerancePPM $heavyToleranceCoarse \
-fromHeavyatomSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromHeavyAxis]] \
-toProtonTolerancePPM $protonToleranceCoarse \
-toProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specToProtonAxis]] \
-pot $noe \
-remarksVariableName peakRemarks \
$foldingFlags
puts $errorInfo
#
# correct the shift assignments' chemical shifts by detecting stripes,
# yielding updated shift assignment values
#
stripeCorrection\
-pot $noe \
-stripeGenerationProtonTolerance $stripeToleranceProton \
-stripeGenerationHeavyatomTolerance $stripeToleranceHeavy \
-estimateMissingTargets \
-remarksVariableName peakRemarks \
-geminalFilter \
-toFromFilter \
-toFromShare \
-useBackboneSequential
puts $errorInfo
puts $peakRemarks
#
# Third matching stage, done with tight tolerances
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
# rematch
#
#note use of list command to protect string from commands such as $noe peaks
eval match3d \
-peakList [list [$noe peaks]] \
-fromProtonTolerancePPM $protonToleranceFine \
-fromProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromProtonAxis]] \
-fromHeavyatomTolerancePPM $heavyToleranceFine \
-fromHeavyatomSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specFromHeavyAxis]] \
-toProtonTolerancePPM $protonToleranceFine \
-toProtonSpectralRangePPM [list
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis
$specToProtonAxis]] \
-pot $noe \
-remarksVariableName peakRemarks \
$foldingFlags
#
# generate the distance bounds
#
generateDistanceBounds -peakList [$noe peaks]
correctUpBoundsForMethyls -peakList [$noe peaks]
#
# mark good PAs
#
markGoodPeakAssignments \
-peakList [$noe peaks] \
-remarksVariableName peakRemarks \
-violCutoff 0.5
initializePeakAssignmentNumFiltersFailed -pot $noe
currentFilterSummary $noe
netFilter \
-pot $noe \
-intraresidueNeighborhoods \
-extendNeighborhoodsWithHighScoringPeakAssignments \
-failedFiltersCutoff 0 \
-numIterations 5 \
-expectedContactsPerShiftAssignment 10 \
-createInverseExceptions \
-preferIntra \
-resetNumFailedFilters \
-remarksVariableName peakRemarks
puts $errorInfo
currentFilterSummary $noe
primarySequenceDistanceFilter \
-pot $noe \
-intraresidue \
-likelihoodMode \
-remarksVariableName peakRemarks
currentFilterSummary $noe
initializeLikelihoodsFromFilters -pot $noe
createExplicitInverseExceptions -pot $noe -remarksVariableName saRemarks
-failedFiltersCutoff 0
writeExplicitInverseExceptions -pot $noe -filename
${fileRoot}_3dc.exceptions
#
# report statistics. If there is a known structure, you can
# enter it here to generate accuracy statistics in the starting
# NOE file.
#
initialShiftAssignmentAnalysis \
-pot $noe \
-minLikelihood 0.9 \
-remarksVariableName saRemarks
puts $errorInfo
writeShiftAssignments \
-fileName ${fileRoot}_3dc_pass1.shiftAssignments \
-shiftAssignments [$noe shiftAssignments] \
-remarks $saRemarks
puts $saRemarks
initialPeakAnalysis \
-pot $noe\
-violCutoff 0.5 \
-minLikelihood 0.9 \
-remarksVariableName peakRemarks
puts $errorInfo
writeMarvinPeaks \
-fileName ${fileRoot}_3dc_pass1.peaks \
-pot $noe \
-remarks $peakRemarks
puts $peakRemarks
puts $errorInfo
exit
(note this isn't complete and debugged...)
>
> If you want to read NMR-STAR shift tables, add the line "package
> require nmrstar" to the
> top of the initMatch* scripts. Then change the call to
> readPippShiftTable to readNmrStarShiftTable.
>
> In the call to createShiftAssignments, you can set the various
> selections to exclude
> atoms that aren't observable in your particular spectrum (eg.,
> backbone amides, if
> your 3dC was collected in D2O).
>
> Regarding the NOESY peak location table, it also doesn't have to be
> in PIPP format.
> I can read Xeasy and nmrDraw's formats as well. If you want to use
> one of those formats,
> add a "package require xeasy" or "package require nmrdraw" line to
> the top, and change
> the call to process3dCPippPeakTable to process3dCXeasyPeakTable or
> process3dCNmrDrawPeakTable.
>
> You'll have to identify each column in the peak table, using the
> flags to your process*PeakTable call,
> as illustrated in the CVN example.
>
> Then you'll have to set the peak folding/aliasing flags to match your
> spectrum. These flags are repeated
> for each call to match3d. The relevant ones are
>
> -nonFoldedPeaksArePositive or -nonFoldedPeaksAreNegative
> -signChangesUponFoldingFromHeavyatomDimension
> -signChangesUponFoldingFromProtonDimension
> -signChangesUponFoldingToProtonDimension
> -shiftsAliasedAlongFromHeavyatomDimension or -
> shiftsFoldedAlongFromHeavyatomDimension
> -shiftsAliasedAlongFromProtonDimension or -
> shiftsFoldedAlongFromProtonDimension
> -shiftsAliasedAlongToProtonDimension or -
> shiftsFoldedAlongToProtonDimension
>
> The meaning of these flags should be obvious. The last three flags
> are a bit more complex:
>
> -fromProtonSpectralRangePPM [list -1.1 9.5]
> -fromHeavyatomSpectralRangePPM [list 60 120]
> -toProtonSpectralRangePPM [list -1.2 6.0]
>
> (The numbers in the above lines are just examples. ) The list of two
> numbers after each flag define the chemical
> shifts around which the actual aliasings/foldings take place. For
> now, getting them right is left in your hands, unless
> you're using PIPP peak location tables, which include the necessary
> information in their headers. I'm working with
> Frank Delaglio to see if we can build something general-purpose that
> could read that information out of nmrPipe
> spectra, but it's not there yet.
>
> If you don't get the spectral range flag values correct, marvin will
> have a difficult time assigning peaks with folded/aliased
> positions.
>
I think I have this correct! but as explained in the comments to my code
above I don't know if I need to allow for a 'picket fence error of 1/2 a
digital resolution (I hope I don't)
> You'll need to change the names of the output .peaks,
> .shiftAssignments, and .exceptions files to something you like.
>
> And if you have a reference structure, replace the name of
> cvn_reference.pdb with the name of the PDB file for your
> reference structure. If you don't have one (as is usually the case,
> of course), just delete those lines.
>
> Finally, you'll need to update the sa_pass* and summarize_pass*
> files. The necessary changes are all just filename
> changes, which should be obvious.
indeed and I have produced generic versions of these as well
>
> I'm sure you'll run into more questions as you continue. Ask away.
>
> --JK
I am afraid there are more! I am quite intereested in what the graphcs
that are on the top of the cvn_3dc_pass1.peaks and
cvn_3dc_pass1.shiftAssignments are what they tell me and what I should
be looking out for
so here are some questions
1. what exceptions and explicitinverseexceptions
2. for the shiftAssignments Number of peak assignments per
shiftAssignment i take it this is the number of peaks assigned to each
shift
3. for the peak assignments what are 'completeness of targets'
4. for peak are 'Differences between target from proton shifts and
shiftAssignment values' the difference between the shift found in the
peaks for assignments and the shift in the shiftLists
5. what are the good peak assignments at this stage?
6. what are 'Passing SA pair scores'
7. what is the network filter is it network anchoring ala petere guntert?
many thanks
gary
>
>
> On Jun 11, 2007, at 6:19 PM, gary thompson wrote:
>
>> some questions about marvin
>>
>> where should I start? do i need to alter much other than
>>
>> 1. alters the input peak and shifts lists psf files and file name
>> roots
>> 2. remove opening of reference pdb files e.g . delete
>>
>> readPDB -fileName ./cvn_reference.pdb
>>
>> and
>>
>> initialShiftAssignmentAnalysis \
>> -pot $noe \
>> -referenceStructureFile ./cvn_reference.pdb \
>> -minLikelihood 0.9 \
>>
>> etc...
>>
>>
>> also
>> is there any more documentation on whats going on other than the
>> source code, paper, slides on th web and the cvn example?
>>
>>
>> regards and many thanks
>> gary
>>
>> n.b. what sort of hardware were the timings you reported (60
>> processors 12 hours) made on?
>> _______________________________________________
>> Xplor-nih mailing list
>> Xplor-nih at nmr.cit.nih.gov
>> http://dcb.cit.nih.gov/mailman/listinfo/xplor-nih
>
>
> .
>
--
-------------------------------------------------------------------
Dr Gary Thompson
Astbury Centre for Structural Molecular Biology,
University of Leeds, Astbury Building,
Leeds, LS2 9JT, West-Yorkshire, UK Tel. +44-113-3433024
email: garyt at bmb.leeds.ac.uk Fax +44-113-2331407
-------------------------------------------------------------------