After I sent that I did find a way to redirect the console output to a file - 
not pretty but it works:

import sys
import os
import subprocess
so = open("testing.log", 'w', 0)
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
os.dup2(so.fileno(), sys.stdout.fileno())

And then I can grep the matching chain ID and pipe it out to a file using a 
subprocess call: subprocess.call('grep -Em1 "to chain" testing.log | cut -d"'" 
"'" -f6 > grepped.log', shell=True) 

Agree re tail (and the subprocess.call() above!) being ugly - I need to read 
some python docs, rather than making frankensteinian shell/python combos. When 
all you have is a hammer...

No rush on fixes - it’s new years eve tomorrow! :)

Best,
Oliver.
> 
> On Dec 30, 2013, at 8:32 PM, Paul Emsley <pems...@mrc-lmb.cam.ac.uk> wrote:
> 
>> On 30/12/13 22:45, Oliver Clarke wrote:
>>> Great! Next scripting question that I’m puzzling over - how to take a set 
>>> of sequences (present in a single fasta file), in arbitrary order, match 
>>> and align them to their corresponding homologous subunits in a PDB file, 
>>> and use align_and_mutate to correct the sequence of each subunit (and 
>>> delete the gaps).
>>> 
>>> Right now, I’m stuck on the first step - how to take a single sequence from 
>>> a fasta file and output the matching chain ID.
>>> 
>>> I can use:
>>> 
>>> import subprocess
>>> align_to_closest_chain("%s"%(subprocess.check_output(["tail","+2","test.fasta"])).upper(),
>>>  0.95)
>> 
>> FWIW, I think that using "tail" is ugly and you should read the contents of 
>> the fasta file into a list of strings.
>> 
>> def file_name_to_string_list(file_name):
>>   if os.path.isfile(file_name):
>>       f = open(file_name, 'r')
>>       return f.readlines()
>>   else:
>>       return []
>> 
>> Maybe there is a python built-in for this already? :)
>> 
>> Then use (say):
>> 
>> fasta_seqs = file_name_to_string_list('all.fasta')
>> align_to_closest_chain(fasta_seqs[2].upper(), 0.95)
>> 
>>> 
>>> to check for a matching chain, but the output of align_to_closest_chain() 
>>> is either 0 or 1.
>>> 
>>> The matching chain id is present in the console output, but I haven’t yet 
>>> figured out how to parse that and do anything useful with it - the console 
>>> output does not seem be easily accessible via either stdout or stderr (was 
>>> hoping I could redirect the console output to a file and grep it for the 
>>> chain id - ugly but I don’t know a better way).
>> 
>> OK, that's a good point.
>> 
>> So the policy is that interesting variables/values should be returned by 
>> functions. In this case it would be good to know the selected chain-id, but 
>> you don't get that - so that's a problem. I'll fix that now.  In future this 
>> function will return False or the molecule number and chain-id.
>> 
>>> 
>>> Also it is not immediately clear to me how the match_fraction parameter 
>>> relates to sequence identity or alignment score for a given match (The 
>>> alignment also seems to be case sensitive, which doesn’t seem right, hence 
>>> the .upper() in the above).
>> 
>> Yes, I think the input sequence for coot is case sensitive (and it should 
>> not be).
>> 
>> IIRC, the alignment score has be at least as close as (as good as) 
>> match_fraction, otherwise it is not performed.
>> 
>>> 
>>> What am i doing wrong - thoughts/suggestions? Would it be better to do the 
>>> sequence handling steps externally?
>> 
>> I'll fix it and update later.
>> 
>> Paul.
>> 
>> 
> 

Reply via email to