After I sent that I did find a way to redirect the console output to a file - not pretty but it works:
import sys import os import subprocess so = open("testing.log", 'w', 0) sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) os.dup2(so.fileno(), sys.stdout.fileno()) And then I can grep the matching chain ID and pipe it out to a file using a subprocess call: subprocess.call('grep -Em1 "to chain" testing.log | cut -d"'" "'" -f6 > grepped.log', shell=True) Agree re tail (and the subprocess.call() above!) being ugly - I need to read some python docs, rather than making frankensteinian shell/python combos. When all you have is a hammer... No rush on fixes - it’s new years eve tomorrow! :) Best, Oliver. > > On Dec 30, 2013, at 8:32 PM, Paul Emsley <pems...@mrc-lmb.cam.ac.uk> wrote: > >> On 30/12/13 22:45, Oliver Clarke wrote: >>> Great! Next scripting question that I’m puzzling over - how to take a set >>> of sequences (present in a single fasta file), in arbitrary order, match >>> and align them to their corresponding homologous subunits in a PDB file, >>> and use align_and_mutate to correct the sequence of each subunit (and >>> delete the gaps). >>> >>> Right now, I’m stuck on the first step - how to take a single sequence from >>> a fasta file and output the matching chain ID. >>> >>> I can use: >>> >>> import subprocess >>> align_to_closest_chain("%s"%(subprocess.check_output(["tail","+2","test.fasta"])).upper(), >>> 0.95) >> >> FWIW, I think that using "tail" is ugly and you should read the contents of >> the fasta file into a list of strings. >> >> def file_name_to_string_list(file_name): >> if os.path.isfile(file_name): >> f = open(file_name, 'r') >> return f.readlines() >> else: >> return [] >> >> Maybe there is a python built-in for this already? :) >> >> Then use (say): >> >> fasta_seqs = file_name_to_string_list('all.fasta') >> align_to_closest_chain(fasta_seqs[2].upper(), 0.95) >> >>> >>> to check for a matching chain, but the output of align_to_closest_chain() >>> is either 0 or 1. >>> >>> The matching chain id is present in the console output, but I haven’t yet >>> figured out how to parse that and do anything useful with it - the console >>> output does not seem be easily accessible via either stdout or stderr (was >>> hoping I could redirect the console output to a file and grep it for the >>> chain id - ugly but I don’t know a better way). >> >> OK, that's a good point. >> >> So the policy is that interesting variables/values should be returned by >> functions. In this case it would be good to know the selected chain-id, but >> you don't get that - so that's a problem. I'll fix that now. In future this >> function will return False or the molecule number and chain-id. >> >>> >>> Also it is not immediately clear to me how the match_fraction parameter >>> relates to sequence identity or alignment score for a given match (The >>> alignment also seems to be case sensitive, which doesn’t seem right, hence >>> the .upper() in the above). >> >> Yes, I think the input sequence for coot is case sensitive (and it should >> not be). >> >> IIRC, the alignment score has be at least as close as (as good as) >> match_fraction, otherwise it is not performed. >> >>> >>> What am i doing wrong - thoughts/suggestions? Would it be better to do the >>> sequence handling steps externally? >> >> I'll fix it and update later. >> >> Paul. >> >> >