Spyros Charonis wrote: > Dear python community, > > I have a file where I store sequences that each have a header. The > structure of the file is as such: > >>sp|(some code) =>1st header > ATTTTGGCGG > MNKPLOI > ..... > ..... > >>sp|(some code) => 2nd header > AAAAAA > GGGG ... > ......... > > ...... > > I am looking to implement a logical structure that would allow me to group > each of the sequences (spread on multiple lines) into a single string. So > instead of having the letters spread on multiple lines I would be able to > have 'ATTTTGGCGGMNKP....' as a single string that could be indexed. > > This snipped is good for isolating the sequences (=stripping headers and > skipping blank lines) but how could I concatenate each sequence in order > to get one string per sequence? > >>>> for line in align_file: > ... if line.startswith('>sp'): > ... continue > ... elif not line.strip(): > ... continue > ... else: > ... print line > > (... is just OS X terminal notation, nothing programmatic) > > Many thanks in advance.
Instead of printing the line directly collect it in a list (without trailing "\n"). When you encounter a line starting with ">sp" check if that list is non-empty, and if so print "".join(parts), assuming the list is called parts, and start with a fresh list. Don't forget to print any leftover data in the list once the for loop has terminated. _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor