On Tue, May 8, 2012 at 4:00 PM, Spyros Charonis <[email protected]> wrote:
> Hello python community,
>
> I'm having a small issue with list indexing. I am extracting certain
> information from a PDB (protein information) file and need certain fields of
> the file to be copied into a list. The entries look like this:
>
> ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
> ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
> ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
>
> I am using the following syntax to parse these lines into a list:
...
> charged_res_coord.append(atom_coord[i].split()[1:9])

You're using split(), which assumes there will always be whitespace between
your fields. That isn't guaranteed, though. PDB is a fixed-width record
format, according to the documentation I found here:

http://www.wwpdb.org/docs.html

Adjacent columns can run together. For example, a four-digit residue number
sits directly against the chain ID (see the "A1005" in the last sample line
below), so split() returns one field fewer for that line and your indexes
shift.

If you just have a couple of items to pull out, you can slice the string at
the appropriate places. Based on those docs, you could pull the x, y, and z
coordinates out like this:

x_coord = atom_line[30:38]
y_coord = atom_line[38:46]
z_coord = atom_line[46:54]

(These are still strings; wrap them in float() if you need numbers.)

If you need to pull more of the data out, or you may want to reuse this code
in the future, it might be worth actually parsing the record into all its
parts. For a fixed-width record, I usually do something like this:

pdbdata = """
ATOM   1512  N   VAL A 222       8.544  -7.133  25.697  1.00 48.89           N
ATOM   1513  CA  VAL A 222       8.251  -6.190  24.619  1.00 48.64           C
ATOM   1514  C   VAL A 222       9.528  -5.762  23.898  1.00 48.32           C
ATOM   1617  N   GLU A1005      11.906  -2.722   7.994  1.00 44.02           N
""".splitlines()

# One slice per ATOM field (0-based, end-exclusive), in column order.
atom_field_spec = [
    slice(0, 6),    # record name ("ATOM")
    slice(6, 11),   # atom serial number
    slice(12, 16),  # atom name
    slice(16, 17),  # alternate location indicator
    slice(17, 20),  # residue name
    slice(21, 22),  # chain identifier
    slice(22, 26),  # residue sequence number
    slice(26, 27),  # insertion code
    slice(30, 38),  # x coordinate
    slice(38, 46),  # y coordinate
    slice(46, 54),  # z coordinate
    slice(54, 60),  # occupancy
    slice(60, 66),  # temperature factor
    slice(76, 78),  # element symbol
    slice(78, 80),  # charge
]

for line in pdbdata:
    if line.startswith('ATOM'):
        data = [line[field_spec] for field_spec in atom_field_spec]
        print(data)

You can build all kinds of fancy data structures on top of that if you want
to. You could use the extracted data to build a namedtuple, for convenient
access to the data by name instead of by index into a list, or to create
instances of a custom class with whatever functionality you need.

--
Jerry
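
A minimal sketch of the namedtuple idea mentioned above. The field names (serial, res_seq, temp_factor and so on), the Atom type, and the parse_atom helper are hypothetical labels chosen for this example; only the column positions come from the ATOM layout used in the slices above. It also converts the numeric fields, which is an assumption about what you want downstream.

```python
from collections import namedtuple

# Hypothetical field names paired with the ATOM column slices
# (0-based, end-exclusive).
ATOM_FIELDS = [
    ("record", slice(0, 6)),
    ("serial", slice(6, 11)),
    ("name", slice(12, 16)),
    ("alt_loc", slice(16, 17)),
    ("res_name", slice(17, 20)),
    ("chain_id", slice(21, 22)),
    ("res_seq", slice(22, 26)),
    ("i_code", slice(26, 27)),
    ("x", slice(30, 38)),
    ("y", slice(38, 46)),
    ("z", slice(46, 54)),
    ("occupancy", slice(54, 60)),
    ("temp_factor", slice(60, 66)),
    ("element", slice(76, 78)),
    ("charge", slice(78, 80)),
]

Atom = namedtuple("Atom", [name for name, _ in ATOM_FIELDS])


def parse_atom(line):
    """Slice one fixed-width ATOM line into an Atom, converting numbers."""
    raw = {name: line[field].strip() for name, field in ATOM_FIELDS}
    for key in ("serial", "res_seq"):
        raw[key] = int(raw[key])
    for key in ("x", "y", "z", "occupancy", "temp_factor"):
        raw[key] = float(raw[key])
    return Atom(**raw)


# Sample line where the chain ID and residue number touch ("A1005").
line = ("ATOM   1617  N   GLU A1005      11.906  -2.722   7.994"
        "  1.00 44.02           N")
atom = parse_atom(line)
print(atom.chain_id, atom.res_seq, atom.x, atom.y, atom.z)
```

With this, the coordinates are available as atom.x, atom.y and atom.z regardless of whether neighbouring columns touch, whereas line.split() on the same line returns "A1005" as a single field.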
