Dear Drs. Yoo and Johnson,

Thank you very much for your help. I successfully parsed my GO annotations from all 16,000 files. Thanks again for your kind help.
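The quoted reply below spells out the full `{org:hprd:dtd:hprdr2}` prefix in every search path and wonders whether there is an easier way. One option in Python 3's standard `xml.etree.ElementTree` is the `namespaces` mapping accepted by `find`, `findall`, `findtext` and `iterfind`, which lets a short local prefix stand in for the namespace URI. What follows is a minimal sketch of a batch parse along those lines. It assumes every file uses the same namespace and the `biological_processes`/`title` nesting seen in the quoted session. The `hprd` prefix alias, the helper names `biological_processes` and `parse_all`, the sample record and its `records`/`protein`/`biological_process` element names, and the `hprd_xml` directory are all hypothetical.

```python
import glob
import io
import xml.etree.ElementTree as ET

# The namespace URI comes from the quoted session; "hprd" is an arbitrary
# local alias (hypothetical), so paths can say "hprd:title" instead of
# "{org:hprd:dtd:hprdr2}title".
NS = {"hprd": "org:hprd:dtd:hprdr2"}


def biological_processes(source):
    """Return the biological-process titles in one HPRD XML file.

    `source` may be a filename or a binary file object.
    """
    root = ET.parse(source).getroot()
    titles = []
    # Mirrors the quoted loop: visit each child of every <biological_processes>
    # element below the root and read that child's <title> text.
    for process in root.iterfind(".//hprd:biological_processes/*", NS):
        title = process.findtext("hprd:title", default="", namespaces=NS)
        titles.append(title.strip())
    return titles


def parse_all(pattern):
    """Map each file matching the glob `pattern` to its process titles."""
    return {path: biological_processes(path) for path in sorted(glob.glob(pattern))}


# Hypothetical, heavily trimmed record; only the namespace and the
# biological_processes/title nesting mirror the quoted session.
SAMPLE = b"""<records xmlns="org:hprd:dtd:hprdr2">
  <protein>
    <title>Example protein</title>
    <biological_processes>
      <biological_process><title>Metabolism</title></biological_process>
      <biological_process><title>Energy pathways</title></biological_process>
    </biological_processes>
  </protein>
</records>"""

if __name__ == "__main__":
    print(biological_processes(io.BytesIO(SAMPLE)))
    # For a whole data set, something like parse_all("hprd_xml/*.xml"),
    # where "hprd_xml" is a hypothetical directory name.
```

Run on the sample, this should print `['Metabolism', 'Energy pathways']`. The `/*` step matches children regardless of their tag, so the sketch does not need to guess what the individual process elements are called.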
--- Danny Yoo <[EMAIL PROTECTED]> wrote:

> > >>> for m in mydata.findall('//functions'):
> >     print m.get('molecular_class').text
> >
> > >>> for m in mydata.findall('//functions'):
> >     print m.find('molecular_class').text.strip()
> >
> > >>> for process in mydata.findall('//biological_process'):
> >     print process.get('title').text
>
>
> Hello,
>
> I believe we're running into XML namespace issues. If we look at all the
> tag names in the XML, we can see this:
>
> ######
> >>> from elementtree import ElementTree
> >>> tree = ElementTree.parse(open('00004.xml'))
> >>> for element in tree.getroot()[0]: print element.tag
> ...
> {org:hprd:dtd:hprdr2}title
> {org:hprd:dtd:hprdr2}alt_title
> {org:hprd:dtd:hprdr2}alt_title
> {org:hprd:dtd:hprdr2}alt_title
> {org:hprd:dtd:hprdr2}alt_title
> {org:hprd:dtd:hprdr2}alt_title
> {org:hprd:dtd:hprdr2}omim
> {org:hprd:dtd:hprdr2}gene_symbol
> {org:hprd:dtd:hprdr2}gene_map_locus
> {org:hprd:dtd:hprdr2}seq_entry
> {org:hprd:dtd:hprdr2}molecular_weight
> {org:hprd:dtd:hprdr2}entry_sequence
> {org:hprd:dtd:hprdr2}protein_domain_architecture
> {org:hprd:dtd:hprdr2}expressions
> {org:hprd:dtd:hprdr2}functions
> {org:hprd:dtd:hprdr2}cellular_component
> {org:hprd:dtd:hprdr2}interactions
> {org:hprd:dtd:hprdr2}EXTERNAL_LINKS
> {org:hprd:dtd:hprdr2}author
> {org:hprd:dtd:hprdr2}last_updated
> ######
>
> (I'm just doing a quick view of the top-level elements in the tree.)
>
> As we can see, each element's tag is prefixed with the namespace URI
> provided in the XML document. If we search our XML document for the
> attribute 'xmlns', we'll see where this 'org:hprd:dtd:hprdr2' string
> comes from.
>
> So we need to prepend the namespace to get the proper tag names:
>
> ######
> >>> for process in tree.find("//{org:hprd:dtd:hprdr2}biological_processes"):
> ...     print process.findtext("{org:hprd:dtd:hprdr2}title")
> ...
> Metabolism
> Energy pathways
> ######
>
> To tell the truth, I don't quite understand how to work fluently with XML
> namespaces, so perhaps there's an easier way to do what you want. But the
> examples above should help you get started parsing all your Gene Ontology
> annotations.
>
> Good luck!

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor