> >>> for m in mydata.findall('//functions'):
>       print m.get('molecular_class').text
>
> >>> for m in mydata.findall('//functions'):
>       print m.find('molecular_class').text.strip()
>
> >>> for process in mydata.findall('//biological_process'):
>       print process.get('title').text


Hello,

I believe we're running into XML namespace issues.  If we look at all the
tag names in the XML, we can see this:

######
>>> from elementtree import ElementTree
>>> tree = ElementTree.parse(open('00004.xml'))
>>> for element in tree.getroot()[0]: print element.tag
...
{org:hprd:dtd:hprdr2}title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}alt_title
{org:hprd:dtd:hprdr2}omim
{org:hprd:dtd:hprdr2}gene_symbol
{org:hprd:dtd:hprdr2}gene_map_locus
{org:hprd:dtd:hprdr2}seq_entry
{org:hprd:dtd:hprdr2}molecular_weight
{org:hprd:dtd:hprdr2}entry_sequence
{org:hprd:dtd:hprdr2}protein_domain_architecture
{org:hprd:dtd:hprdr2}expressions
{org:hprd:dtd:hprdr2}functions
{org:hprd:dtd:hprdr2}cellular_component
{org:hprd:dtd:hprdr2}interactions
{org:hprd:dtd:hprdr2}EXTERNAL_LINKS
{org:hprd:dtd:hprdr2}author
{org:hprd:dtd:hprdr2}last_updated
######

(I'm just taking a quick look at the top-level elements in the tree.)

As we can see, each element's tag is prefixed with the namespace URI
declared in the XML document.  If we search the XML document for the
attribute 'xmlns', we'll see where this 'org:hprd:dtd:hprdr2' comes from.
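
Here is a minimal sketch of how that prefix arises.  It assumes the
Python 3 standard-library xml.etree.ElementTree rather than the standalone
elementtree package used in the session above.  The SAMPLE document, its
'entry_set' root name, and the split_tag() helper are made up for
illustration; only the namespace string comes from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document: the element names are invented,
# only the xmlns value is the one seen in the tag listing above.
SAMPLE = """<entry_set xmlns="org:hprd:dtd:hprdr2">
  <entry>
    <title>Example protein</title>
  </entry>
</entry_set>"""


def split_tag(tag):
    """Split '{namespace}local' into (namespace, local).

    Returns an empty namespace if the tag carries no '{...}' prefix.
    """
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(SAMPLE)

# The parser folds the default xmlns declaration into every tag name.
namespace, local = split_tag(root.tag)
print(namespace)
print(local)
```

The same split_tag() call on any element's tag recovers the namespace
without having to read the XML source by eye.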


So we need to prepend the namespace, in curly braces, to each tag name in
the search path to get the proper terms:

######
>>> for process in tree.find("//{org:hprd:dtd:hprdr2}biological_processes"):
...     print process.findtext("{org:hprd:dtd:hprdr2}title")
...
Metabolism
Energy pathways
######
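
Typing the full namespace in every path gets tedious.  The sketch below
shows two ways to shorten it.  It assumes the Python 3 standard-library
xml.etree.ElementTree, whose find(), findall() and findtext() accept a
prefix-to-URI mapping as an optional namespaces argument.  The sample
XML, the q() helper, and the prefix 'h' are hypothetical names chosen
for this example, not part of the HPRD data or of ElementTree.

```python
import xml.etree.ElementTree as ET

HPRD_NS = "org:hprd:dtd:hprdr2"

# Hypothetical fragment, loosely modelled on the session above.
SAMPLE = """<entry xmlns="org:hprd:dtd:hprdr2">
  <functions>
    <biological_processes>
      <biological_process><title>Metabolism</title></biological_process>
      <biological_process><title>Energy pathways</title></biological_process>
    </biological_processes>
  </functions>
</entry>"""


def q(name, namespace=HPRD_NS):
    """Build a fully qualified '{namespace}name' tag (illustrative helper)."""
    return "{%s}%s" % (namespace, name)


root = ET.fromstring(SAMPLE)

# Option 1: build the qualified names with a small helper.
titles_qualified = [
    process.findtext(q("title"))
    for process in root.find(".//" + q("biological_processes"))
]

# Option 2: pass a prefix map and write 'h:tag' in the path instead.
ns = {"h": HPRD_NS}
titles_prefixed = [
    title.text
    for title in root.findall(".//h:biological_process/h:title", ns)
]

print(titles_qualified)
print(titles_prefixed)
```

Both options select the same elements; the prefix map just keeps the
paths readable when several levels of tags are involved.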


To tell the truth, I don't quite understand how to work fluently with XML
namespaces, so perhaps there's an easier way to do what you want.  But the
examples above should help you get started parsing all your Gene Ontology
annotations.



Good luck!

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
