Hi,
I am new to Python and am trying to parse a file using regular
expressions. The file looks like this:


LOCUS       NM_005417               4145 bp    mRNA    linear   PRI 04-DEC-2005
DEFINITION  Homo sapiens v-src sarcoma (Schmidt-Ruppin A-2) viral oncogene
            homolog (avian) (SRC), transcript variant 1, mRNA.
ACCESSION   NM_005417
VERSION     NM_005417.3  GI:38202215

 CDS             450..2060
                     /gene="SRC"
                     /EC_number="2.7.1.112"
                     /note="v-src avian sarcoma (Schmidt-Ruppin A-2) viral
                     oncogene homolog; protooncogene SRC, Rous sarcoma;
                     tyrosine-protein kinase SRC-1; tyrosine kinase pp60c-src;
                     go_component: ubiquitin ligase complex [goid 0000151]
                     [evidence NAS] [pmid 14976165];
                     go_function: ATP binding [goid 0005524] [evidence IEA];
                     go_function: nucleotide binding [goid 0000166] [evidence
                     IEA];
                     go_function: protein binding [goid 0005515] [evidence IPI]
                     [pmid 15546919];
                     go_function: protein binding [goid 0005515] [evidence IPI]
                     [pmid 15749833];
                     go_function: transferase activity [goid 0016740] [evidence
                     IEA];
                     go_function: SH3/SH2 adaptor activity [goid 0005070]
                     [evidence TAS] [pmid 9020193];
                     go_function: protein-tyrosine kinase activity [goid
                     0004713] [evidence IEA];
                     go_function: protein-tyrosine kinase activity [goid
                     0004713] [evidence TAS] [pmid 9020193];
                     go_process: protein kinase cascade [goid 0007243]
                     [evidence TAS] [pmid 9020193];
                     go_process: signal complex formation [goid 0007172]
                     [evidence TAS] [pmid 9924018];
                     go_process: protein amino acid phosphorylation [goid
                     0006468] [evidence IEA]"
                     /codon_start=1
                     /product="proto-oncogene tyrosine-protein kinase SRC"


I want to pull out LOCUS, go_process, go_function, and go_component
into columns so that I can fill my Postgres tables. Although I know
SQL reasonably well, I am finding it difficult to extract these
elements from the raw text.

I had an idea that might have worked, but my attempt failed, so I am
asking this forum for help.


import re
f1  = open('this_raw_text','r')
dat = f1.readlines()
pat1 = re.compile('LOCUS')
pat2 = re.compile('go_component')
pat3 = re.compile('go_process')
pat4 = re.compile('go_function')


for line in dat:
    a = pat1.match(line)
    b = pat2.match(line)
    c = pat3.match(line)
    d = pat4.match(line)
    loc = a.group(1)
    comp = b.group(1)
    proc = c.group(1)
    func = d.group(1)
    print loc+'\t'+func
    print loc+'\t'+comp
    print loc+'\t'+func
    print loc+'\t'+proc

Traceback (most recent call last):
  File "<pyshell#12>", line 6, in ?
    loc = a.group(1)
IndexError: no such group
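
A note on the traceback above, offered as a sketch rather than a definitive diagnosis: `group(1)` refers to the first parenthesised group in a pattern, and the pattern 'LOCUS' has none, so the call raises IndexError on the one line where pat1 does match. The `\s+(\S+)` part added below, which captures the token after LOCUS, is an assumption about what was intended and is not in the original code.

```python
import re

line = "LOCUS       NM_005417               4145 bp    mRNA    linear   PRI 04-DEC-2005"

# No parentheses in the pattern: only group(0), the whole match, exists.
m = re.compile(r'LOCUS').match(line)
print(m.group(0))

# One capturing group: group(1) is the token that follows LOCUS.
# (The \s+(\S+) part is an illustrative assumption.)
locus_pat = re.compile(r'LOCUS\s+(\S+)')
m = locus_pat.match(line)
locus = m.group(1)
print(locus)
```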


In the second attempt:
>>> for line in dat:
        a = pat1.match(line)
        b = pat2.match(line)
        c = pat3.match(line)
        d = pat4.match(line)
        print a, b, c, d

<_sre.SRE_Match object at 0x00D53330> None None None

For all the remaining lines, every match is None. However, go_component,
go_process, and go_function (b, c, d) do appear in the file.
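
A likely reason for the None results, assuming the go_ lines in the file begin with whitespace as they do in the record shown earlier: `match` only succeeds when the pattern matches at the very start of the string, whereas `search` scans the whole line. A small sketch using one line from the record:

```python
import re

# A line from the record; note the leading spaces.
line = "                     go_process: protein kinase cascade [goid 0007243]"

pat3 = re.compile('go_process')

print(pat3.match(line))    # None: the line starts with spaces, not 'go_process'
found = pat3.search(line)  # search looks anywhere in the line
print(found.group(0))
```

Because either call returns None when nothing is found, the result needs to be checked before calling `.group()` on it.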


I do not know if this is the correct procedure. 
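
One possible procedure, sketched under assumptions rather than offered as the correct one: since the go_ entries wrap across lines, the record can be read as a single string, its whitespace collapsed, and the entries pulled out with one pattern. The shortened `record` string, the pattern `go_pat`, and the row layout are all illustrative choices, and the sketch assumes every entry has the form "go_xxx: name [goid NNNNNNN]".

```python
import re

# A shortened copy of the record (assumption: the real file has the same layout).
record = '''LOCUS       NM_005417               4145 bp    mRNA    linear   PRI 04-DEC-2005
                     /note="v-src avian sarcoma (Schmidt-Ruppin A-2) viral
                     go_component: ubiquitin ligase complex [goid 0000151]
                     [evidence NAS] [pmid 14976165];
                     go_function: ATP binding [goid 0005524] [evidence IEA];
                     go_function: protein-tyrosine kinase activity [goid
                     0004713] [evidence IEA];
                     go_process: signal complex formation [goid 0007172]
                     [evidence TAS] [pmid 9924018]"
'''

# Entries wrap across lines, so collapse every run of whitespace to one space.
text = ' '.join(record.split())

locus = re.search(r'LOCUS\s+(\S+)', text).group(1)

# Category, term name, and GO id; the name is everything up to " [goid".
go_pat = re.compile(r'(go_component|go_function|go_process): (.+?) \[goid (\d+)\]')

rows = [(locus, kind, name, goid) for kind, name, goid in go_pat.findall(text)]
for row in rows:
    print('\t'.join(row))
```

Each row is a (locus, category, term, goid) tuple, and the tab-separated output could then be loaded into a Postgres table, for example with COPY.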

Can anyone help me, please?

Thanks a lot.
M


                
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
